Date of Publication: 17th April 2018
Abstract: This paper discusses an approach to confidence scoring of phone durations in speech recognition. The confidence score is derived from the correspondence between a Hidden Markov Model (HMM) based forced aligner and a Multi-Layer Perceptron (MLP) based frame classifier. Phone durations for noise are also factored into the approach, which makes it more reliable.
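The abstract does not state how the correspondence between the forced aligner and the frame classifier is turned into a score. One plausible reading, sketched here purely as an assumption and not as the paper's method, is frame-level agreement: for each phone segment produced by the HMM forced alignment, count the fraction of its frames on which the MLP's most likely class equals the aligned phone. A low score then flags a segment whose aligned duration is poorly supported by the classifier. The names `AlignedPhone`, `duration_confidence` and `phone_index`, the frame-posterior layout, and the noise phone label `"NSN"` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class AlignedPhone:
    """One phone segment from an HMM forced alignment (hypothetical format)."""
    label: str  # phone label; may be a noise phone such as "NSN" (assumed name)
    start: int  # first frame index, inclusive
    end: int    # last frame index, exclusive

    @property
    def duration(self) -> int:
        """Segment duration in frames."""
        return self.end - self.start


def duration_confidence(
    alignment: Sequence[AlignedPhone],
    frame_posteriors: Sequence[Sequence[float]],
    phone_index: Dict[str, int],
) -> List[float]:
    """Hypothetical scoring rule: for each aligned phone, return the fraction
    of its frames on which the MLP's argmax class equals the aligned phone."""
    scores = []
    for seg in alignment:
        target = phone_index[seg.label]
        frames = frame_posteriors[seg.start:seg.end]
        if len(frames) == 0:  # zero-length segment: no evidence either way
            scores.append(0.0)
            continue
        agree = 0
        for post in frames:
            # Index of the most likely MLP output class for this frame.
            best = max(range(len(post)), key=post.__getitem__)
            if best == target:
                agree += 1
        scores.append(agree / len(frames))
    return scores


if __name__ == "__main__":
    phones = {"sil": 0, "a": 1, "NSN": 2}
    alignment = [
        AlignedPhone("sil", 0, 2),
        AlignedPhone("a", 2, 5),
        AlignedPhone("NSN", 5, 6),
    ]
    posteriors = [
        [0.9, 0.05, 0.05],
        [0.6, 0.3, 0.1],
        [0.2, 0.7, 0.1],
        [0.1, 0.8, 0.1],
        [0.5, 0.4, 0.1],
        [0.1, 0.1, 0.8],
    ]
    print(duration_confidence(alignment, posteriors, phones))
```

In this toy input the MLP disagrees with the aligner on the last frame of "a", so that segment scores about 0.67 while the others score 1.0. Treating the noise phone as an ordinary class label is one simple way to let noise durations enter the same score; a mean-posterior variant (averaging `post[target]` instead of counting argmax matches) would be an equally reasonable alternative.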