CN101118745A - Confidence degree quick acquiring method in speech identification system - Google Patents

Confidence degree quick acquiring method in speech identification system

Info

Publication number
CN101118745A
Authority
CN
China
Prior art keywords
voice
state
frame
speech
acoustic
Prior art date
Legal status
Granted
Application number
CNA2006100891355A
Other languages
Chinese (zh)
Other versions
CN101118745B (en)
Inventor
董滨 (Dong Bin)
赵庆卫 (Zhao Qingwei)
颜永红 (Yan Yonghong)
Current Assignee
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Original Assignee
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Institute of Acoustics CAS, Beijing Kexin Technology Co Ltd filed Critical Institute of Acoustics CAS
Priority to CN2006100891355A priority Critical patent/CN101118745B/en
Publication of CN101118745A publication Critical patent/CN101118745A/en
Application granted granted Critical
Publication of CN101118745B publication Critical patent/CN101118745B/en
Legal status: Expired - Fee Related

Abstract

The present invention relates to an improved algorithm for computing the confidence of a speech recognition system, including: preprocessing with framing; extraction of the speech features of each frame; calculation of the likelihood probability p(x_t|s_j) of each frame of speech for every state in the state diagram, from the state diagram, the acoustic model, and the frame's feature vector; storage of p(x_t|s_j) indexed by frame number and state number; pruning of states according to p(x_t|s_j); calculation, after pruning, of the likelihood of the acoustic space and of the generalized posterior probability; and calculation of the generalized posterior probability of each phoneme, which is taken as its confidence score. In the prior art, a search for phonemes is needed to obtain the phoneme candidates, and then a second search is carried out to compute the confidence using a different acoustic model. The present invention is a synchronous calculation method that computes the confidence with the same acoustic model while the recognizer performs its frame-synchronous beam search; the search is therefore done only once, saving system running time and computational complexity.

Description

Method for quickly obtaining the confidence measure in a speech recognition system
Technical Field
The invention belongs to the technical field of speech recognition, and particularly relates to a method for quickly obtaining the confidence measure in a speech recognition system.
Background
When a speech recognition system is used under natural conditions rather than in an ideal environment, its performance degrades greatly. Moreover, real spoken language mixes many non-speech sounds into the speech, such as abnormal pauses, coughs and many environmental noises, which makes it difficult for a conventional speech recognition system to achieve its nominal recognition performance. In addition, if the words spoken by the user fall outside the preset domain of the speech recognition system, recognition errors are easily caused. In short, the user of a commercial speech recognition system wants it to reject wrong speech as far as possible, and confidence score evaluation is a good way to address these difficulties.
The confidence evaluation method performs a hypothesis test on the recognition result of the speech recognition system, evaluates the reliability of the result against a threshold set by experiment, and locates errors in the result, thereby improving the recognition rate and the robustness of the recognition system.
At present, the two-pass calculation method is the most widely applied way of calculating confidence. The input speech is first decoded in one pass by the recognizer, producing a word graph or word sequence corresponding to the input speech. The second pass then works on this word graph or word sequence and calculates a confidence score, as shown in FIG. 2. The acoustic models used in the two passes are different; the second pass, which computes the confidence, generally uses an all-phoneme model. Because two decoding passes are needed, the computational complexity of the confidence calculation is high and more system time is required, which hinders online use of the speech recognition system.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by jointly considering calculation speed and robustness, and provides a method for quickly obtaining the confidence with only a single search pass.
In order to achieve the above object, the method for quickly obtaining confidence level in a speech recognition system provided by the present invention comprises the following steps:
1) Input the speech to be recognized into the speech recognition system.
2) Preprocess the input speech; the preprocessing includes framing.
3) Extract the speech features, obtaining the MFCC feature vector of each frame of speech.
4) Traverse all speech frames; for each frame, calculate from the state diagram, the acoustic model and the frame's MFCC feature vector the likelihood probability p(x_t|s_j) of each state in the state diagram for that frame, whose negative logarithm is

$$-\log p(x_t \mid s_j) = \frac{1}{2}(x_t-\mu_j)^{T}\Sigma_j^{-1}(x_t-\mu_j) + \frac{n}{2}\log(2\pi) + \frac{1}{2}\log\left|\Sigma_j\right|$$

wherein x_t is the input speech feature, s_j is the corresponding state of its Markov model, modeled as the normal distribution N(μ_j, Σ_j), and n is the dimension of the feature vector;
5) Store the likelihood probabilities p(x_t|s_j) obtained in step 4), indexed by the frame number and state number of the current speech.
6) Judge whether the current pointer points to a virtual node in the state diagram; if so, go to step 7); if not, prune the current state. The virtual node is a marker for the end of a phoneme in the state diagram;
7) Calculate the likelihood probability sum of the acoustic space after pruning,

$$p(x_t) = \sum_{s_j \in D^*} p(x_t \mid s_j)$$

wherein D* is the set of all states retained in the state diagram after pruning;
8) Calculate the generalized posterior probability

$$p(s_j \mid x_t) = \frac{p(x_t \mid s_j)}{\sum_{s_i \in D^*} p(x_t \mid s_i)}$$
9) Calculate the generalized posterior probability of each phoneme,

$$p(ph) = \frac{1}{N}\sum_{j=1}^{N}\frac{1}{\tau_e[j]-\tau_b[j]+1}\sum_{t=\tau_b[j]}^{\tau_e[j]} p(s_j \mid x_t)$$

where N is the number of states that make up each HMM, τ_b[j] and τ_e[j] respectively indicate the starting and ending frame numbers of the speech input data in the current state, and j is the state number; and take the generalized posterior probability of the phoneme as the confidence score of the phoneme.
In the above technical solution, the preprocessing of the input speech in step 2) includes digitizing, pre-emphasis (high-frequency boosting), framing, and windowing of the input speech.
In the above technical solution, the extraction of speech features in step 3) includes: calculating the MFCC cepstral coefficients, cepstral weighting, and calculating the differential cepstral coefficients.
In the above technical solution, the pruning in step 6) adopts a pruning method based on frame-synchronous beam search.
The advantage of the invention is that only one decoding pass is needed. In the prior art, a first search produces the phoneme candidates and a second search is then carried out to calculate the confidence, with different acoustic models used in the two searches; the invention computes the confidence synchronously during the single search, using the same acoustic model.
Drawings
FIG. 1 is a flow diagram of one embodiment of a fast confidence score method of the present invention;
FIG. 2 is a schematic diagram of a confidence two-pass search calculation method of the prior art;
FIG. 3 is a schematic diagram of the word network of the present invention;
FIG. 4 is a schematic diagram of the state diagram of the present invention;
FIG. 5 is a schematic diagram of confidence synchronization calculation pruning based on a state diagram according to the present invention;
FIG. 6 is a ROC plot of the performance of the one-pass search method of the present invention versus the two-pass search method of the prior art.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
Examples
As shown in fig. 1, the method for fast obtaining confidence in a speech recognition system provided by the present invention includes the following steps:
a) Input the speech to be recognized into the speech recognition system.
b) Speech preprocessing, mainly framing. In this embodiment, the preprocessing follows this flow:
1. Digitize the speech signal at a 16 kHz sampling rate.
2. Boost the high frequencies by pre-emphasis; the pre-emphasis filter is H(z) = 1 - αz^{-1}, where α = 0.98.
3. Frame the data, with a frame length of 20 ms and an overlap of 10 ms between frames.
4. Windowing. The window function is the common Hamming window, namely

$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right),\qquad 0 \le n \le N-1$$
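As an illustration of this preprocessing flow, here is a minimal NumPy sketch of steps 1-4 (pre-emphasis with α = 0.98, 20 ms frames with a 10 ms shift, Hamming windowing). It assumes the input is a 1-D array of samples at 16 kHz; the function name and interface are illustrative, not taken from the patent:

```python
import numpy as np

def preprocess(signal, fs=16000, alpha=0.98, frame_ms=20, shift_ms=10):
    """Pre-emphasis H(z) = 1 - alpha*z^-1, then 20 ms frames with a
    10 ms shift, each multiplied by a Hamming window."""
    # 2. high-frequency boosting by pre-emphasis
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # 3. framing: 320-sample frames, 160-sample shift at 16 kHz
    frame_len = fs * frame_ms // 1000
    shift = fs * shift_ms // 1000
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // shift)
    # 4. windowing: w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))
    window = np.hamming(frame_len)
    return np.stack([emphasized[i * shift : i * shift + frame_len] * window
                     for i in range(n_frames)])
```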
c) Speech feature extraction. The invention adopts the MFCC (Mel-frequency cepstral coefficient) feature extraction method; the specific flow is as follows:
5. Calculate the MFCC cepstral coefficients c(m), 1 ≤ m ≤ N_c, where N_c is the number of cepstral coefficients, N_c = 14.
6. Cepstral weighting, i.e. adjusting the weight of each dimension of the cepstral coefficients with the lifter

$$w(m) = 1 + \frac{N_c}{2}\sin\left(\frac{\pi m}{N_c}\right)$$

to obtain the weighted cepstral coefficients

$$\hat c(m) = w(m)\,c(m),\qquad 1 \le m \le N_c.$$
7. Calculate the first- and second-order differences of the energy feature and the cepstral features. The differential cepstral coefficients are calculated with the regression formula

$$\Delta c(t) = \mu \sum_{\tau=-T}^{T} \tau\, c(t+\tau)$$

where μ is the normalization factor, τ is an integer, and 2T+1 is the number of speech frames used to calculate the differential cepstral coefficients.
8. For each frame, generate a 39-dimensional MFCC feature vector.
The invention can also adopt the LPC feature extraction method, which is prior art and is not described here again.
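For the difference step (step 7 above), a common regression-style implementation is sketched below in NumPy. The window half-width T = 2 (i.e. 2T+1 = 5 frames) and the edge padding are assumptions; the normalization factor μ is folded in as 1/(2·Στ²):

```python
import numpy as np

def delta(cepstra, T=2):
    """Differential coefficients by the regression formula
    delta_c(t) = mu * sum_{tau=1..T} tau * (c(t+tau) - c(t-tau)),
    with mu = 1/(2*sum tau^2); cepstra has shape (frames, coeffs)."""
    denom = 2.0 * sum(tau * tau for tau in range(1, T + 1))
    padded = np.pad(cepstra, ((T, T), (0, 0)), mode="edge")  # replicate edges
    return np.stack([
        sum(tau * (padded[t + T + tau] - padded[t + T - tau])
            for tau in range(1, T + 1)) / denom
        for t in range(len(cepstra))
    ])
```

Applying delta once to the 13 static coefficients (including energy) and once more to the result, then stacking statics, deltas and delta-deltas, yields the 39-dimensional feature vector of step 8.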
d) For each frame of speech, calculate the likelihood probability p(x_t|s_j) of each state constituting the phoneme Markov models, based on the state diagram, the acoustic model, and the MFCC feature vector of the frame itself. The likelihood probability p(x_t|s_j) is the acoustic-layer score of the Markov model state s_j for the input speech feature x_t.
The method for constructing the state diagram utilized in this step is as follows:
As shown in FIG. 3, a word-based search space, i.e. a word network, is first built according to the content of the task grammar; the recognizer searches this word network for the best path corresponding to the input speech as the recognition result. Before searching, the word network is expanded, by means of the dictionary information in the recognition system, into a phoneme network whose minimum unit is a phoneme: each node is transformed from a word into phonemes, and each phoneme is then replaced by the corresponding Markov model (HMM) in the acoustic model. Since each Markov model (HMM) is composed of several states, the final search space becomes a state diagram, as shown in FIG. 4.
In FIG. 4, each node represents one state of some HMM, and any path through the state diagram represents a sentence or word candidate in the task grammar. To reduce the search space and the storage it requires, the state diagram is merged to obtain the final state diagram. In this process each node undergoes forward merging and backward merging: in forward merging, nodes with the same forward path are found and combined; in backward merging, nodes with the same backward path are combined.
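As a toy illustration of the backward-merging idea, the sketch below collapses nodes that carry the same state label and the same successor set, repeating the pass until a fixed point is reached; forward merging is the mirror image using predecessor sets. The graph representation used here ({node id: (label, successor set)}) is an assumption for illustration, not the patent's data structure:

```python
def merge_backward(nodes):
    """Collapse nodes with identical (label, successor set) until stable.
    nodes: {node_id: (label, set of successor node_ids)}."""
    while True:
        seen, remap = {}, {}
        for nid, (label, succs) in nodes.items():
            key = (label, frozenset(succs))
            remap[nid] = seen.setdefault(key, nid)  # first such node wins
        if all(nid == keep for nid, keep in remap.items()):
            return nodes  # fixed point: nothing left to merge
        merged = {}
        for nid, (label, succs) in nodes.items():
            keep = remap[nid]
            _, s = merged.setdefault(keep, (label, set()))
            s.update(remap[x] for x in succs)  # redirect edges to survivors
        nodes = merged
```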
The method of calculating the likelihood probability for each state is as follows:
All speech frames are traversed: when a frame of data enters the recognizer, the likelihood probability p(x_t|s_j) of each state in the state diagram is first calculated for the current frame; the accumulation of the likelihood probability and the state transition probability, compared against the pruning threshold, serves as the basis for pruning. The likelihood probability p(x_t|s_j) is the acoustic-layer score of the Markov model of state s_j for the input speech feature x_t; its negative logarithm is

$$-\log p(x_t \mid s_j) = \frac{1}{2}(x_t-\mu_j)^{T}\Sigma_j^{-1}(x_t-\mu_j) + \frac{n}{2}\log(2\pi) + \frac{1}{2}\log\left|\Sigma_j\right|$$

wherein state s_j is modeled as the normal distribution N(μ_j, Σ_j), whose parameters are obtained from the acoustic model; x_t is the feature vector of the speech frame; μ_j and Σ_j are respectively the mean vector and covariance matrix of the model of state s_j; and n is the dimension of x_t (i.e. of μ_j and Σ_j).
The acoustic model employed in this embodiment contains 5005 states with 16-Gaussian mixture models.
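A direct NumPy transcription of the negative log-likelihood formula above is sketched below, assuming diagonal covariance matrices (a common choice that the patent does not specify), together with the log-domain combination one would use for a 16-Gaussian mixture state:

```python
import numpy as np

def neg_log_likelihood(x, mu, var):
    """-log N(x; mu, Sigma) with diagonal covariance `var`:
    0.5*(x-mu)^T Sigma^-1 (x-mu) + (n/2)*log(2*pi) + 0.5*log|Sigma|."""
    n = x.shape[0]
    diff = x - mu
    return (0.5 * np.sum(diff * diff / var)
            + 0.5 * n * np.log(2.0 * np.pi)
            + 0.5 * np.sum(np.log(var)))

def state_neg_log_likelihood(x, weights, mus, vars_):
    """-log p(x_t|s_j) for a mixture state: -log sum_k w_k N(x; mu_k, var_k),
    combined in the log domain (log-sum-exp) for numerical stability."""
    log_terms = np.array([np.log(w) - neg_log_likelihood(x, m, v)
                          for w, m, v in zip(weights, mus, vars_)])
    top = log_terms.max()
    return -(top + np.log(np.sum(np.exp(log_terms - top))))
```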
e) Store the likelihood probabilities p(x_t|s_j) obtained in step d), indexed by the frame number and state number of the current speech.
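Step e) amounts to caching each score under a (frame number, state number) key so that steps g)-i) can reuse it without re-evaluating the Gaussians; a trivial sketch, with the dictionary layout as an assumption:

```python
# {(frame index t, state index j): -log p(x_t|s_j)}
likelihood_table = {}

def store_likelihood(t, j, neg_log_lik):
    likelihood_table[(t, j)] = neg_log_lik
```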
f) Judge whether the pointer points to a virtual node; if so, go to step g); if not, prune the current state.
In the state diagram used by the recognition system, each phoneme has a virtual node as its end marker; a phoneme is recognized as soon as the search pointer reaches a virtual node.
During decoding, a pruning strategy is applied to improve the decoding speed and reduce the search space. In FIG. 5, the solid dots represent states retained after pruning and the hollow dots represent pruned states. As shown, when a state contributes too little to the observation sequence (in this embodiment, the MFCC feature vectors), i.e. its likelihood probability p(x_t|s_j) for the observation sequence falls below a preset threshold, the state is pruned. This embodiment uses a pruning strategy based on frame-synchronous beam search during decoding; the search itself uses the conventional Viterbi algorithm. The pruning threshold is set to 200, and the pruning criterion is as follows: take the log probability of each state for the current frame of speech, subtract the pruning threshold from the maximum log probability at the current position, and prune any state whose log probability is smaller than the resulting value.
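The pruning criterion just described — keep a state only if its log score for the current frame is within the beam width of the best current score — can be sketched as follows; the dictionary interface and the use of log probabilities (rather than the stored negative logs) are illustrative assumptions:

```python
def beam_prune(frame_log_scores, beam=200.0):
    """Frame-synchronous beam pruning: retain state j only if its
    accumulated log score is within `beam` of the current-frame maximum."""
    best = max(frame_log_scores.values())
    floor = best - beam
    return {j: s for j, s in frame_log_scores.items() if s >= floor}
```

The surviving states form the set D* used in step g).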
g) Calculate the likelihood probability sum of the acoustic space after pruning,

$$p(x_t) = \sum_{s_j \in D^*} p(x_t \mid s_j)$$

wherein D* is the set of all states retained in the state diagram after pruning.
The accumulated likelihood of the states retained after pruning is much larger than that of the pruned states, so it can safely be used as the denominator of the generalized posterior probability

$$p(s_j \mid x_t) = \frac{p(x_t \mid s_j)}{\sum_{s_i \in D^*} p(x_t \mid s_i)}$$
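In the log domain, the sum over the retained states D* and the division both reduce to a log-sum-exp; a minimal sketch, under the assumption that the cached scores have been converted to log likelihoods:

```python
import numpy as np

def state_posteriors(log_liks):
    """p(s_j|x_t) = p(x_t|s_j) / sum_{s_i in D*} p(x_t|s_i), computed from
    the log likelihoods of the retained states via a stable log-sum-exp."""
    states = list(log_liks)
    vals = np.array([log_liks[j] for j in states])
    log_denom = vals.max() + np.log(np.sum(np.exp(vals - vals.max())))
    return {j: float(np.exp(v - log_denom)) for j, v in zip(states, vals)}
```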
h) Calculate the generalized posterior probability of each phoneme.
In the speech recognition system, each phoneme is represented by a Markov model (HMM). The generalized posterior probability of a phoneme is defined as the arithmetic mean of the posterior probabilities of the states corresponding to the phoneme:

$$p(ph) = \frac{1}{N}\sum_{j=1}^{N}\frac{1}{\tau_e[j]-\tau_b[j]+1}\sum_{t=\tau_b[j]}^{\tau_e[j]} p(s_j \mid x_t)$$

where N is the number of states that make up each HMM, τ_b[j] and τ_e[j] respectively indicate the starting and ending frame numbers of the speech input data in the current state, j is the state number, and p(s_j|x_t) is the generalized posterior probability obtained in step g).
i) The generalized posterior probability of a phoneme is used as the confidence score of that phoneme.
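Putting steps h) and i) together, the phoneme-level score can be sketched as follows; the segment bookkeeping ({state: (τ_b, τ_e)}) is an assumed interface:

```python
def phoneme_confidence(posteriors, segments):
    """Arithmetic mean over the phoneme's N states of the time-averaged
    state posteriors:
      (1/N) * sum_j [ 1/(tau_e[j]-tau_b[j]+1) * sum_t p(s_j|x_t) ].
    posteriors: {(t, j): p(s_j|x_t)}; segments: {j: (tau_b, tau_e)}."""
    total = 0.0
    for j, (tb, te) in segments.items():
        span = [posteriors[(t, j)] for t in range(tb, te + 1)]
        total += sum(span) / len(span)
    return total / len(segments)
```

The resulting score is then compared against a threshold tuned on development data to accept or reject the phoneme, as in the experiments below.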
The state-diagram-based confidence likelihood synchronous estimation algorithm of the present invention was tested on a Chinese telephone name database used for testing real telephone speech recognition systems. The test task evaluated the recognition rate of a recognition system with a dictionary of 1278 person names. The test speech was normal speech from 6 speakers, 3 men and 3 women. The test set includes 180 out-of-set words, and each task grammar includes 213 person names. The confidence score is used to reject the out-of-set words in the test set; the goal is to increase rejection, i.e., to reduce the false acceptance rate of out-of-set words.
Two different algorithms were used to calculate the confidence. One is the two-pass (2-Pass) search algorithm shown in FIG. 2; the other, the state-diagram-based confidence synchronous calculation method of the invention, is the one-pass (1-Pass) algorithm, i.e. the synchronous estimation algorithm. The two-pass search algorithm uses two different acoustic models: the first pass uses an acoustic model containing 5005 states with 16-Gaussian mixtures, while the acoustic model used to calculate the confidence is a smaller model covering only all the phonemes, containing 1005 states with 8-Gaussian mixtures. The one-pass search algorithm uses a single acoustic model, containing 5005 states with 16-Gaussian mixtures.
The performance curves (ROC, receiver operating characteristic) of the two algorithms are shown in FIG. 6. As the figure shows, the one-pass search algorithm used in the present invention outperforms the two-pass search algorithm: its equal error rate is 16.1%, against 21% for the two-pass search algorithm. Because the one-pass search algorithm uses only one acoustic model, and that model remains fine-grained when calculating the confidence, the performance does not degrade even though the post-pruning acoustic space likelihood is an approximation.
In addition, the computational complexity of the two methods differs: the one-pass search algorithm is 16% faster than the two-pass search algorithm.

Claims (4)

1. A method for quickly obtaining confidence in a speech recognition system, characterized by comprising the following steps:
1) Inputting the voice to be recognized into a voice recognition system;
2) Preprocessing input voice, wherein the preprocessing comprises framing processing;
3) Extracting MFCC feature vectors of each frame of voice;
4) Traversing all speech frames and, for each frame of speech, calculating the likelihood probability p(x_t|s_j) of each state in the state diagram for that frame, according to the state diagram and the acoustic model in the speech recognition system and the MFCC feature vector of the frame, the negative logarithm of the likelihood probability being

$$-\log p(x_t \mid s_j) = \frac{1}{2}(x_t-\mu_j)^{T}\Sigma_j^{-1}(x_t-\mu_j) + \frac{n}{2}\log(2\pi) + \frac{1}{2}\log\left|\Sigma_j\right|$$

wherein x_t is the feature vector of the speech frame, μ_j and Σ_j are respectively the mean vector and covariance matrix of the model of state s_j, and n is the dimension of the feature vector;
5) Storing the likelihood probabilities p(x_t|s_j) obtained in step 4), indexed by the frame number and state number of the current speech;
6) Judging whether the current pointer points to a virtual node in the state diagram; if so, entering step 7); if not, pruning the current state; the virtual node being a marker for the end of a phoneme in the state diagram;
7) Calculating the likelihood probability sum of the acoustic space after pruning, $p(x_t) = \sum_{s_j \in D^*} p(x_t \mid s_j)$, wherein D* is the set of all states retained in the state diagram after pruning;
8) Calculating the generalized posterior probability

$$p(s_j \mid x_t) = \frac{p(x_t \mid s_j)}{\sum_{s_i \in D^*} p(x_t \mid s_i)};$$
9) Calculating the generalized posterior probability of each phoneme,

$$p(ph) = \frac{1}{N}\sum_{j=1}^{N}\frac{1}{\tau_e[j]-\tau_b[j]+1}\sum_{t=\tau_b[j]}^{\tau_e[j]} p(s_j \mid x_t)$$

and taking the generalized posterior probability of the phoneme as the confidence score of the phoneme; wherein N is the number of states that make up each Markov model, τ_b[j] and τ_e[j] respectively indicate the starting and ending frame numbers of the speech input data in the current state, and j is the state number.
2. The method of claim 1 wherein preprocessing the input speech in step 2) includes digitizing, pre-emphasizing, high-frequency boosting, framing and windowing the input speech.
3. The method of fast confidence level calculation in a speech recognition system according to claim 1, wherein said extracting speech features in step 3) comprises: calculating the MFCC cepstral coefficients, cepstral weighting, and calculating the differential cepstral coefficients.
4. The method for fast confidence level estimation in a speech recognition system according to claim 1, wherein the pruning in step 6) is performed by a pruning method based on frame-synchronous beam search.
CN2006100891355A 2006-08-04 2006-08-04 Confidence degree quick acquiring method in speech identification system Expired - Fee Related CN101118745B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2006100891355A CN101118745B (en) 2006-08-04 2006-08-04 Confidence degree quick acquiring method in speech identification system


Publications (2)

Publication Number Publication Date
CN101118745A (en) 2008-02-06
CN101118745B (en) 2011-01-19

Family

ID=39054824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2006100891355A Expired - Fee Related CN101118745B (en) 2006-08-04 2006-08-04 Confidence degree quick acquiring method in speech identification system

Country Status (1)

Country Link
CN (1) CN101118745B (en)


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5566272A (en) * 1993-10-27 1996-10-15 Lucent Technologies Inc. Automatic speech recognition (ASR) processing using confidence measures
US5737489A (en) * 1995-09-15 1998-04-07 Lucent Technologies Inc. Discriminative utterance verification for connected digits recognition
US5794189A (en) * 1995-11-13 1998-08-11 Dragon Systems, Inc. Continuous speech recognition
CN1223985C (en) * 2002-10-17 2005-10-19 中国科学院声学研究所 Phonetic recognition confidence evaluating method, system and dictation device therewith
CN100514446C (en) * 2004-09-16 2009-07-15 北京中科信利技术有限公司 Pronunciation evaluating method based on voice identification and voice analysis
GB0426347D0 (en) * 2004-12-01 2005-01-05 Ibm Methods, apparatus and computer programs for automatic speech recognition

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102047322B (en) * 2008-06-06 2013-02-06 株式会社雷特龙 Audio recognition device, audio recognition method, and electronic device
CN101393739B (en) * 2008-10-31 2011-04-27 清华大学 Computation method for characteristic value of Chinese speech recognition credibility
CN101645271B (en) * 2008-12-23 2011-12-07 中国科学院声学研究所 Rapid confidence-calculation method in pronunciation quality evaluation system
CN101650886B (en) * 2008-12-26 2011-05-18 中国科学院声学研究所 Method for automatically detecting reading errors of language learners
CN102142253B (en) * 2010-01-29 2013-05-29 富士通株式会社 Voice emotion identification equipment and method
CN101894549A (en) * 2010-06-24 2010-11-24 中国科学院声学研究所 Method for fast calculating confidence level in speech recognition application field
CN101980336A (en) * 2010-10-18 2011-02-23 福州星网视易信息系统有限公司 Hidden Markov model-based vehicle sound identification method
CN103811008A (en) * 2012-11-08 2014-05-21 中国移动通信集团上海有限公司 Audio frequency content identification method and device
CN103810997A (en) * 2012-11-14 2014-05-21 北京百度网讯科技有限公司 Method and device for determining confidence of voice recognition result
CN103810997B (en) * 2012-11-14 2018-04-03 北京百度网讯科技有限公司 A kind of method and apparatus for determining voice identification result confidence level
CN103021408A (en) * 2012-12-04 2013-04-03 中国科学院自动化研究所 Method and device for speech recognition, optimizing and decoding assisted by stable pronunciation section
CN107004408B (en) * 2014-12-09 2020-07-17 微软技术许可有限责任公司 Method and system for determining user intent in spoken dialog based on converting at least a portion of a semantic knowledge graph to a probabilistic state graph
CN107004408A (en) * 2014-12-09 2017-08-01 微软技术许可有限责任公司 For determining the method and system of the user view in spoken dialog based at least a portion of semantic knowledge figure is converted into Probability State figure
CN106297769A (en) * 2015-05-27 2017-01-04 国家计算机网络与信息安全管理中心 A kind of distinctive feature extracting method being applied to languages identification
CN106297769B (en) * 2015-05-27 2019-07-09 国家计算机网络与信息安全管理中心 A kind of distinctive feature extracting method applied to languages identification
CN106611048A (en) * 2016-12-20 2017-05-03 李坤 Language learning system with online voice assessment and voice interaction functions
CN110447068A (en) * 2017-03-24 2019-11-12 三菱电机株式会社 Speech recognition equipment and audio recognition method
CN109872715A (en) * 2019-03-01 2019-06-11 深圳市伟文无线通讯技术有限公司 A kind of voice interactive method and device
CN112151020A (en) * 2019-06-28 2020-12-29 北京声智科技有限公司 Voice recognition method and device, electronic equipment and storage medium
CN110634469A (en) * 2019-09-27 2019-12-31 腾讯科技(深圳)有限公司 Speech signal processing method and device based on artificial intelligence and storage medium
CN110634469B (en) * 2019-09-27 2022-03-11 腾讯科技(深圳)有限公司 Speech signal processing method and device based on artificial intelligence and storage medium

Also Published As

Publication number Publication date
CN101118745B (en) 2011-01-19

Similar Documents

Publication Publication Date Title
CN101118745A (en) Confidence degree quick acquiring method in speech identification system
US6125345A (en) Method and apparatus for discriminative utterance verification using multiple confidence measures
EP0880126B1 (en) Speech-silence discrimination based on unsupervised HMM adaptation
KR100631786B1 (en) Method and apparatus for speech recognition by measuring frame's confidence
Lin et al. OOV detection by joint word/phone lattice alignment
Akbacak et al. Environmental sniffing: noise knowledge estimation for robust speech systems
CN111640423B (en) Word boundary estimation method and device and electronic equipment
Mengistu Automatic text independent amharic language speaker recognition in noisy environment using hybrid approaches of LPCC, MFCC and GFCC
KR100969138B1 (en) Method For Estimating Noise Mask Using Hidden Markov Model And Apparatus For Performing The Same
Anguera et al. Evolutive speaker segmentation using a repository system
Matsuda et al. ATR parallel decoding based speech recognition system robust to noise and speaking styles
Benıtez et al. Different confidence measures for word verification in speech recognition
Nakamura et al. Multi-modal temporal asynchronicity modeling by product HMMs for robust audio-visual speech recognition
KR100586045B1 (en) Recursive Speaker Adaptation Automation Speech Recognition System and Method using EigenVoice Speaker Adaptation
Nazreen et al. A joint enhancement-decoding formulation for noise robust phoneme recognition
KR20050036301A (en) Apparatus and method for distinction using pitch and mfcc
Remes et al. Missing feature reconstruction and acoustic model adaptation combined for large vocabulary continuous speech recognition
Casar et al. Analysis of HMM temporal evolution for automatic speech recognition and utterance verification.
Yamada et al. Improvement of rejection performance of keyword spotting using anti-keywords derived from large vocabulary considering acoustical similarity to keywords.
Kosaka et al. Speaker adaptation based on system combination using speaker-class models.
Zacharie et al. Keyword spotting on word lattices
JP3105708B2 (en) Voice recognition device
Scanzio et al. Word confidence using duration models.
Moon et al. Out-of-vocabulary word rejection algorithm in Korean variable vocabulary word recognition
ICU et al. Recognition Confidence

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110119