JP2005164837A5 - Speech recognition result reliability verification apparatus, computer program, and computer - Google Patents

Publication number
JP2005164837A5
Authority
JP
Japan
Prior art keywords
word
speech recognition
recognition result
posterior probability
generalized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2003401724A
Other languages
Japanese (ja)
Other versions
JP4478925B2 (en)
JP2005164837A (en)
Filing date
2003-12-01
Publication date
2005-09-22
Application filed
Priority to JP2003401724A
Priority claimed from JP2003401724A
Publication of JP2005164837A
Publication of JP2005164837A5
Application granted
Publication of JP4478925B2
Anticipated expiration
Expired - Fee Related (current legal status)

Claims (7)

1. A speech recognition result reliability verification apparatus that receives a speech recognition result output from a speech recognition decoder, the result representing a plurality of hypothesis word strings each consisting of words to which word posterior probabilities are assigned, and that verifies the reliability of the speech recognition result based on the word posterior probabilities, the apparatus comprising:
generalized word posterior probability calculating means for calculating, for each word included in the speech recognition result, a generalized word posterior probability based on the word posterior probabilities of the words included in the speech recognition result;
updating means for updating the word posterior probability of each word included in the speech recognition result with the generalized word posterior probability calculated by the generalized word posterior probability calculating means;
search means for searching, based on the speech recognition result whose word posterior probabilities have been updated by the updating means, the plurality of hypothesis word strings for the one in which the sum of the word posterior probabilities of the words it contains is largest; and
determination means for verifying the reliability of the speech recognition result by determining whether the sum of the word posterior probabilities of the hypothesis word string found by the search means satisfies a predetermined condition.
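The update, search, and determination steps of claim 1 can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the `Word` record, the `verify` function, and the `gwpp` callable are hypothetical names, and the "predetermined condition" is assumed here to be a simple threshold on the summed posteriors, which the claim does not specify.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Word:
    text: str
    start: int        # start time of the word (e.g. a frame index)
    end: int          # end time of the word
    posterior: float  # word posterior probability assigned by the decoder


Hypothesis = List[Word]


def verify(
    hypotheses: Sequence[Hypothesis],
    gwpp: Callable[[Word, Sequence[Hypothesis]], float],
    threshold: float,
) -> Tuple[Hypothesis, float, bool]:
    """Return (best hypothesis, its score, accepted?) for a non-empty result."""
    # Updating step: overwrite each word posterior with its generalized value.
    updated = [
        [replace(w, posterior=gwpp(w, hypotheses)) for w in hyp]
        for hyp in hypotheses
    ]

    def total(hyp: Hypothesis) -> float:
        return sum(w.posterior for w in hyp)

    # Search step: pick the word string with the largest posterior sum.
    best = max(updated, key=total)
    score = total(best)

    # Determination step: ASSUMED condition, a plain threshold comparison.
    return best, score, score >= threshold
```

Passing `gwpp` as a parameter keeps the sketch independent of how the generalized word posterior probability is actually computed.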
2. The speech recognition result reliability verification apparatus according to claim 1, further comprising means for selecting, prior to the calculation of the generalized word probabilities by the generalized word posterior probability calculating means, only those word strings in the speech recognition result whose likelihood is higher than a threshold determined by a predetermined criterion, and supplying them to the generalized word posterior probability calculating means.
3. The speech recognition result reliability verification apparatus according to claim 1 or claim 2, wherein each word included in the speech recognition result is further accompanied by information specifying its time period within the utterance input to the speech recognition decoder, and
the generalized word posterior probability calculating means includes:
word search means for searching the speech recognition result, for each word included in the speech recognition result, for words that match the word and whose time periods overlap the time period of the word; and
means for calculating the generalized word posterior probability of each word based on the sum of the word posterior probabilities of the words found by the word search means and the sum of the word posterior probabilities of all words included in the speech recognition result.
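The selection of likely word strings and the overlap-based calculation described in these claims can be sketched as below. The names `Word`, `select_likely`, `overlaps`, and `generalized_posterior` are hypothetical. Treating time periods as half-open intervals is an assumption, as is combining the two sums as a plain ratio; the claim itself only says the result is "based on" the two sums.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Word:
    text: str
    start: int        # start time of the word (e.g. a frame index)
    end: int          # end time of the word
    posterior: float  # word posterior probability assigned by the decoder


def select_likely(
    scored: Sequence[Tuple[List[Word], float]], threshold: float
) -> List[List[Word]]:
    """Keep only word strings whose likelihood exceeds the threshold."""
    return [hyp for hyp, likelihood in scored if likelihood > threshold]


def overlaps(a: Word, b: Word) -> bool:
    """ASSUMPTION: time periods are half-open intervals [start, end)."""
    return a.start < b.end and b.start < a.end


def generalized_posterior(target: Word, hypotheses: Sequence[List[Word]]) -> float:
    """Ratio of matching, time-overlapping posterior mass to total mass."""
    all_words = [w for hyp in hypotheses for w in hyp]
    total = sum(w.posterior for w in all_words)
    if total == 0.0:
        return 0.0  # no probability mass at all; avoid division by zero
    # Word search step: same word identity and overlapping time period.
    matched = sum(
        w.posterior
        for w in all_words
        if w.text == target.text and overlaps(w, target)
    )
    return matched / total
```

The effect is that a word appearing at roughly the same time in several hypotheses accumulates posterior mass from all of them, while a word found in only one hypothesis keeps a small value.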
4. The speech recognition result reliability verification apparatus according to claim 3, wherein the means for calculating the generalized word posterior probability includes means for calculating the generalized word posterior probability of each word as the ratio of the sum of the word posterior probabilities of the words found by the word search means to the sum of the word posterior probabilities of all words included in the speech recognition result.
5. The speech recognition result reliability verification apparatus according to any one of claims 1 to 4, wherein the generalized word posterior probability $p([w; s, t] \mid x_1^T)$ of a word $w$ in the hypothesis word string (where $s$ and $t$ are the start time and end time, respectively, of the time period of the word $w$) is given by the following equation (not reproduced in this text), where $x_1^T = x_1, \ldots, x_T$ is the observed speech sequence, $M$ is the number of words in a hypothesis of the speech recognition result, $s_n$ and $t_n$ are the start time and end time, respectively, of the $n$-th word $w_n$ that matches the word $w$, $p(x_{s_m}^{t_m} \mid w_m)$ is the acoustic likelihood, $p(w_m \mid w_1^M)$ is the language likelihood, $p(x_1^T)$ is the acoustic observation likelihood, and $\alpha$ and $\beta$ are predetermined constants.
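The equation that claim 5 refers to did not survive in this text. One form consistent with the symbol definitions in the claim is shown below. It is offered only as an assumption: in particular, the use of $\alpha$ and $\beta$ as exponents on the acoustic and language likelihoods, and the overlap condition under the sum, are inferred rather than taken from the source.

$$
p([w; s, t] \mid x_1^T) =
\sum_{\substack{M,\ [w; s, t]_1^M \\ \exists n,\ 1 \le n \le M \\ w = w_n \\ (s_n, t_n) \cap (s, t) \neq \emptyset}}
\frac{\prod_{m=1}^{M} p^{\alpha}\!\left(x_{s_m}^{t_m} \mid w_m\right) \, p^{\beta}\!\left(w_m \mid w_1^M\right)}{p(x_1^T)}
$$

Read this way, the sum runs over every hypothesis that contains some word $w_n$ identical to $w$ whose time period overlaps $[s, t]$, and each such hypothesis contributes its weighted acoustic and language scores normalized by $p(x_1^T)$.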
6. A computer program that, when executed by a computer, causes the computer to operate so as to implement each means of the speech recognition result reliability verification apparatus according to any one of claims 1 to 5.
7. A computer programmed by the computer program according to claim 6.
JP2003401724A 2003-12-01 2003-12-01 Speech recognition result reliability verification apparatus, computer program, and computer Expired - Fee Related JP4478925B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2003401724A JP4478925B2 (en) 2003-12-01 2003-12-01 Speech recognition result reliability verification apparatus, computer program, and computer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2003401724A JP4478925B2 (en) 2003-12-01 2003-12-01 Speech recognition result reliability verification apparatus, computer program, and computer

Publications (3)

Publication Number Publication Date
JP2005164837A (en) 2005-06-23
JP2005164837A5 (en) 2005-09-22
JP4478925B2 (en) 2010-06-09

Family

ID=34725558

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2003401724A Expired - Fee Related JP4478925B2 (en) 2003-12-01 2003-12-01 Speech recognition result reliability verification apparatus, computer program, and computer

Country Status (1)

Country Link
JP (1) JP4478925B2 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4659541B2 (en) * 2005-07-11 2011-03-30 日本放送協会 Speech recognition apparatus and speech recognition program
JP4816409B2 (en) 2006-01-10 2011-11-16 日産自動車株式会社 Recognition dictionary system and updating method thereof
JP4836076B2 (en) * 2006-02-23 2011-12-14 株式会社国際電気通信基礎技術研究所 Speech recognition system and computer program
JP4947545B2 (en) * 2006-08-30 2012-06-06 株式会社国際電気通信基礎技術研究所 Speech recognition apparatus and computer program
