JP2002073078A - Voice recognition method and medium recording program thereon - Google Patents

Voice recognition method and medium recording program thereon

Info

Publication number
JP2002073078A
Authority
JP
Japan
Prior art keywords
hypothesis
word
history
time
score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2000268443A
Other languages
Japanese (ja)
Other versions
JP3550350B2 (en
Inventor
Takaaki Hori
貴明 堀
Yoshiaki Noda
喜昭 野田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP2000268443A (patent JP3550350B2)
Publication of JP2002073078A
Application granted
Publication of JP3550350B2
Anticipated expiration
Expired - Lifetime (current legal status)


Abstract

PROBLEM TO BE SOLVED: To reduce the amount of computation required for correcting hypothesis histories. SOLUTION: Hypotheses rejected by hypothesis narrowing are stored in storage areas 43, 44, 45, etc. of a buffer memory 26, with a separate area for each rejection time and each final word or phoneme history. When a history is corrected, only the hypotheses stored in the area whose rejection time equals the start time of the final word of the hypothesis being re-evaluated, and whose final word or phoneme history matches the word immediately preceding that final word (or the phoneme history up to it), are read from the buffer memory 26 and used for the history correction.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a speech recognition method that enables a hypothesis search which efficiently finds, among the many hypotheses that can be generated by a given grammar, the hypothesis closest to the input speech, and to a recording medium storing a program for the method.

[0002]

[Prior Art] In the speech recognition process shown in FIG. 4, input speech 11 is converted by an analysis processing unit 12 into a time series of feature-parameter vectors, and is matched by a search processing unit 13 against the hypotheses permitted by a grammar/language model (a grammar model and a language model) 16. The score, which is the evaluation value of the match against a hypothesis, consists of an acoustic score indicating the likelihood (similarity) between the acoustic model 15 corresponding to the hypothesis and the input speech 11, and a language score corresponding to the probability that the hypothesis occurs. The hypothesis with the highest score is output as the recognition result 14.
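As an illustration of the scoring just described, the following is a minimal sketch in Python. It assumes that both scores are log-probabilities and are combined by addition; the patent only says the score consists of the two parts. The names `Hypothesis` and `best` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    words: tuple      # word history of the hypothesis
    acoustic: float   # accumulated acoustic score (assumed log-likelihood)
    language: float   # accumulated language score (assumed log-probability)

    @property
    def score(self) -> float:
        # Assumption: the two parts are combined by simple addition.
        return self.acoustic + self.language


def best(hypotheses):
    """Return the hypothesis with the highest combined score."""
    return max(hypotheses, key=lambda h: h.score)
```

With this layout, the recognition result is simply `best(...)` applied to the hypotheses that survive to the end of the input.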

[0003] Signal processing commonly used in the analysis processing unit 12 includes filter bank analysis and linear predictive analysis (linear predictive coding, LPC). Feature parameters include the LPC cepstrum, LPC delta cepstrum, mel cepstrum (mel-frequency cepstral coefficients, MFCC), delta MFCC, and log power. For the acoustic model 15, the hidden Markov model method (hereinafter, the HMM method), which models speech on the basis of probability and statistical theory, is the mainstream approach. Details of the HMM method are disclosed, for example, in Seiichi Nakagawa, "Speech Recognition by Stochastic Models", edited by the Institute of Electronics, Information and Communication Engineers.

[0004] The grammar/language model 16 specifies the connection relationships among words that define the sentences to be recognized. A word network whose branches are words, a probabilistic language model, or the like is used. A probabilistic language model uses the occurrence probability of a single word and the probability that two or more words occur in sequence. Details of such probabilistic language models are disclosed, for example, in Seiichi Nakagawa, "Speech Recognition by Stochastic Models", edited by the Institute of Electronics, Information and Communication Engineers.

[0005] The search processing unit 13 matches the acoustic model 15 corresponding to a word sequence on the word network, which represents the word connections defined by the grammar, against the time series of feature-parameter vectors to obtain an acoustic score indicating acoustic likelihood (similarity), and obtains a language score from the language model 16 corresponding to that word sequence. The hypothesis score, consisting of the acoustic score and the language score, is computed at every time (a time counted in units of the unit interval of the recognition process; "time" has this meaning throughout this specification). Hypotheses with low scores are discarded and hypotheses with high scores are kept. At the next time, the hypotheses kept at the previous time are extended with further words where necessary and evaluated again using the acoustic model 15 and the language model 16.

[0006] Next, the most common processing flow at each time in this matching computation is described with reference to FIG. 4. The search processing unit 13 holds a list of the hypotheses to be computed at that time, and each hypothesis in the list holds its score up to the current time and word-sequence information representing its history. First, in the hypothesis generation unit 21, if the list of hypotheses computed at the previous time contains a hypothesis whose computation has reached the end of the final word of its history, the words that can follow that final word are obtained from the word network defined by the grammar and appended to the end of the hypothesis to generate new hypotheses. The generated hypotheses are added to the list, which becomes the list of hypotheses for the current time.
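The hypothesis generation step can be sketched as follows. This is an illustrative sketch, not the patent's implementation: it assumes the word network is a dictionary mapping a word to the words allowed to follow it (with `None` as the key for the sentence start), and that the initial empty hypothesis counts as being at a word end. `Hyp` and `generate` are hypothetical names.

```python
from typing import NamedTuple


class Hyp(NamedTuple):
    words: tuple        # word history; the last element is the final word
    score: float        # score accumulated up to the current time
    at_word_end: bool   # True if scoring has reached the end of the final word


def generate(previous_list, network):
    """Build the current hypothesis list from the previous one.

    Every hypothesis that has reached the end of its final word spawns one
    new hypothesis per word that the network allows to follow that word.
    """
    current = list(previous_list)
    for h in previous_list:
        if not h.at_word_end:
            continue
        last = h.words[-1] if h.words else None  # None marks the sentence start
        for successor in network.get(last, ()):
            current.append(Hyp(h.words + (successor,), h.score, False))
    return current
```

The new hypotheses inherit the parent's score unchanged, since no frames of the appended word have been scored yet.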

[0007] The in-word score calculation unit 22 matches, for each hypothesis in the current list, the corresponding acoustic model 15 against the input speech and adds the resulting acoustic score to the score of the hypothesis. The word-end score calculation unit 23 then takes each hypothesis in the current list whose acoustic score has been accumulated up to the end of its final word and, using the grammar/language model 16, adds the language score corresponding to the occurrence probability of that final word.

[0008] Next, in the hypothesis narrowing unit 24, the hypotheses in the current list that have the same final word and whose acoustic score accumulation has reached the same part of that final word are compared. Among those whose word immediately preceding the final word is identical, or whose fixed number of phonemes at the tail of the immediately preceding word are identical, only the hypothesis with the highest score is left in the current list and the others are rejected. The time is then incremented by 1 and processing returns to the hypothesis generation unit 21.
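The narrowing step lends itself to a short sketch. The version below groups hypotheses by final word, position within the final word, and immediately preceding word (the word pair variant), keeps the best of each group, and returns the rest as rejected. Representing "the same part of the final word" as an integer `position` is an assumption made for illustration, and `Hyp` and `narrow` are hypothetical names.

```python
from collections import defaultdict
from typing import NamedTuple


class Hyp(NamedTuple):
    words: tuple    # word history; the last element is the final word
    position: int   # how far into the final word scoring has progressed
    score: float


def narrow(hypotheses):
    """Return (kept, rejected) after grouping on (final word, position, preceding word)."""
    groups = defaultdict(list)
    for h in hypotheses:
        final = h.words[-1] if h.words else None
        preceding = h.words[-2] if len(h.words) >= 2 else None
        groups[(final, h.position, preceding)].append(h)

    kept, rejected = [], []
    for group in groups.values():
        winner = max(group, key=lambda h: h.score)
        kept.append(winner)
        rejected.extend(h for h in group if h is not winner)
    return kept, rejected
```

For the phoneme history variant, the grouping key would use the last few phonemes of the preceding word instead of the word itself.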

[0009] This per-time computation is performed at every time of the input speech. From the list of hypotheses at the end time of the input speech, the hypothesis with the highest score among those whose computation has reached the end of the final word is selected as the recognition result. Before the per-time computation begins, a single hypothesis with no history and a score of 0 is placed in the hypothesis list. In this search method, the hypothesis narrowing unit 24 limits the growth in the number of hypotheses to be computed. Methods of narrowing include word pair approximation, which narrows hypotheses whose final word and immediately preceding word are identical, and phoneme history approximation, which narrows hypotheses whose final word and fixed number of phonemes at the tail of the immediately preceding word are identical. Word pair approximation is disclosed in R. Schwartz and S. Austin, "A Comparison of Several Approximate Algorithms for Finding Multiple (N-best) Sentence Hypotheses", IEEE international conference ICASSP '91. Phoneme history approximation is disclosed in Yoshiaki Noda, Shoichi Matsunaga, and Shigeki Sagayama, "A study of approximate computation methods in large-vocabulary continuous speech recognition using word graphs", IEICE Technical Report SP96-102. Hereinafter, the word immediately before the final word of a hypothesis is called the preceding word, and the history of an arbitrary number of phonemes at the tail of the word immediately before the final word is called the preceding phoneme history.

[0010] Next, the processing flow when the search processing unit 13 has a history correction unit and a buffer memory that stores rejected hypotheses is described with reference to FIG. 5. Parts identical to those shown in FIG. 4 bear the same reference numerals, and duplicate description is omitted. Processing through the hypothesis generation unit 21, the in-word score calculation unit 22, and the word-end score calculation unit 23 is the same as in FIG. 4. The hypothesis narrowing unit 24 narrows hypotheses in the same way as in FIG. 4, but records the hypotheses rejected during narrowing in the buffer memory 26. The history correction unit 25, which follows the word-end score calculation unit 23, reads from the buffer memory 26 the hypotheses rejected by the hypothesis narrowing unit 24, replaces the history up to the preceding word of the hypothesis evaluated by the word-end score calculation unit 23 with the history of each hypothesis stored in the buffer memory 26 before the time under evaluation, and computes the score. If the score obtained with the history of the best-scoring stored hypothesis is higher than the score without replacement, that history becomes the history up to the preceding word of the current hypothesis, and the score of the hypothesis is set to that highest score.

[0011] The reason such history correction is necessary is as follows. With word pair approximation, only the single highest-scoring hypothesis is kept among hypotheses that share the same final word and preceding word. Hypotheses whose word sequence from the start up to just before the preceding word differs from that of the best hypothesis are therefore lost, and only hypotheses with the same word sequence from the start up to just before the preceding word remain in the list. However, when a language model that assigns language scores to sequences of three or more words (for example, a word trigram) is used, the language score is computed from the final word, the preceding word, and one or more words before the preceding word. Computing and adding the language score for an already lost hypothesis that has the same final word and preceding word but a different word sequence before the preceding word can yield a higher hypothesis score. This is because the language score for sequences of three or more words takes different values depending on the word sequence before the preceding word. Consequently, narrowing loses hypotheses whose score would exceed the scores of the other hypotheses at a time later than the current time. This phenomenon is hereinafter called a narrowing error.
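A small worked example may make the narrowing error concrete. All numbers below are invented for illustration and are treated as log-domain scores that add. Two hypotheses share the final word "c" and the preceding word "b", so word pair approximation keeps only the better one before the trigram score for "c" is added at the word end.

```python
# Scores accumulated inside the final word "c", before its language score is added.
score_before_lm = {
    ("a", "b", "c"): -10.0,
    ("x", "b", "c"): -11.0,
}

# Hypothetical trigram log-probabilities log P(c | w1, b).
trigram = {
    ("a", "b", "c"): -5.0,
    ("x", "b", "c"): -1.0,
}

# Word pair approximation: same final word and preceding word, keep the best.
survivor = max(score_before_lm, key=score_before_lm.get)

# What the totals would have been at the word end if nothing had been rejected.
total = {h: s + trigram[h] for h, s in score_before_lm.items()}
true_best = max(total, key=total.get)

print(survivor)   # the hypothesis that survives narrowing
print(true_best)  # the hypothesis that would have scored highest at the word end
```

Here the survivor is ("a", "b", "c") with a total of -15.0, while the rejected ("x", "b", "c") would have reached -12.0, which is exactly the situation history correction is meant to repair.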

[0012] Similarly, with phoneme history approximation, only the single highest-scoring hypothesis is kept among hypotheses that share the same final word and preceding phoneme history. Among hypotheses with the same preceding phoneme history, those whose word sequence from the start to the preceding word differs from that of the best hypothesis are rejected, and for each preceding phoneme history only hypotheses with the same word sequence from the start to the preceding word remain in the list. However, when a language model that assigns language scores to sequences of two or more words (for example, a word bigram) is used, the language score is computed from the final word and one or more words up to the preceding word. Computing and adding the language score for an already rejected hypothesis that has the same final word and preceding phoneme history but a different word sequence up to the preceding word can yield a higher hypothesis score. This is because the language score takes different values depending on the word sequence up to the preceding word.

[0013] As described above, narrowing errors can occur when word pair approximation is used with a language model that scores sequences of three or more words, and when phoneme history approximation is used with a language model that scores sequences of two or more words. One technique for correcting the history is the Delayed Bigram. In this method, the history up to the preceding word of the hypothesis in question is replaced in turn with every hypothesis recorded in the buffer memory that ended at the start time of the final word of the hypothesis in question, each is evaluated using bigram probabilities, and the history of the hypothesis giving the highest score becomes the history up to the preceding word of the hypothesis in question. The Delayed Bigram is disclosed in M. Woszczyna and M. Finke, "Minimizing search errors due to delayed bigrams in real-time speech recognition systems", IEEE international conference ICASSP '96. In the Delayed Bigram, hypotheses that were rejected at each time and whose score computation had reached a word end are recorded in the buffer memory. Based on the start time of the final word of the hypothesis to be re-evaluated, the history correction unit reads the hypotheses rejected at that start time from the buffer memory one at a time, substitutes each for the history of the hypothesis being re-evaluated, and computes the score, repeating this for every such hypothesis.
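The re-evaluation loop described above can be sketched roughly as follows. This is a simplified illustration, not the method of the cited paper: it assumes log-domain scores, a caller-supplied function `lm(history, word)` that returns the language score of `word` given a history, and it leaves out the acoustic score of the final word on the assumption that it is the same for every candidate history. `Stored` and `correct_history` are hypothetical names.

```python
from typing import NamedTuple


class Stored(NamedTuple):
    words: tuple   # word history of a rejected hypothesis (ends at a word end)
    score: float   # its score at the time it was rejected


def correct_history(history, history_score, final_word, candidates, lm):
    """Try each stored history in place of the current one.

    Returns (best history, best total score, number of stored candidates scored).
    """
    best_words = history
    best_total = history_score + lm(history, final_word)
    evaluations = 0
    for cand in candidates:
        evaluations += 1
        total = cand.score + lm(cand.words, final_word)
        if total > best_total:
            best_words, best_total = cand.words, total
    return best_words, best_total, evaluations
```

The returned `evaluations` count equals the number of candidates read from the buffer, which is the quantity that grows when many hypotheses are rejected at the same time.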

[0014] Next, the method of recording the hypotheses rejected at each time is described with reference to FIG. 6. FIG. 6 shows in detail the portion of FIG. 5 in which hypotheses rejected by the hypothesis narrowing unit 24 are passed to the buffer memory 26. Let the current time be t. A hypothesis 31 rejected by the hypothesis narrowing unit 24 is added by the hypothesis transfer process 27 to the area 35 of the buffer memory 26 that stores the hypotheses for time t. An area that stores hypotheses is hereinafter called a hypothesis storage area. Here, hypothesis 33 denotes the hypotheses that were stored in the hypothesis storage area 35 for time t before hypothesis 31 during the processing of the hypothesis narrowing unit 24 at time t. Hypothesis 31 is therefore also stored in area 35 as a hypothesis 33. The hypothesis storage areas for times before t (for example, the hypothesis storage area 34 for time t-1) hold the hypotheses rejected at each of those times (for example, hypothesis 32), and nothing has yet been recorded in the areas for times after t (for example, the hypothesis storage area 36 for time t+1). With this arrangement, the hypotheses to be used for history correction can easily be located in and read from the buffer memory 26 using the start time of the final word of the hypothesis to be re-evaluated.
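The per-time storage layout of FIG. 6 maps naturally onto a dictionary keyed by rejection time. The following is a minimal sketch under that assumption; the class name `TimeIndexedBuffer` and its methods are hypothetical.

```python
from collections import defaultdict


class TimeIndexedBuffer:
    """Rejected hypotheses grouped into one storage area per rejection time."""

    def __init__(self):
        self._areas = defaultdict(list)  # rejection time -> rejected hypotheses

    def record(self, t, hypothesis):
        """Store a hypothesis rejected at time t."""
        self._areas[t].append(hypothesis)

    def read(self, start_time):
        """Return every hypothesis rejected at the given time."""
        return list(self._areas.get(start_time, ()))
```

A lookup by the start time of the final word returns the whole area for that time, so every hypothesis rejected at that time has to be re-scored.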

[0015] However, with this method of recording and reading hypotheses, when many hypotheses are recorded at the same time, the number of hypotheses held in each time's hypothesis storage area in the buffer memory 26 grows, and the amount of processing required to re-evaluate hypotheses in the history correction unit becomes large.

[0016]

[Problem to Be Solved by the Invention] In the history correction method based on the Delayed Bigram, rejected hypotheses whose score computation has reached a word end are recorded in the buffer memory by time. When a history is corrected, every hypothesis recorded in the hypothesis storage area corresponding to the start time of the final word of the hypothesis being re-evaluated is substituted for that hypothesis's history and scored. Consequently, when many hypotheses are recorded at the same time, the amount of computation required for history correction is large.

[0017]

[Means for Solving the Problem] According to the present invention, each hypothesis rejected by hypothesis narrowing is recorded in the buffer memory classified by its hypothesis rejection time, which is the time at which it was rejected, and by its history attribute, which is either the final word of its word history or a phoneme history representing a sequence of an arbitrary number of phonemes up to the end of that final word. At history correction, only those hypotheses in the buffer memory are read whose hypothesis rejection time equals the start time of the final word of the hypothesis being re-evaluated and whose history attribute matches, that is, which are classified under the preceding word (the word one before the final word of the hypothesis being re-evaluated) or under the phoneme history up to the end of that preceding word. These hypotheses are used to correct the history of the hypothesis being re-evaluated. When a phoneme history is used, the number of phonemes may be about the same as the number of phonemes used for the preceding phoneme history in the phoneme history approximation method described in the prior art section.

[0018]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 shows a method of recording rejected hypotheses according to one embodiment of the invention. Parts identical to those shown in FIG. 6 bear the same reference numerals, and duplicate description is omitted. The embodiment is described below with reference to FIG. 1. In the conventional Delayed Bigram, as shown in FIG. 6, rejected hypotheses are recorded in a separate hypothesis storage area of the buffer memory 26 for each time. In this embodiment, by contrast, rejected hypotheses are recorded in the buffer memory 26 separately for each time and, in addition, separately for each history attribute, that is, for each final word or phoneme history.

[0019] Let the current time be t. A hypothesis 41 rejected by the hypothesis narrowing unit 24 is added by the hypothesis transfer process 27 to the hypothesis storage area 46 of the buffer memory 26, which is the area for time t that corresponds to the phoneme history ac, the same as the phoneme history ac of hypothesis 41. Here, hypothesis 43 denotes the hypotheses that were added to the hypothesis storage areas 45 and 46, corresponding to the phoneme histories ab and ac at time t, before hypothesis 41 during the processing of the hypothesis narrowing unit 24 at time t. In the hypothesis storage areas of the buffer memory 26 for times before t (for example, the hypothesis storage areas 43 and 44 for time t-1), the hypotheses rejected at each time (for example, hypothesis 42) are recorded for each phoneme history (for example, the phoneme histories bc and de), and nothing has yet been recorded in the hypothesis storage areas for times after t (for example, the hypothesis storage areas 47 and 48 for time t+1).

[0020] With rejected hypotheses recorded in the buffer memory for each phoneme history at each time as described above, the history correction unit locates the hypothesis storage area in the buffer memory, for example area 45, whose hypothesis rejection time equals the start time of the final word of the hypothesis being re-evaluated and whose phoneme history matches the preceding phoneme history of that hypothesis. It reads only the hypotheses recorded in that hypothesis storage area 45 and uses them for history correction. Alternatively, rejected hypotheses may be stored in separate hypothesis storage areas for each time and each final word. In that case, at history correction, only the hypotheses in the storage area whose hypothesis rejection time equals the start time of the final word of the hypothesis being re-evaluated, and whose rejected hypotheses have a final word matching the preceding word of that hypothesis, are used for history correction.
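The storage and lookup scheme of this embodiment can be sketched as a dictionary keyed by the pair (rejection time, history attribute). This is an illustrative sketch only. The names `AttributeIndexedBuffer` and `phoneme_history` are hypothetical, and the default of two phonemes is an assumption chosen to match the two-letter histories (ab, ac, bc, de) used in the description of FIG. 1.

```python
from collections import defaultdict


def phoneme_history(phonemes, n=2):
    """Return the last n phonemes of a phoneme sequence as a tuple."""
    return tuple(phonemes[-n:])


class AttributeIndexedBuffer:
    """Rejected hypotheses grouped by (rejection time, history attribute)."""

    def __init__(self):
        self._areas = defaultdict(list)

    def record(self, t, attribute, hypothesis):
        """Store a hypothesis rejected at time t under its history attribute."""
        self._areas[(t, attribute)].append(hypothesis)

    def read(self, start_time, preceding_attribute):
        """Return only the hypotheses whose rejection time and attribute both match."""
        return list(self._areas.get((start_time, preceding_attribute), ()))
```

The history attribute can equally be the final word of the rejected hypothesis, in which case `read` is called with the preceding word of the hypothesis being re-evaluated. Either way, one lookup returns a single storage area rather than everything rejected at that time.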

[0021] As a result, the number of hypotheses recorded in each hypothesis storage area, one per final word or phoneme history at each time, is smaller than in the Delayed Bigram, which records into one hypothesis storage area per time. The number of times the history is swapped and the score computed when correcting a hypothesis history can therefore be reduced. The method of the present invention can also be carried out by having a computer execute a program. FIG. 2 shows the functional configuration in that case. Each unit is connected to a bus 59. An analysis program is installed in a memory 54 from a CD-ROM, a magnetic disk, or the like, or via a communication line, and a search processing program is similarly installed in a memory 57. When speech is input from an input unit 51, the CPU 58 executes the analysis program in the memory 54, storing data in a storage unit 53 as needed, and the speech is converted into a time series of feature-parameter vectors. In this example, a word network memory 56a and a language model database 56b are provided as the grammar/language model database 56.

[0022] When the CPU 58 executes the search processing program in the memory 26, the following is performed on the time series of feature-parameter vectors at each time, as described above and as shown in FIG. 3. A hypothesis list is generated (S1). For each hypothesis, an acoustic score is computed using the acoustic models in the acoustic model database 55 and added (S2). For hypotheses at the end of their final word, a language score is obtained using the language model and added to the accumulated acoustic score to obtain the hypothesis score (S3). Next, for the highest-scoring of the hypotheses whose hypothesis score has been obtained, the buffer memory 26 is searched for the hypothesis storage area that matches the start time of the final word and the history attribute immediately before it, the hypotheses in that area are read, and history correction is performed (S4). Hypothesis narrowing is performed using the hypothesis scores after history correction, and the rejected hypotheses are classified by time and history attribute and recorded in the corresponding storage areas of the buffer memory 26 (S5). If the input speech has not ended (S6), processing returns to step S1. If it has ended, the hypothesis with the highest score among those that have reached the end of their final word at that time is output from an output unit 52 as the recognition result (S7).

[0023] The present invention may also use a grammar model that permits connections from any one word to every word.

【0024】[0024]

【発明の効果】以上説明したように、この発明は、仮説
絞込みの際に棄却された仮説を、各時刻の履歴属性(最
終単語もしくは音素履歴)ごとに記録し、履歴修正の際
に、再評価する仮説の最終単語の開始時刻が仮説棄却時
刻と一致し、なおかつその仮説の先行単語もしくは先行
音素履歴が対応する履歴属性、つまり最終単語もしくは
音素履歴と一致するバッファメモリ内の仮説記憶領域に
記録された仮説のみを読み出して履歴の修正に利用する
ようにしているため、各時刻の仮説のリストに含まれる
各仮説の履歴を修正する際に、履歴を入れ替えてスコア
を計算する回数が従来より少なくなり履歴修正の計算量
が低減するという効果を奏する。
As described above, in the present invention, hypotheses rejected during hypothesis narrowing are recorded at each time for each history attribute (final word or phoneme history). During history correction, the only hypotheses read out and used to correct the history are those recorded in the hypothesis storage area of the buffer memory whose hypothesis rejection time matches the start time of the final word of the hypothesis being re-evaluated, and whose history attribute (final word or phoneme history) matches the preceding word or preceding phoneme history of that hypothesis. Consequently, when the history of each hypothesis in the hypothesis list at each time is corrected, the number of times a history is swapped in and the score recalculated is smaller than in the conventional method, which reduces the amount of computation required for history correction.
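
The reduction can be illustrated with a small count. The sketch below assumes, purely for illustration, that the conventional buffer groups rejected hypotheses by rejection time only; the function name `count_candidates` and the sample data are hypothetical and do not come from the patent.

```python
from collections import defaultdict
from typing import List, Tuple


def count_candidates(
    rejected: List[Tuple[int, str]], start_time: int, preceding_word: str
) -> Tuple[int, int]:
    """Count the hypotheses read for one history correction under both groupings.

    Each rejected hypothesis is reduced to (rejection time, final word).
    Returns (count with time-only grouping, count with time-and-attribute grouping).
    """
    by_time = defaultdict(list)            # assumed conventional grouping
    by_time_and_attr = defaultdict(list)   # grouping described in this patent
    for time, final_word in rejected:
        by_time[time].append(final_word)
        by_time_and_attr[(time, final_word)].append(final_word)
    return (
        len(by_time[start_time]),
        len(by_time_and_attr[(start_time, preceding_word)]),
    )


rejected = [(10, "a"), (10, "b"), (10, "b"), (10, "c"), (11, "b")]
print(count_candidates(rejected, 10, "b"))  # prints (4, 2)
```

Each hypothesis that is read costs one history swap and one score calculation, so in this toy case the finer grouping halves the work for this correction.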

【図面の簡単な説明】[Brief description of the drawings]

【図1】この発明方法における棄却された仮説の記録状
態の例を示す図。
FIG. 1 is a diagram showing an example of a recording state of a rejected hypothesis in the method of the present invention.

【図2】この発明の方法をコンピュータにより機能させ
る場合の機能構成例を示す図。
FIG. 2 is a diagram showing an example of a functional configuration when a method of the present invention is caused to function by a computer.

【図3】この発明の実施例の処理手順を示す流れ図。FIG. 3 is a flowchart showing a processing procedure according to the embodiment of the present invention.

【図4】従来の音声認識処理の概要を示す図。FIG. 4 is a diagram showing an outline of a conventional speech recognition process.

【図5】履歴修正部とバッファメモリを有する音声認識
処理の概要を示す図。
FIG. 5 is a diagram showing an outline of a speech recognition process having a history correction unit and a buffer memory.

【図6】従来の棄却された仮説の記録例を示す図。FIG. 6 is a diagram showing a recording example of a conventional rejected hypothesis.

Claims (2)

【特許請求の範囲】[Claims] 【請求項1】 各時刻に新しい仮説を生成し、その仮説
の単語内における入力音声の、仮説と対応する音響モデ
ルとの近さを示す音響スコアを計算すると共に、仮説の
単語終端において、単語間の接続関係を規定する言語モ
デルを用いてその仮説の存在する確率と対応した言語ス
コアを求め、これら音響スコアと言語スコアとから文法
の許容する入力音声内容に関するその仮説の尤もらしさ
を示す仮説のスコアを求め、その仮説のスコアの低い仮
説を棄却し、この棄却した仮説をバッファメモリに記録
し、仮説の入力音声の開始時刻から現在の時刻までの単
語系列を表す単語履歴を、上記バッファメモリに記録さ
れた過去の時刻に棄却された仮説に基づいて修正するこ
とで仮説を再評価し、全ての音声が入力された時点で最
もスコアの高い仮説を認識結果として出力する音声認識
方法において、 上記棄却された仮説を、仮説が棄却された時刻を表す仮
説棄却時刻と、単語履歴の最後の単語を表す最終単語、
もしくは最終単語の終端までの所定個数の音素列を表す
音素履歴ごとに分類して上記バッファメモリに記録し、 前記再評価する仮説の最終単語の開始時刻と同一の仮説
棄却時刻、ならびに、再評価する仮説の最終単語の一つ
前の単語を表す先行単語もしくは最終単語までの音素履
歴と同一の、最終単語もしくは音素履歴に分類されてい
る上記バッファメモリ内の仮説のみを読み出して、再評
価する仮説の履歴の修正に利用することを特徴とする音
声認識方法。
1. A speech recognition method in which: a new hypothesis is generated at each time; an acoustic score is calculated that indicates how closely the input speech within a word of the hypothesis matches the acoustic model corresponding to the hypothesis; at the word end of the hypothesis, a language score corresponding to the probability that the hypothesis exists is obtained using a language model that defines the connection relationships between words; a hypothesis score indicating the likelihood of the hypothesis with respect to the input speech content permitted by the grammar is obtained from the acoustic score and the language score; hypotheses with low hypothesis scores are rejected and the rejected hypotheses are recorded in a buffer memory; hypotheses are re-evaluated by correcting the word history, which represents the word sequence of a hypothesis from the start time of the input speech to the current time, on the basis of hypotheses that were rejected at past times and recorded in the buffer memory; and the hypothesis with the highest score at the time all of the speech has been input is output as the recognition result, the method being characterized in that:
the rejected hypotheses are classified and recorded in the buffer memory by hypothesis rejection time, which represents the time at which a hypothesis was rejected, and by either final word, which represents the last word of the word history, or phoneme history, which represents a predetermined number of phonemes up to the end of the final word; and only those hypotheses in the buffer memory whose hypothesis rejection time is the same as the start time of the final word of the hypothesis being re-evaluated, and whose final word or phoneme history is the same as the preceding word (the word immediately before the final word of the hypothesis being re-evaluated) or the phoneme history up to that final word, are read out and used to correct the history of the hypothesis being re-evaluated.
【請求項2】 請求項1記載の方法をコンピュータによ
り実行させるプログラムを記録した記録媒体。
2. A recording medium on which a program for causing a computer to execute the method according to claim 1 is recorded.
JP2000268443A 2000-09-05 2000-09-05 Voice recognition method and program recording medium Expired - Lifetime JP3550350B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2000268443A JP3550350B2 (en) 2000-09-05 2000-09-05 Voice recognition method and program recording medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2000268443A JP3550350B2 (en) 2000-09-05 2000-09-05 Voice recognition method and program recording medium

Publications (2)

Publication Number Publication Date
JP2002073078A (en) 2002-03-12
JP3550350B2 (en) 2004-08-04

Family

ID=18755227

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2000268443A Expired - Lifetime JP3550350B2 (en) 2000-09-05 2000-09-05 Voice recognition method and program recording medium

Country Status (1)

Country Link
JP (1) JP3550350B2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010231149A (en) * 2009-03-30 2010-10-14 Kddi Corp Terminal using kana-kanji conversion system for voice recognition, method and program
JP2021018413A (en) * 2019-07-17 2021-02-15 バイドゥ オンライン ネットワーク テクノロジー(ペキン) カンパニー リミテッド Method, apparatus, device, and computer readable storage medium for recognizing and decoding voice based on streaming attention model
JP7051919B2 (en) 2019-07-17 2022-04-11 バイドゥ オンライン ネットワーク テクノロジー(ペキン) カンパニー リミテッド Speech recognition and decoding methods based on streaming attention models, devices, equipment and computer readable storage media
US11355113B2 (en) 2019-07-17 2022-06-07 Baidu Online Network Technology (Beijing) Co., Ltd. Method, apparatus, device and computer readable storage medium for recognizing and decoding voice based on streaming attention model

Also Published As

Publication number Publication date
JP3550350B2 (en) 2004-08-04

Similar Documents

Publication Publication Date Title
EP1128361B1 (en) Language models for speech recognition
US6961701B2 (en) Voice recognition apparatus and method, and recording medium
US7240002B2 (en) Speech recognition apparatus
JP4802434B2 (en) Voice recognition apparatus, voice recognition method, and recording medium recording program
US9697827B1 (en) Error reduction in speech processing
US20010053974A1 (en) Speech recognition apparatus, speech recognition method, and recording medium
JP4757936B2 (en) Pattern recognition method and apparatus, pattern recognition program and recording medium therefor
JPH08278794A (en) Speech recognition device and its method and phonetic translation device
Schwartz et al. Multiple-pass search strategies
US6016470A (en) Rejection grammar using selected phonemes for speech recognition system
US20040158468A1 (en) Speech recognition with soft pruning
JP2003208195A5 (en)
Rybach et al. On lattice generation for large vocabulary speech recognition
JP3550350B2 (en) Voice recognition method and program recording medium
JP2001242885A (en) Device and method for speech recognition, and recording medium
JP3494338B2 (en) Voice recognition method
JPH09134192A (en) Statistical language model forming device and speech recognition device
JP3042455B2 (en) Continuous speech recognition method
US20050049873A1 (en) Dynamic ranges for viterbi calculations
US20040148163A1 (en) System and method for utilizing an anchor to reduce memory requirements for speech recognition
JP4600705B2 (en) Voice recognition apparatus, voice recognition method, and recording medium
JP7259988B2 (en) DETECTION DEVICE, METHOD AND PROGRAM THEREOF
JP3369121B2 (en) Voice recognition method and voice recognition device
JP4696400B2 (en) Voice recognition apparatus, voice recognition method, program, and recording medium
JP2999726B2 (en) Continuous speech recognition device

Legal Events

Date Code Title Description
A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20031224

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20040113

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20040227

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20040330

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20040423

R151 Written notification of patent or utility model registration

Ref document number: 3550350

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R151

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090430

Year of fee payment: 5

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100430

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20110430

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120430

Year of fee payment: 8

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130430

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20140430

Year of fee payment: 10

S531 Written request for registration of change of domicile

Free format text: JAPANESE INTERMEDIATE CODE: R313531

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350

EXPY Cancellation because of completion of term