JP2002073078A

JP2002073078A - Voice recognition method and medium recording program thereon

Info

Publication number: JP2002073078A
Application number: JP2000268443A
Authority: JP
Inventors: Takaaki Hori; 貴明堀; Yoshiaki Noda; 喜昭野田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2000-09-05
Filing date: 2000-09-05
Publication date: 2002-03-12
Anticipated expiration: 2020-09-05
Also published as: JP3550350B2

Abstract

PROBLEM TO BE SOLVED: To reduce the amount of computation required for hypothesis history correction. SOLUTION: Hypotheses rejected by hypothesis narrowing are stored in storage areas 43, 44, 45, etc. of a buffer memory 26, with a separate area for each combination of rejection time and final word or phoneme history. When a history is corrected, only the hypotheses stored in the area whose rejection time equals the start time of the final word of the hypothesis being re-evaluated, and whose final word or phoneme history equals the word immediately before that final word or the phoneme history up to that final word, are read from the buffer memory 26 and used for the history correction.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention: The present invention relates to a speech recognition method that enables an efficient search, among the many hypotheses that can be generated by a given grammar, for the hypothesis closest to the input speech, and to a recording medium on which a program for the method is recorded.

[0002]

2. Description of the Related Art: In the speech recognition process shown in FIG. 4, input speech 11 is converted by an analysis processing unit 12 into a time series of feature-parameter vectors, and a search processing unit 13 matches it against the hypotheses permitted by a grammar/language model (a grammar model and a language model) 16. The score, which is the evaluation value of the match against a hypothesis, consists of an acoustic score indicating the likelihood (similarity) between the acoustic model 15 corresponding to the hypothesis and the input speech 11, and a language score corresponding to the probability that the hypothesis occurs. The hypothesis with the highest score is output as the recognition result 14.
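As a minimal illustration of the scoring described above, the following sketch adds an acoustic score and a language score and selects the highest-scoring hypothesis. The log-domain scores, the names `Hypothesis`, `total_score`, and `best_hypothesis`, and all numeric values are assumptions made for illustration; they do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    words: tuple            # word sequence of the hypothesis
    acoustic_score: float   # assumed to be a log-likelihood
    language_score: float   # assumed to be a log-probability


def total_score(h):
    # The hypothesis score consists of the acoustic and language scores.
    return h.acoustic_score + h.language_score


def best_hypothesis(hypotheses):
    # The hypothesis with the highest score becomes the recognition result.
    return max(hypotheses, key=total_score)


hyps = [
    Hypothesis(("recognize", "speech"), -120.0, -8.0),
    Hypothesis(("wreck", "a", "nice", "beach"), -118.0, -15.0),
]
result = best_hypothesis(hyps)
```

In this toy data the second hypothesis has the better acoustic score, but the language score tips the total in favor of the first.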

[0003] Signal processing commonly used in the analysis processing unit 12 includes filter bank analysis and linear predictive analysis (linear predictive coding, LPC). Feature parameters include the LPC cepstrum, LPC delta cepstrum, mel cepstrum (mel-frequency cepstral coefficients, MFCC), delta MFCC, and log power. For the acoustic model 15, the hidden Markov model method (hereinafter, the HMM method), which models speech based on probability and statistical theory, is the mainstream. Details of the HMM method are disclosed, for example, in Seiichi Nakagawa, "Speech Recognition by Stochastic Models," edited by the Institute of Electronics, Information and Communication Engineers.

[0004] The grammar/language model 16 specifies the connection relationships between words that define the sentences to be recognized; a word network with words as arcs, a probabilistic language model, or the like is used. A probabilistic language model uses the occurrence probability of a single word and the probabilities that two or more words occur in sequence. Details of such probabilistic language models are disclosed, for example, in Seiichi Nakagawa, "Speech Recognition by Stochastic Models," edited by the Institute of Electronics, Information and Communication Engineers.

[0005] The search processing unit 13 matches the acoustic model 15 corresponding to a word sequence on the word network, which represents the word connections specified by the grammar, against the time series of feature-parameter vectors to obtain an acoustic score indicating acoustic likelihood (similarity), and obtains a language score from the language model 16 for that word sequence. The hypothesis score, consisting of the acoustic score and the language score, is computed at each time (a time measured in units of the recognition processing unit time; "time" has this meaning throughout this specification). Hypotheses with low scores are discarded and hypotheses with high scores are kept. At the next time, the hypotheses kept at the previous time are extended with further words if necessary and evaluated again based on the acoustic model 15 and the language model 16.

[0006] Next, the most common per-time processing flow in this matching computation is described with reference to FIG. 4. The search processing unit 13 holds a list of the hypotheses to be computed at each time, and each hypothesis in the list holds its score up to the current time and word-sequence information representing its history. First, in the hypothesis generation unit 21, for any hypothesis in the list computed at the previous time whose computation has reached the end of the final word of its history, the words that can follow that final word are obtained from the word network defined by the grammar and appended to the end of the hypothesis to generate new hypotheses. The generated hypotheses are added to the list to form the hypothesis list for the current time.
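The hypothesis generation step can be sketched as follows. This is only an illustration under stated assumptions: the tuple layout `(words, score, at_word_end)`, the function name `expand_hypotheses`, the dictionary representation of the word network, and the use of `None` as the key for the sentence start are all hypothetical choices, not details given in the patent.

```python
def expand_hypotheses(hypotheses, network):
    """Return the hypothesis list for the current time.

    hypotheses: list of (words, score, at_word_end) tuples.
    network: dict mapping a word (or None for the sentence start) to the
             words that the grammar allows to follow it.
    """
    generated = []
    for words, score, at_word_end in hypotheses:
        if not at_word_end:
            continue  # still inside its final word; nothing to append yet
        last = words[-1] if words else None
        for successor in network.get(last, ()):
            # The new hypothesis inherits the score and starts a new word.
            generated.append((words + (successor,), score, False))
    # Generated hypotheses are added to the existing list.
    return hypotheses + generated


network = {None: ["hello"], "hello": ["world", "there"]}
previous = [(("hello",), -3.0, True), (("hi",), -4.0, False)]
current = expand_hypotheses(previous, network)
```

Only the hypothesis that has reached the end of its final word is extended; the one still inside a word is carried over unchanged.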

[0007] The in-word score calculation unit 22 matches, for each hypothesis in the current hypothesis list, the corresponding acoustic model 15 against the input speech and adds the resulting acoustic score to the score of the hypothesis. The word-end score calculation unit 23 then adds, for each hypothesis in the current list whose acoustic score has been accumulated up to the end of its final word, a language score corresponding to the occurrence probability of that final word, obtained using the grammar/language model 16.

[0008] Next, in the hypothesis narrowing unit 24, the hypotheses in the current list that have the same final word and whose acoustic score accumulation has progressed to the same part of that final word are examined. Among those whose word immediately before the final word is identical, or whose fixed number of trailing phonemes of that preceding word are identical, only the hypothesis with the highest score is kept in the current list and the others are rejected. The time is then incremented by 1 and processing returns to the hypothesis generation unit 21.
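The narrowing rule above, in its word-based form, can be sketched as grouping hypotheses by a key and keeping the best of each group. The `Hypothesis` fields, the integer `position` standing in for "the same part of the final word," and the function name `narrow` are assumptions made for this sketch; each hypothesis is assumed to contain at least one word.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    words: tuple    # word history; words[-1] is the final word
    position: int   # how far into the final word scoring has progressed
    score: float


def narrow(hypotheses):
    """Keep the best hypothesis per (final word, position, preceding word)."""
    best = {}
    rejected = []
    for h in hypotheses:
        preceding = h.words[-2] if len(h.words) >= 2 else None
        key = (h.words[-1], h.position, preceding)
        kept = best.get(key)
        if kept is None:
            best[key] = h
        elif h.score > kept.score:
            rejected.append(kept)   # previously kept hypothesis loses
            best[key] = h
        else:
            rejected.append(h)
    return list(best.values()), rejected


hyps = [
    Hypothesis(("a", "c", "d"), 2, -10.0),
    Hypothesis(("b", "c", "d"), 2, -12.0),  # same final and preceding word
    Hypothesis(("a", "e", "d"), 2, -11.0),  # different preceding word
]
survivors, rejected = narrow(hyps)
```

For the phoneme-based variant, the third element of the key would be the trailing phonemes of the preceding word instead of the word itself.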

[0009] This per-time computation is performed at every time of the input speech. From the hypothesis list at the end time of the input speech, the highest-scoring hypothesis among those whose computation has reached the end of the final word is selected as the recognition result. Before the per-time computation starts, a single hypothesis with no history and a score of 0 is placed in the hypothesis list. In this search method, the hypothesis narrowing unit 24 limits the growth in the number of hypotheses to be computed. Methods of narrowing hypotheses include the word pair approximation, which merges hypotheses whose final word and immediately preceding word are identical, and the phoneme history approximation, which merges hypotheses whose final word and a fixed number of trailing phonemes of the immediately preceding word are identical. The word pair approximation is disclosed in R. Schwartz and S. Austin, "A Comparison of Several Approximate Algorithms for Finding Multiple (N-best) Sentence Hypotheses," IEEE international conference ICASSP '91. The phoneme history approximation is disclosed in Yoshiaki Noda, Shoichi Matsunaga, and Shigeki Sagayama, "A Study of Approximation Methods in Large-Vocabulary Continuous Speech Recognition Using Word Graphs," IEICE Technical Report SP96-102. Hereinafter, the word immediately before the final word of a hypothesis is called the preceding word, and an arbitrary number of trailing phonemes of the word immediately before the final word is called the preceding phoneme history.

[0010] Next, the processing flow when the search processing unit 13 includes a history correction unit and a buffer memory that stores rejected hypotheses is described with reference to FIG. 5. In the figure, parts identical to those in FIG. 4 carry the same reference numerals and their description is not repeated. Processing by the hypothesis generation unit 21, the in-word score calculation unit 22, and the word-end score calculation unit 23 is the same as in FIG. 4. The hypothesis narrowing unit 24 narrows hypotheses in the same way as in FIG. 4, but records the hypotheses rejected during narrowing in the buffer memory 26. The history correction unit 25, which follows the word-end score calculation unit 23, reads from the buffer memory 26 the hypotheses rejected by the hypothesis narrowing unit 24. It replaces the history up to the preceding word of a hypothesis evaluated by the word-end score calculation unit 23 with the history of each hypothesis stored in the buffer memory 26 before the time under evaluation, and computes the score for each replacement. If the score obtained with the history that gives the highest score exceeds the score without replacement, that history becomes the history up to the preceding word of the current hypothesis, and that highest score becomes the score of the hypothesis.

[0011] The reason such history correction is needed is as follows. With the word pair approximation, only the single highest-scoring hypothesis is kept among hypotheses with the same final word and preceding word. Hypotheses whose word sequence from the beginning up to just before the preceding word differs from that of this highest-scoring hypothesis are lost, so the hypothesis list retains only hypotheses sharing one word sequence from the beginning up to just before the preceding word. However, when a language model that assigns language scores to sequences of three or more words (for example, a word trigram) is used, the language score is computed from the final word, the preceding word, and one or more words before the preceding word. Computing and adding the language score for an already lost hypothesis, which has the same final word and preceding word but a different word sequence before the preceding word, can then give a higher hypothesis score. This is because the language score for a sequence of three or more words takes different values depending on the word sequence before the preceding word. Narrowing can therefore discard a hypothesis whose score would exceed those of the other hypotheses at a time later than the current time. Hereinafter, this phenomenon is called a narrowing error.
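A small numeric example may make the narrowing error concrete. All words and log-domain values below are invented for illustration. Two hypotheses share the final word `c` and the preceding word `b`, so the word pair approximation keeps only the one with the higher partial score, yet the trigram score added at the word end would have favored the other.

```python
# Partial scores while still inside the final word "c" (invented values).
partial = {("x", "b", "c"): -10.0, ("y", "b", "c"): -10.5}

# Trigram log-probabilities of "c" given the two preceding words (invented).
trigram_logprob = {("x", "b", "c"): -3.0, ("y", "b", "c"): -1.0}

# Word pair approximation: both share (b, c), so only the best partial survives.
kept = max(partial, key=partial.get)
kept_final = partial[kept] + trigram_logprob[kept]

# Without narrowing, the trigram score is added to both at the word end.
full = {h: partial[h] + trigram_logprob[h] for h in partial}
true_best = max(full, key=full.get)
```

Here `kept` is the hypothesis starting with `x`, with a final score of -13.0, while the rejected hypothesis starting with `y` would have reached -11.5.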

[0012] Similarly, with the phoneme history approximation, only the single highest-scoring hypothesis is kept among hypotheses with the same final word and preceding phoneme history. Among hypotheses with the same preceding phoneme history, those whose word sequence from the beginning up to the preceding word differs from that of this highest-scoring hypothesis are rejected, so for each preceding phoneme history the list retains only one word sequence from the beginning up to the preceding word. However, when a language model that assigns language scores to sequences of two or more words (for example, a word bigram) is used, the language score is computed from the final word and one or more words up to the preceding word. Computing and adding the language score for an already rejected hypothesis, which has the same final word and preceding phoneme history but a different word sequence up to the preceding word, can then give a higher hypothesis score. This is because the language score takes different values depending on the word sequence up to the preceding word.

[0013] As described above, narrowing errors can occur with the word pair approximation when a language model that assigns language scores to sequences of three or more words is used, and with the phoneme history approximation when a language model that assigns language scores to sequences of two or more words is used. One technique for correcting the history is called Delayed Bigram. In this method, the history up to the preceding word of the hypothesis in question is replaced in turn with every hypothesis recorded in the buffer memory that ended at the start time of the final word of that hypothesis, each replacement is evaluated based on bigram probabilities, and the history of the highest-scoring one becomes the history up to the preceding word of the hypothesis. Delayed Bigram is disclosed in M. Woszczyna and M. Finke, "Minimizing search errors due to delayed bigrams in real-time speech recognition systems," IEEE international conference ICASSP '96. In Delayed Bigram, hypotheses that were rejected at each time and whose score computation up to the word end has finished are recorded in the buffer memory. Based on the start time of the final word of the hypothesis to be re-evaluated, the history correction unit reads the hypotheses rejected at that start time from the buffer memory one at a time, and repeatedly substitutes each for the history of the hypothesis being re-evaluated and computes the score.
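The substitution loop described above might look like the following sketch. It is not the cited authors' implementation: the score decomposition (prefix score plus acoustic score of the final word plus bigram log-probability), the names `Ended` and `correct_history`, and the numeric values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Ended:
    words: tuple   # word history ending at the final word's start time
    score: float   # score accumulated up to that word end


def correct_history(prefix, final_word, final_acoustic, candidates, bigram):
    """Try each rejected history in place of `prefix`; keep the best.

    prefix:     current history up to the preceding word.
    candidates: rejected hypotheses read from the buffer.
    bigram:     dict mapping (previous word, word) to a log-probability.
    """
    def rescored(p):
        return p.score + final_acoustic + bigram[(p.words[-1], final_word)]

    # The current prefix comes first, so it is kept when no candidate
    # gives a strictly higher score.
    best = max([prefix, *candidates], key=rescored)
    return best.words + (final_word,), rescored(best)


bigram = {("b", "w"): -4.0, ("c", "w"): -1.0}
prefix = Ended(("a", "b"), -10.0)
candidates = [Ended(("a", "c"), -10.5), Ended(("d", "b"), -12.0)]
words, score = correct_history(prefix, "w", -5.0, candidates, bigram)
```

Every candidate in the list costs one rescoring, which is why the size of the candidate list determines the cost of history correction.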

[0014] Next, the method of recording the hypotheses rejected at each time is described with reference to FIG. 6. FIG. 6 shows in detail the part of FIG. 5 in which hypotheses rejected by the hypothesis narrowing unit 24 are passed to the buffer memory 26. Let the current time be t. A hypothesis 31 rejected by the hypothesis narrowing unit 24 is added by the hypothesis transfer process 27 to the area 35 of the buffer memory 26 that stores the hypotheses for time t. Hereinafter, an area that stores hypotheses is called a hypothesis storage area. Here, hypothesis 33 is a hypothesis that was stored in the hypothesis storage area 35 for time t before hypothesis 31 during the processing of the hypothesis narrowing unit 24 at time t. Hypothesis 31 is therefore stored in area 35 in the same way as hypothesis 33. The hypothesis storage areas of the buffer memory 26 for times before t (for example, the hypothesis storage area 34 for time t-1) hold the hypotheses rejected at each of those times (for example, hypothesis 32), and the hypothesis storage areas for times after t (for example, the hypothesis storage area 36 for time t+1) are still empty. With this arrangement, the start time of the final word of the hypothesis to be re-evaluated can be used to locate and read easily the hypotheses in the buffer memory 26 that are used for history correction.
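The conventional per-time storage can be sketched as a mapping from rejection time to a list of hypotheses. The class name `TimeIndexedBuffer`, its method names, and the representation of hypotheses as strings are assumptions made for this illustration.

```python
from collections import defaultdict


class TimeIndexedBuffer:
    """One hypothesis storage area per rejection time."""

    def __init__(self):
        self._areas = defaultdict(list)

    def record(self, time, hypothesis):
        # Append the rejected hypothesis to the area for this time.
        self._areas[time].append(hypothesis)

    def read(self, final_word_start_time):
        # All hypotheses rejected at the given time are returned.
        return list(self._areas.get(final_word_start_time, ()))


t = 10
buffer = TimeIndexedBuffer()
buffer.record(t - 1, "hypothesis 32")
buffer.record(t, "hypothesis 33")
buffer.record(t, "hypothesis 31")
```

Reading with the start time of the final word returns every hypothesis rejected at that time, regardless of its history.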

[0015] However, with this method of recording and reading hypotheses, when many hypotheses are recorded at the same time, the number of hypotheses in the hypothesis storage area for each time in the buffer memory 26 grows, and the amount of processing required for re-evaluating hypotheses in the history correction unit increases.

[0016]

SUMMARY OF THE INVENTION: In the history correction method based on Delayed Bigram, rejected hypotheses whose score computation up to the word end has finished are recorded in the buffer memory by time. When a history is corrected, every hypothesis recorded in the hypothesis storage area of the buffer memory corresponding to the start time of the final word of the hypothesis to be re-evaluated is substituted for the history of that hypothesis and scored. Consequently, when many hypotheses are recorded at the same time, the amount of computation required for history correction is large.

[0017]

Means for Solving the Problem: According to the present invention, each hypothesis rejected by hypothesis narrowing is recorded in the buffer memory classified by its hypothesis rejection time, which indicates the time at which it was rejected, and by its history attribute, namely the final word of its word history or a phoneme history representing an arbitrary number of phonemes up to the end of the final word. When a history is corrected, only those hypotheses in the buffer memory whose hypothesis rejection time is the same as the start time of the final word of the hypothesis to be re-evaluated and whose history attribute matches are read and used to correct the history of the hypothesis to be re-evaluated. A matching history attribute means that the hypotheses are classified under the same word as the preceding word (the word immediately before the final word of the hypothesis to be re-evaluated) or under the same phoneme history as the phoneme history up to the end of the preceding word. When a phoneme history is used, the number of phonemes may be about the same as the number used for the preceding phoneme history in the phoneme history approximation described in the related art section.

[0018]

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 shows a method of recording rejected hypotheses according to one embodiment of the present invention. In the figure, parts identical to those in FIG. 6 carry the same reference numerals and their description is not repeated. This embodiment is described below with reference to FIG. 1. In the conventional Delayed Bigram method, as shown in FIG. 6, rejected hypotheses are recorded in a separate hypothesis storage area of the buffer memory 26 for each time. In this embodiment, by contrast, rejected hypotheses are recorded in the buffer memory 26 separately for each time and, in addition, classified separately by history attribute, that is, by final word or phoneme history.

[0019] Let the current time be t. A hypothesis 41 rejected by the hypothesis narrowing unit 24 is added by the hypothesis transfer process 27 to the hypothesis storage area 46 of the buffer memory 26 for time t that corresponds to the phoneme history ac, which is the phoneme history of hypothesis 41. Here, hypothesis 43 is a hypothesis that was added, before hypothesis 41 during the processing of the hypothesis narrowing unit 24 at time t, to the hypothesis storage area 45 or 46 corresponding to phoneme history ab or ac at time t. The hypothesis storage areas of the buffer memory 26 for times before t (for example, the hypothesis storage areas 43 and 44 for time t-1) hold the hypotheses rejected at each of those times (for example, hypothesis 42), recorded by phoneme history (for example, phoneme histories bc and de), and the hypothesis storage areas for times after t (for example, the hypothesis storage areas 47 and 48 for time t+1) are still empty.

[0020] With rejected hypotheses recorded in the buffer memory by phoneme history at each time as described above, the history correction unit locates the hypothesis storage area in the buffer memory, for example area 45, whose hypothesis rejection time matches the start time of the final word of the hypothesis to be re-evaluated and whose phoneme history matches the preceding phoneme history of that hypothesis. It reads only the hypotheses recorded in that area 45 and uses them for history correction. Alternatively, rejected hypotheses may be stored in a separate hypothesis storage area for each time and each final word. In that case, at history correction, only the hypotheses in the storage area whose hypothesis rejection time matches the start time of the final word of the hypothesis to be re-evaluated, and whose rejected-hypothesis final word matches the preceding word of that hypothesis, are used.
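The storage scheme of this embodiment, in its final-word variant, can be sketched by keying the buffer on the pair of rejection time and history attribute. The class name `AttributeIndexedBuffer`, its method names, and the tuple representation of hypotheses are assumptions made for this illustration; for the phoneme-history variant, the attribute would be the trailing phonemes of the final word rather than the word itself.

```python
from collections import defaultdict


class AttributeIndexedBuffer:
    """One storage area per (rejection time, final word of rejected hypothesis)."""

    def __init__(self):
        self._areas = defaultdict(list)

    def record(self, rejection_time, words, score):
        attribute = words[-1]  # history attribute: final word
        self._areas[(rejection_time, attribute)].append((words, score))

    def read(self, final_word_start_time, preceding_word):
        # Only the area matching both the time and the preceding word is read.
        key = (final_word_start_time, preceding_word)
        return list(self._areas.get(key, ()))


buffer = AttributeIndexedBuffer()
buffer.record(5, ("a", "b"), -10.5)
buffer.record(5, ("d", "b"), -12.0)
buffer.record(5, ("a", "c"), -11.0)
buffer.record(4, ("e", "b"), -9.0)

# Re-evaluating a hypothesis whose final word starts at time 5 and whose
# preceding word is "b" reads only two of the four stored hypotheses.
candidates = buffer.read(5, "b")
```

Because the lookup key already encodes the preceding word, candidates that could not share the re-evaluated hypothesis's preceding word are never read or rescored.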

[0021] As a result, the number of hypotheses recorded in the hypothesis storage area for each final word or phoneme history at each time is smaller than with Delayed Bigram, which records them in one hypothesis storage area per time, so the number of times a history is substituted and a score computed during hypothesis history correction is reduced. The method of the present invention can also be carried out by having a computer execute a program. FIG. 2 shows the functional configuration in that case. Each unit is connected to a bus 59. An analysis program is installed in a memory 54 from a CD-ROM, a magnetic disk, or the like, or via a communication line, and a search processing program is likewise installed in a memory 57. When speech is input from an input unit 51, it is stored in a storage unit 53 as needed while a CPU 58 executes the analysis program in the memory 54 to convert it into a time series of feature-parameter vectors. In this example, a word network memory 56a and a language model database 56b are provided as the grammar/language model database 56.
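The reduction in substitutions noted at the start of this paragraph can be illustrated by counting how many stored hypotheses each scheme would read for one re-evaluation. The list of rejected hypotheses below is invented; each entry is a pair of rejection time and final word.

```python
from collections import Counter

# (rejection time, final word) of each rejected hypothesis (invented data).
rejected = [(5, "b"), (5, "b"), (5, "c"), (5, "d"), (5, "d"), (5, "d"), (4, "b")]

by_time = Counter(time for time, _ in rejected)
by_time_and_word = Counter(rejected)

# Re-evaluating a hypothesis whose final word starts at time 5 and whose
# preceding word is "b":
conventional_reads = by_time[5]               # every hypothesis rejected at time 5
proposed_reads = by_time_and_word[(5, "b")]   # only those ending in "b"
```

With this data the per-time scheme rescored six candidates and the per-time, per-word scheme rescored two; the actual saving depends on how the rejected hypotheses are distributed over final words.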

[0022] The CPU 58 executes the search processing program in the memory 26 and, as described above and as shown in FIG. 3, performs the following on the time series of feature-parameter vectors at each time. A hypothesis list is generated (S1). For each hypothesis, an acoustic score is computed using the acoustic model in the acoustic model database 55 and added (S2). For hypotheses at the end of their final word, a language score is obtained using the language model and added to the accumulated acoustic score to obtain the hypothesis score (S3). Next, for the highest-scoring of the hypotheses for which the hypothesis score was obtained, the buffer memory 26 is searched for the hypothesis storage area that matches the start time of its final word and the history attribute immediately before it, the hypotheses in that area are read, and history correction is performed (S4). Hypothesis narrowing is performed using the hypothesis scores after history correction, and the rejected hypotheses are classified by time and history attribute and recorded in the corresponding storage area of the buffer memory 26 (S5). If the input speech has not ended (S6), processing returns to step S1. If it has ended, the highest-scoring hypothesis among those that have reached the end of their final word at that time is output from an output unit 52 as the recognition result (S7).

[0023] The present invention may also use a grammar model that allows each word to connect to every word.

[0024]

Effects of the Invention: As described above, in the present invention, hypotheses rejected during hypothesis narrowing are recorded by history attribute (final word or phoneme history) at each time. At history correction, only the hypotheses recorded in the hypothesis storage area of the buffer memory whose hypothesis rejection time matches the start time of the final word of the hypothesis to be re-evaluated, and whose history attribute (final word or phoneme history) matches the preceding word or preceding phoneme history of that hypothesis, are read and used for history correction. Consequently, when the history of each hypothesis in the hypothesis list at each time is corrected, the number of history substitutions and score computations is smaller than in the conventional method, and the amount of computation for history correction is reduced.

[Brief description of the drawings]

FIG. 1 is a diagram showing an example of how rejected hypotheses are recorded in the method of the present invention.

FIG. 2 is a diagram showing an example functional configuration when the method of the present invention is implemented by a computer.

FIG. 3 is a flowchart showing the processing procedure of an embodiment of the present invention.

FIG. 4 is a diagram outlining a conventional speech recognition process.

FIG. 5 is a diagram outlining a speech recognition process that has a history correction unit and a buffer memory.

FIG. 6 is a diagram showing a conventional example of recording rejected hypotheses.

Claims

[Claims]

1. A speech recognition method in which, at each time, new hypotheses are generated; within a word of each hypothesis, an acoustic score indicating the closeness between the input speech and the acoustic model corresponding to the hypothesis is calculated; at the word end of each hypothesis, a language score corresponding to the probability that the hypothesis exists is calculated using a language model that defines the connection relationships between words; a hypothesis score indicating the likelihood of the hypothesis with respect to input speech content permitted by the grammar is obtained from the acoustic score and the language score; hypotheses with low hypothesis scores are rejected and the rejected hypotheses are recorded in a buffer memory; each hypothesis is re-evaluated by correcting its word history, which represents the word sequence from the start time of the input speech to the current time, based on hypotheses rejected at past times and recorded in the buffer memory; and the hypothesis with the highest score when all of the speech has been input is output, the method being characterized in that: each rejected hypothesis is recorded in the buffer memory classified by a hypothesis rejection time indicating the time at which the hypothesis was rejected and by either a final word indicating the last word of its word history or a phoneme history representing a predetermined number of phonemes up to the end of the final word; and only those hypotheses in the buffer memory whose hypothesis rejection time is the same as the start time of the final word of the hypothesis to be re-evaluated, and which are classified under a final word or phoneme history identical to the preceding word, which is the word immediately before the final word of the hypothesis to be re-evaluated, or to the phoneme history up to the end of that preceding word, are read out and used to correct the history of the hypothesis to be re-evaluated.

2. A recording medium on which a program for causing a computer to execute the method according to claim 1 is recorded.