JP2000099089A

JP2000099089A - Search device for continuous voice recognition and search method for continuous voice recognition

Info

Publication number: JP2000099089A
Application number: JP10268590A
Authority: JP
Inventors: Yoshiharu Abe; 芳春阿部
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1998-09-22
Filing date: 1998-09-22
Publication date: 2000-04-07
Anticipated expiration: 2018-09-22
Also published as: JP3583299B2

Abstract

PROBLEM TO BE SOLVED: To prevent the optimal word string from being dropped in the first-stage search and to enable word string candidates to be searched in the second-stage search without enlarging the search space. SOLUTION: An optimal solution obtaining means 2 receives the analysis result produced by a speech analysis means 102 and obtains an optimal syllable string 4. A word string search means 5 receives the optimal syllable string 4, refers to a difference model 6, which describes the likelihood that the optimal syllable string corresponds to the correct syllable string, and to a word dictionary 7, which describes the standard syllable strings of words, searches for word string candidates, and outputs a word string candidate 8.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a search device and a search method for continuous speech recognition that recognize large-vocabulary continuous speech and can find the correct word string candidates.

[0002]

2. Description of the Related Art In continuous speech recognition, which recognizes large-vocabulary continuous speech and finds word string candidates, the search for word string candidates can be performed in a single stage or in multiple stages. Beam search is a single-stage method. One multi-stage method creates a word graph in the first stage and finds word string candidates within the word graph in the second stage.

[0003] Beam search treats a partial string of a word string candidate as a hypothesis. It starts from the hypothesis of an empty word string and expands hypotheses in synchronization with the input frames to grow the word strings. As the input frames advance, the number of possible word combinations grows and the number of word string candidates increases, so hypotheses with low likelihood are pruned using the likelihood of the acoustic model and the likelihood of the language model. Pruning holds the number of hypotheses to a fixed number, and the search proceeds so that the correct word string does not drop out of the word string candidates. Here, the likelihood is the logarithm of the probability that a standard syllable string is associated with an optimal syllable string.
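The pruning step described above can be illustrated with a minimal Python sketch. It is not the implementation of any particular recognizer: it advances one word per step rather than one input frame, and the names `Hypothesis`, `expand`, `prune`, and `beam_search`, as well as the per-step score dictionaries, are assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    words: tuple            # partial word string covered so far
    log_likelihood: float   # summed acoustic + language model log-likelihood


def expand(hypothesis, word_scores):
    """Extend one hypothesis by every candidate word.

    word_scores maps a word to its (hypothetical) log-likelihood increment.
    """
    return [
        Hypothesis(hypothesis.words + (word,), hypothesis.log_likelihood + score)
        for word, score in word_scores.items()
    ]


def prune(hypotheses, beam_width):
    """Keep only the beam_width hypotheses with the highest log-likelihood."""
    ranked = sorted(hypotheses, key=lambda h: h.log_likelihood, reverse=True)
    return ranked[:beam_width]


def beam_search(steps, beam_width):
    """Start from the empty word string and alternate expansion and pruning."""
    beam = [Hypothesis((), 0.0)]
    for word_scores in steps:
        expanded = [h2 for h in beam for h2 in expand(h, word_scores)]
        beam = prune(expanded, beam_width)
    return beam


if __name__ == "__main__":
    steps = [{"a": -1.0, "b": -2.0}, {"c": -0.5, "d": -3.0}]
    for hyp in beam_search(steps, beam_width=2):
        print(hyp.words, hyp.log_likelihood)
```

With a beam width of 1 in this toy input, every hypothesis beginning with "b" is discarded after the first step, which is the kind of loss of the correct answer that the later paragraphs are concerned with.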

[0004] The word graph method, on the other hand, searches in two stages. The first-stage search retains word candidates, for example by keeping only the candidates derived from the immediately preceding word. The second-stage search combines the word candidates created in the first stage to create word string candidates. In doing so, it adds the likelihood of the acoustic model and the likelihood of the language model and searches for word string candidates with high likelihood. The second-stage search uses a stack decoder.

[0005] Another multi-stage search method obtains an optimal solution in the first stage and performs the second-stage search by modifying that optimal solution; it is disclosed in Japanese Patent Laid-Open No. 5-181498. In that invention, the first stage quickly finds the optimal word using dynamic programming (hereinafter referred to as DP) at coarse precision, and the second stage obtains the recognition result by DP from the plurality of candidate pattern data selected in the first stage. With this method, the optimal word string does not necessarily match the correct word string, but it is quite similar to it. However, if the pattern data of the correct word string is not found in the first stage, the correct word string cannot be obtained in the second stage either.

[0006]

Problems to be Solved by the Invention: Because the conventional search device and search method for continuous speech recognition are configured as described above, the correct answer cannot be obtained in the second stage if it does not remain among the word candidates in the first-stage search. In addition, trying to retain the correct answer in the first stage increases the number of word candidates, which increases the word combinations to be considered in the second stage and enlarges the search space. Furthermore, because acoustically similar word string candidates are searched, recognition accuracy decreases.

[0007] The present invention has been made to solve the above problems. Its object is to obtain a search device and a search method for continuous speech recognition that prevent the optimal word string from being dropped in the first-stage search and that can search for word string candidates in the second-stage search without enlarging the search space.

[0008]

Means for Solving the Problems: The search device for continuous speech recognition according to the present invention is provided with a difference model expressing the likelihood that the optimal solution obtained in the first stage corresponds to the correct answer, and performs the second-stage search by applying the difference model to the optimal solution obtained in the first stage.

[0009] In the search device for continuous speech recognition according to the present invention, optimal solution obtaining means receives the analysis result created by speech analysis means that analyzes input speech and obtains an optimal syllable string under the control of an automaton representing connections between syllables. Word string search means receives the optimal syllable string obtained by the optimal solution obtaining means, refers to a difference model describing the likelihood that this optimal syllable string corresponds to the correct syllable string and to a word dictionary describing the standard syllable strings of words, searches for word string candidates, and outputs the word string candidates.

[0010] In the search device for continuous speech recognition according to the present invention, the difference model is an inter-syllable-string conversion likelihood table describing partial syllable strings of the optimal syllable string, partial syllable strings of the correct syllable string, and the likelihood that they correspond, and the word string search means searches for word string candidates based on the likelihoods described in the inter-syllable-string conversion likelihood table.

[0011] In the search device for continuous speech recognition according to the present invention, the difference model comprises an inter-syllable-string conversion likelihood table describing partial syllable strings of the optimal syllable string, partial syllable strings of the correct syllable string, and the likelihood that they correspond, and a word syllable length conversion likelihood table describing the length of the optimal syllable string, the length of the syllable string in the word dictionary, and the likelihood that they correspond. The word string search means searches for word string candidates based on the likelihoods described in the inter-syllable-string conversion likelihood table and the word syllable length conversion likelihood table.

[0012] In the search device for continuous speech recognition according to the present invention, optimal solution obtaining means receives the analysis result created by speech analysis means that analyzes input speech and obtains an optimal syllable string under the control of an automaton representing connections between syllables. A difference-model-applied word dictionary is prepared with reference to a difference model describing the likelihood that the optimal syllable string obtained by the optimal solution obtaining means corresponds to the correct syllable string and to a word dictionary describing the standard syllable strings of words; for each word in the word dictionary, it describes the word together with a syllable graph obtained by transforming the standard syllable string of the word dictionary based on the description in the difference model. Word string search means receives the optimal syllable string obtained by the optimal solution obtaining means, refers to the difference-model-applied word dictionary, searches for word string candidates, and outputs the word string candidates.

[0013] In the search device for continuous speech recognition according to the present invention, optimal solution obtaining means receives the analysis result created by speech analysis means that analyzes input speech and obtains an optimal syllable string under the control of an automaton representing connections between syllables. Difference-model-applied syllable graph creating means receives the optimal syllable string and creates a graph by transforming it based on the description in a difference model describing the likelihood that the optimal syllable string obtained by the optimal solution obtaining means corresponds to the correct syllable string. Word string search means receives the graph created by the difference-model-applied syllable graph creating means, refers to a word dictionary describing the standard syllable strings of words, searches for word string candidates, and outputs the word string candidates.

[0014] In the search device for continuous speech recognition according to the present invention, N-best solution obtaining means receives the analysis result created by speech analysis means that analyzes input speech and obtains, under the control of an automaton representing connections between syllables, a syllable string consisting of the top N optimal syllables. Word string search means receives the syllable string consisting of the top N optimal syllables obtained by the N-best solution obtaining means, refers to a difference model describing the likelihood that this syllable string corresponds to the correct syllable string and to a word dictionary describing the standard syllable strings of words, searches for word string candidates, and outputs the word string candidates.

[0015] In the search device for continuous speech recognition according to the present invention, N-best solution obtaining means receives the analysis result created by speech analysis means that analyzes input speech and obtains, under the control of an automaton representing connections between syllables, a syllable string consisting of the N optimal syllables. A difference-model-applied word dictionary is prepared with reference to a difference model describing the likelihood that the syllable string consisting of the N optimal syllables obtained by the N-best solution obtaining means corresponds to the correct syllable string and to a word dictionary describing the standard syllable strings of words; for each word in the word dictionary, it describes the word together with a syllable graph obtained by transforming the standard syllable string of the word dictionary based on the description in the difference model. Word string search means receives the optimal syllable string obtained by the N-best solution obtaining means, refers to the difference-model-applied word dictionary, searches for word string candidates, and outputs the word string candidates.

[0016] In the search device for continuous speech recognition according to the present invention, N-best solution obtaining means receives the analysis result created by speech analysis means that analyzes input speech and obtains, under the control of an automaton representing connections between syllables, a syllable string consisting of the N optimal syllables. Difference-model-applied syllable graph creating means receives this syllable string and creates a graph by transforming it based on the description in a difference model describing the likelihood that the syllable string consisting of the N optimal syllables obtained by the N-best solution obtaining means corresponds to the correct syllable string. Word string search means receives the graph created by the difference-model-applied syllable graph creating means, refers to a word dictionary describing the standard syllable strings of words, searches for word string candidates, and outputs the word string candidates.

[0017] In the search device for continuous speech recognition according to the present invention, the difference model comprises a word syllable length conversion likelihood table describing the length of the optimal syllable string, the length of the syllable string in the word dictionary, and the likelihood that they correspond, and the word string search means searches for word string candidates based on the likelihoods in the word syllable length conversion likelihood table.

[0018] In the search device for continuous speech recognition according to the present invention, optimal solution obtaining means receives the analysis result created by speech analysis means that analyzes input speech and obtains an optimal word string under the control of an automaton representing connections between words. Syllable string conversion means converts the optimal word string obtained by the optimal solution obtaining means into a syllable string. Word string search means receives the syllable string obtained by the syllable string conversion means, refers to a difference model describing the likelihood that this syllable string corresponds to the correct syllable string and to a word dictionary describing the standard syllable strings of words, searches for word string candidates, and outputs the word string candidates.

[0019] In the search device for continuous speech recognition according to the present invention, optimal solution obtaining means receives the analysis result created by speech analysis means that analyzes input speech and obtains an optimal word string under the control of an automaton representing connections between words. Word string search means receives the optimal word string obtained by the optimal solution obtaining means, refers to a difference model describing the likelihood that this optimal word string corresponds to the correct word string and to a word dictionary describing words, searches for word string candidates, and outputs the word string candidates.

[0020] In the search device for continuous speech recognition according to the present invention, the difference model comprises a word syllable length conversion likelihood table describing, for each word in the word dictionary, the length of the corresponding optimal word string and its likelihood, and the word string search means searches for word string candidates based on the likelihoods in the word syllable length conversion likelihood table.

[0021] The search method for continuous speech recognition according to the present invention provides a difference model expressing the likelihood that the optimal solution obtained in the first stage corresponds to the correct answer, and performs the second-stage search by applying the difference model to the optimal solution obtained in the first stage.

[0022] The search method for continuous speech recognition according to the present invention receives the analysis result of input speech, obtains an optimal syllable string under the control of an automaton representing connections between syllables, searches for word string candidates with reference to a difference model describing the likelihood that this optimal syllable string corresponds to the correct syllable string and to a word dictionary describing the standard syllable strings of words, and outputs the word string candidates.

[0023] The search method for continuous speech recognition according to the present invention receives the analysis result of input speech and obtains an optimal syllable string under the control of an automaton representing connections between syllables. With reference to a difference model describing the likelihood that this optimal syllable string corresponds to the correct syllable string and to a word dictionary describing the standard syllable strings of words, the method refers to a difference-model-applied word dictionary that describes, for each word in the word dictionary, the word together with a syllable graph obtained by transforming the standard syllable string of the word dictionary based on the description in the difference model, searches for word string candidates, and outputs the word string candidates.

[0024] The search method for continuous speech recognition according to the present invention receives the analysis result of input speech, obtains an optimal syllable string under the control of an automaton representing connections between syllables, and creates a graph by transforming the optimal syllable string based on the description in a difference model describing the likelihood that this optimal syllable string corresponds to the correct syllable string. The method then receives the created graph, refers to a word dictionary describing the standard syllable strings of words, searches for word string candidates, and outputs the word string candidates.

[0025] The search method for continuous speech recognition according to the present invention receives the analysis result of input speech and obtains, under the control of an automaton representing connections between syllables, a syllable string consisting of the top N optimal syllables. The method receives this syllable string, refers to a difference model describing the likelihood that the syllable string consisting of the top N optimal syllables corresponds to the correct syllable string and to a word dictionary describing the standard syllable strings of words, searches for word string candidates, and outputs the word string candidates.

[0026] The search method for continuous speech recognition according to the present invention receives the analysis result of input speech, obtains, under the control of an automaton representing connections between syllables, a syllable string consisting of the N optimal syllables, and receives this optimal syllable string. With reference to a difference model describing the likelihood that the syllable string consisting of the N optimal syllables corresponds to the correct syllable string and to a word dictionary describing the standard syllable strings of words, the method refers to a difference-model-applied word dictionary that describes, for each word in the word dictionary, the word together with a syllable graph obtained by transforming the standard syllable string of the word dictionary based on the description in the difference model, searches for word string candidates, and outputs the word string candidates.

[0027] The search method for continuous speech recognition according to the present invention receives the analysis result of input speech, obtains, under the control of an automaton representing connections between syllables, a syllable string consisting of the N optimal syllables, and creates a graph by transforming this syllable string based on the description in a difference model describing the likelihood that the syllable string consisting of the N optimal syllables corresponds to the correct syllable string. The method then receives the created graph, refers to a word dictionary describing the standard syllable strings of words, searches for word string candidates, and outputs the word string candidates.

[0028] The search method for continuous speech recognition according to the present invention receives the analysis result of input speech, obtains an optimal word string under the control of an automaton representing connections between words, and converts this optimal word string into a syllable string. The method then refers to a difference model describing the likelihood that this syllable string corresponds to the correct syllable string and to a word dictionary describing the standard syllable strings of words, searches for word string candidates, and outputs the word string candidates.

[0029] The search method for continuous speech recognition according to the present invention receives the analysis result of input speech, obtains an optimal word string under the control of an automaton representing connections between words, refers to a difference model describing the likelihood that this optimal word string corresponds to the correct word string and to a word dictionary describing words, searches for word string candidates, and outputs the word string candidates.

[0030]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS An embodiment of the present invention is described below. Embodiment 1. FIG. 1 is a configuration diagram showing a search device for continuous speech recognition according to Embodiment 1 of the present invention. In the figure, 101 is input speech; 102 is speech analysis means that analyzes the input speech 101 and converts it into a feature vector time series 103; 2 is optimal solution obtaining means that receives the feature vector time series 103 and obtains an optimal syllable string 4 in accordance with a syllable network 3; and 5 is word string search means that receives the optimal syllable string 4, refers to a difference model 6 and a word dictionary 7, and searches for a word string candidate 8.

[0031] FIG. 2 is an explanatory diagram showing the syllable network in the search device for continuous speech recognition according to Embodiment 1 of the present invention, and FIG. 3 is an explanatory diagram showing the basic HMM in the same device. The syllable network 3 is a network representation of the connections between syllables (more generally, words or subwords). As shown in FIG. 2, it consists of nodes that connect syllables and arcs that represent syllables. Each syllable arc is represented by a chain of basic HMMs such as the one shown in FIG. 3. To account for the effect of coarticulation within and between syllables, context-dependent phoneme models are used as the basic HMMs.

[0032] FIG. 4 shows the automaton-controlled algorithm in the search device for continuous speech recognition according to Embodiment 1 of the present invention, and FIG. 5 is an explanatory diagram showing an example of the word dictionary in the same device. The optimal solution obtaining means 2 obtains the optimal syllable string corresponding to the feature vector time series 103 based on the automaton-controlled one-pass DP algorithm shown in FIG. 4 and outputs it as the optimal syllable string 4. When the optimal syllable string 4 is input, the word string search means 5 searches for word string candidates with reference to the word dictionary 7. As shown in FIG. 5, the word dictionary 7 consists of the notation of each word and a description of its standard syllable string.
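As a rough illustration of the word dictionary structure (word notation plus standard syllable string), the following Python sketch looks up the dictionary words that match an optimal syllable string at a given position. FIG. 5 is not reproduced in this text, so the romanized entries in `WORD_DICTIONARY` are hypothetical, and the function `words_starting_at` is an assumption made for this example. It uses exact matching only; the device described here additionally consults the difference model so that mismatches between the optimal and standard syllable strings can be tolerated.

```python
# Hypothetical entries: word notation -> standard syllable string (romanized).
WORD_DICTIONARY = {
    "asa": ("a", "sa"),
    "asahi": ("a", "sa", "hi"),
    "hi": ("hi",),
}


def words_starting_at(optimal, start, dictionary):
    """Return (word, end) pairs whose standard syllable string matches the
    optimal syllable string exactly, beginning at index `start`.

    `end` is the index just past the matched syllables.
    """
    matches = []
    for word, syllables in dictionary.items():
        end = start + len(syllables)
        if tuple(optimal[start:end]) == syllables:
            matches.append((word, end))
    return matches


if __name__ == "__main__":
    optimal_syllable_string = ("a", "sa", "hi")
    print(words_starting_at(optimal_syllable_string, 0, WORD_DICTIONARY))
    print(words_starting_at(optimal_syllable_string, 2, WORD_DICTIONARY))
```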

[0033] FIG. 6 is a configuration diagram showing the difference model in the search device for continuous speech recognition according to Embodiment 1 of the present invention, and FIG. 7 is a table showing an example of the inter-syllable-string conversion likelihood table in the same device. As shown in FIG. 6, the difference model 6 consists of an inter-syllable-string conversion likelihood table 601. As shown in FIG. 7, the inter-syllable-string conversion likelihood table 601 describes a standard syllable string, the corresponding optimal syllable string, and the likelihood that the standard syllable string is converted into the optimal syllable string. This likelihood is the logarithm of the probability that the standard syllable string is associated with the optimal syllable string. The standard syllable string and the optimal syllable string may each have any length of 0 or more. In the figure, the standard syllable strings have length 1 and the optimal syllable strings have lengths in the range 1 to 2.
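One way to picture such a table is as a mapping from (standard syllable string, optimal syllable string) pairs to log probabilities. The Python sketch below is only an illustration under stated assumptions: the entries and probabilities in `CONVERSION_TABLE` are invented rather than taken from FIG. 7, and `word_match_log_likelihood` is a hypothetical helper that scores a word's standard syllable string against a stretch of the optimal syllable string by dynamic programming, using the same length restriction as the figure (each standard syllable maps to one or two optimal syllables).

```python
import math

# Hypothetical table: (standard syllable string, optimal syllable string)
# -> log probability that the standard string is converted into the optimal one.
CONVERSION_TABLE = {
    (("ka",), ("ka",)): math.log(0.80),
    (("ka",), ("ga",)): math.log(0.15),
    (("ka",), ("ka", "a")): math.log(0.05),
    (("sa",), ("sa",)): math.log(0.90),
    (("sa",), ("za",)): math.log(0.10),
}


def word_match_log_likelihood(standard, optimal, table):
    """Best total log-likelihood of converting `standard` into `optimal`.

    Assumes every standard syllable maps to 1 or 2 optimal syllables.
    Returns -inf when no conversion path exists in the table.
    """
    n, m = len(standard), len(optimal)
    neg_inf = -math.inf
    # best[i][j]: best score for the first i standard and first j optimal syllables
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            for span in (1, 2):
                if j - span < 0 or best[i - 1][j - span] == neg_inf:
                    continue
                key = ((standard[i - 1],), tuple(optimal[j - span:j]))
                step = table.get(key, neg_inf)
                best[i][j] = max(best[i][j], best[i - 1][j - span] + step)
    return best[n][m]


if __name__ == "__main__":
    print(word_match_log_likelihood(("ka", "sa"), ("ga", "sa"), CONVERSION_TABLE))
    print(word_match_log_likelihood(("ka", "sa"), ("ka", "a", "sa"), CONVERSION_TABLE))
```

Because the likelihoods are log probabilities, the scores of individual syllable conversions add up along a path, which is why the sketch sums table entries rather than multiplying them.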

【００３４】FIG. 8 is a block diagram showing an example of the learning means for the difference model in the search device for continuous speech recognition according to the first embodiment of the present invention. The difference model is trained by learning means configured as shown in FIG. 8. An input speech 101 is taken from the speech database 10 and converted into a feature vector time series 103 by the speech analysis means 102. The optimal solution obtaining means 2 refers to the syllable network 3 and outputs the optimal syllable string 4 for the feature vector time series 103. The optimal syllable string 4 is input to the difference model learning means 9 together with the correct word string 11 and the correct syllable string 12 obtained from the speech database 10. The difference model learning means 9 performs DP matching between the optimal syllable string 4 and the correct syllable string 12 and finds the correspondence between the two along the time axis. By doing this for all the speech in the speech database 10, it obtains the likelihood that a partial syllable string of the optimal syllable string 4 corresponds to a partial syllable string of the correct syllable string 12, and outputs the difference model 6.
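
The training step can be sketched as follows. This is only an illustration under stated assumptions: the DP matching is replaced by a plain edit-distance alignment over single syllables, and the likelihood is estimated as the relative frequency of each optimal-side unit given the correct-side unit. The source does not specify either choice, and the names `align` and `train_difference_model` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def align(correct, optimal):
    """Edit-distance DP alignment (an assumed stand-in for the DP matching).

    Returns (correct_part, optimal_part) pairs; each part is a tuple of
    length 0 or 1, so insertions and deletions appear as empty tuples.
    """
    n, m = len(correct), len(optimal)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (correct[i - 1] != optimal[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Trace back from the end to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (correct[i - 1] != optimal[j - 1])):
            pairs.append(((correct[i - 1],), (optimal[j - 1],)))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append(((correct[i - 1],), ()))
            i -= 1
        else:
            pairs.append(((), (optimal[j - 1],)))
            j -= 1
    return pairs[::-1]


def train_difference_model(corpus):
    """corpus: iterable of (correct syllable string, optimal syllable string).

    Returns {(correct_part, optimal_part): log relative frequency}.
    """
    counts = defaultdict(Counter)
    for correct, optimal in corpus:
        for std, opt in align(correct, optimal):
            counts[std][opt] += 1
    return {
        (std, opt): math.log(c / sum(opts.values()))
        for std, opts in counts.items()
        for opt, c in opts.items()
    }
```

With two utterances of 「にんしき」, one recognized correctly and one with 「し」 recognized as 「ひ」, this sketch assigns log 0.5 to both the shi/shi and shi/hi entries.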

【００３５】Next, the operation will be described. FIG. 9 is a flowchart showing the operating procedure of the word string search means in the search device for continuous speech recognition according to the first embodiment of the present invention. The search is performed by a stack decoder following the flowchart of FIG. 9. The stack decoder starts looking up words at the beginning of the optimal syllable string 4, concatenates words from the word dictionary 7 one after another, and finds word string candidates that cover the optimal syllable string 4 from beginning to end. Here, a word string candidate that covers the optimal syllable string 4 from its beginning up to some intermediate point is called a hypothesis. Each hypothesis has three attributes: a word string, an end time, and an evaluation value. The end time is the length of the portion of the optimal syllable string 4 covered by the word string of the hypothesis; if the total length of the optimal syllable string 4 is T, the end time is an integer in the range 0 to T.

【００３６】For example, if the optimal syllable string 4 is 「おんせえにんしきそおち」 (o-n-se-e-ni-n-shi-ki-so-o-chi), the word string of the hypothesis that covers the entire optimal syllable string 4 is 「音声（おんせえ）認識（にんしき）装置（そおち）」 ("speech recognition device"), and its end time is 11. If the word string of a hypothesis is 「音声（おんせえ）認識（にんしき）」 ("speech recognition"), the end time of that hypothesis is 8.
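
The three attributes of a hypothesis and the end times of this example can be illustrated with a small record type. This is a sketch only; the class name `Hypothesis` and its `extend` helper are hypothetical, and the syllable counts (4, 4, 3) are read off the kana given in the example.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    words: list = field(default_factory=list)  # word string so far
    end_time: int = 0    # number of optimal syllables covered (0..T)
    score: float = 0.0   # evaluation value (accumulated log-likelihood)

    def extend(self, word, n_syllables, log_likelihood):
        """Return a new hypothesis grown by one word; `self` is left unchanged."""
        return Hypothesis(self.words + [word],
                          self.end_time + n_syllables,
                          self.score + log_likelihood)


# 「おんせえ」 and 「にんしき」 cover 4 optimal syllables each, 「そおち」 covers 3.
h2 = Hypothesis().extend("音声", 4, 0.0).extend("認識", 4, 0.0)
h3 = h2.extend("装置", 3, 0.0)
```

Here `h2.end_time` is 8 and `h3.end_time` is 11, matching the values stated in the text.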

【００３７】Next, the operation of the stack decoder used in the first embodiment will be described. First, a hypothesis consisting of an empty word string is created and stored in the stack (step ST101). It is then determined whether the stack is empty (step ST102), and the processing ends once the stack has become empty (step ST103). If the stack is not empty in step ST102, the hypothesis H0 with the largest evaluation value is taken out of the stack (step ST104), and the end time of hypothesis H0 is denoted T0. Next, it is determined whether the end time T0 of hypothesis H0 is equal to the optimal syllable string length T (step ST111). If it is, the word string of that hypothesis is output as one of the word string candidates 8 (step ST112), and the processing returns to step ST104. If the end time T0 of hypothesis H0 is not equal to the optimal syllable string length T in step ST111, one word is taken from the word dictionary 7 and denoted n (step ST105). The processing of steps ST106 to ST110 below is performed for every word n in the word dictionary.

【００３８】In step ST106, the optimal syllable string is matched against the standard syllable string of word n, with T0+1 as the start time and each position up to the end T as a candidate end time T1 (T1 = T0+1, ..., T). In this matching, the matching likelihood between the partial optimal syllable string W1 and the standard syllable string W2 of word n is obtained from the likelihoods with which standard syllable strings correspond to optimal syllable strings.

【００３９】

$$W_1 = X(T_0+1),\ X(T_0+2),\ \ldots,\ X(T_1) \tag{1}$$

$$W_2 = Y(1),\ Y(2),\ \ldots,\ Y(J(n)) \tag{2}$$

where $T_1$ is an integer in the range $T_0$ to $T$, and $J(n)$ is the length of the standard syllable string of word n.

【００４０】The optimal syllable string is matched against the standard syllable string of word n according to the flowchart of FIG. 10. FIG. 10 is a flowchart showing the procedure for matching the optimal syllable string against the standard syllable string of word n in the search device for continuous speech recognition according to the first embodiment of the present invention. First, the optimal syllable string and the standard syllable string of word n are given (step ST201), and the optimal syllable string and the standard syllable string are converted into graphs G1 and G2, respectively, each consisting of states and transitions (steps ST202 and ST203). Next, it is determined whether the difference model is to be applied to the optimal syllable string or to the standard syllable string (step ST204). If the target is the optimal syllable string, the difference model 6 is applied to graph G1 to obtain the modified graph G1' (step ST205). If, as a result of the determination in step ST204, the target is the standard syllable string, the difference model 6 is applied to graph G2 to obtain the modified graph G2' (step ST207).

【００４１】The operation of steps ST202 to ST208 is now explained with a concrete example using FIG. 11. FIG. 11 is an explanatory diagram showing the matching operation in the search device for continuous speech recognition according to the first embodiment of the present invention. In FIG. 11, the optimal syllable string is 「おんせにんひそおち」 (o-n-se-ni-n-hi-so-o-chi), and its portion 「にんひ」 (ni-n-hi) is matched against the standard syllable string 「にんしき」 (ni-n-shi-ki). Step ST202 turns the optimal syllable string into graph G1, and step ST203 turns the standard syllable string into graph G2. Suppose that the inter-syllable-string conversion likelihood table 601 serving as the difference model 6 contains the entries 「しき／ひ [-2.3]」, 「しき／しき [-0.1]」, and 「ひ／ひ [-0.1]」, and, for every other syllable (denoted X), 「X／X [0.0]」. If the optimal syllable string is chosen in step ST204 as the target of the difference model 6, then in step ST205 graph G1 is transformed by the difference model 6 into G1' 「にん（ひ [-0.1] - しき [-2.3]）」. As a result, 「にんしき」 of graph G2 can be matched against the optimal syllable string, and a word string containing the correct word 「認識（にんしき）」 ("recognition") can be found as a word string candidate.
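
The transformation of G1 into G1' in this example can be sketched as follows. The sketch reads each table entry as "standard / optimal [log-likelihood]", which is an interpretation of the notation rather than something the text states, and it simplifies the graph by letting one arc carry a whole syllable string as its label. The function names are hypothetical.

```python
# Entries from the worked example, keyed as (standard, optimal) -> log-likelihood.
TABLE = {
    (("shi", "ki"), ("hi",)): -2.3,
    (("shi", "ki"), ("shi", "ki")): -0.1,
    (("hi",), ("hi",)): -0.1,
}


def linear_graph(syllables):
    """Arcs (src, dst, label, log_likelihood) of a left-to-right chain."""
    return [(i, i + 1, (s,), 0.0) for i, s in enumerate(syllables)]


def apply_difference_model(arcs, table):
    """Replace each arc by parallel arcs, one per standard string that the
    table maps onto the arc's label. Arcs without any entry are kept with
    likelihood 0.0, mirroring the default X/X [0.0] entry.

    Simplification of this sketch: only single-arc labels are rewritten.
    """
    out = []
    for src, dst, label, _ in arcs:
        alternatives = [(std, lp) for (std, opt), lp in table.items() if opt == label]
        if not alternatives:
            alternatives = [(label, 0.0)]
        out.extend((src, dst, std, lp) for std, lp in alternatives)
    return out


G1 = linear_graph(["ni", "n", "hi"])
G1_prime = apply_difference_model(G1, TABLE)
```

After the call, the arc for 「ひ」 between states 2 and 3 has been replaced by two parallel arcs, 「ひ」 with -0.1 and 「しき」 with -2.3, which corresponds to G1' in the text.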

【００４２】Likewise, if the standard syllable string is chosen in step ST204 as the target of the difference model 6, then in step ST207 graph G2 is transformed by the difference model into G2' 「にん（しき [-0.1]）- ひ [-2.3]」. As a result, 「にんしき」 of graph G2 can be matched against the optimal syllable string, and a word string containing the correct word 「認識（にんしき）」 ("recognition") can be found as a word string candidate. In step ST206 or step ST208, the matching likelihood D(W1, W2) is obtained by evaluating the following recurrence between the graphs after modification (G1' and G2, or G1 and G2').

【００４３】

$$G(j,n) = 0, \quad (j,n) \in \{\text{initial node pairs}\} \tag{3}$$

$$G(j,n) = -\infty, \quad (j,n) \notin \{\text{initial node pairs}\} \tag{4}$$

$$G(j,n) = \max_{(i,m)} \bigl[\, G(i,m) + g(i \to j) + g(m \to n) + e(i \to j,\, m \to n) \,\bigr], \quad (i,m) \in \{\text{transitions possible into node } (j,n)\} \tag{5}$$

$$D(W_1, W_2) = \max_{(j,n)} G(j,n), \quad (j,n) \in \{\text{final node pairs}\} \tag{6}$$

Here, i and j are states of the graph on the optimal syllable string side, and m and n are states of the graph on the standard syllable string side. g(i→j) and g(m→n) are the log-likelihoods of the state transitions i→j and m→n, respectively. e(i→j, m→n) expresses the degree of agreement between the syllable X(i→j) on the optimal syllable string side and the syllable Y(m→n) on the standard syllable string side that are associated with the state transitions i→j and m→n; here it is 0 when they agree and -∞ when they do not.
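
Recurrences (3) to (6) can be evaluated with a forward pass over pairs of states. The following is a minimal sketch, assuming each graph is a list of arcs (src, dst, syllable, log-likelihood), that states are numbered so that src < dst on every arc, and that each graph has a single initial and a single final state. None of these representation choices come from the source, and `matching_likelihood` is a hypothetical name.

```python
NEG_INF = float("-inf")


def matching_likelihood(arcs1, arcs2, start1, final1, start2, final2):
    """Return D(W1, W2) for two acyclic graphs.

    arcs1 is the optimal-side graph, arcs2 the standard-side graph.
    """
    G = {(start1, start2): 0.0}  # (3); pairs missing from G count as -inf, (4)
    arc_pairs = sorted(((a, b) for a in arcs1 for b in arcs2),
                       key=lambda p: (p[0][0], p[1][0]))
    # Visiting arc pairs in order of their source states guarantees that
    # G[(i, m)] is final before it is used, because src < dst on every arc.
    for (i, j, x, g1), (m, n, y, g2) in arc_pairs:
        prev = G.get((i, m), NEG_INF)
        if prev == NEG_INF or x != y:   # e = -inf when the syllables differ
            continue
        score = prev + g1 + g2          # (5) with e = 0 for matching syllables
        if score > G.get((j, n), NEG_INF):
            G[(j, n)] = score
    return G.get((final1, final2), NEG_INF)  # (6) with one final pair


# G1' from the worked example, expanded to one syllable per arc:
# ni, n, then either hi [-0.1] or shi [-2.3] followed by ki.
G1_PRIME = [(0, 1, "ni", 0.0), (1, 2, "n", 0.0), (2, 4, "hi", -0.1),
            (2, 3, "shi", -2.3), (3, 4, "ki", 0.0)]
G2 = [(0, 1, "ni", 0.0), (1, 2, "n", 0.0), (2, 3, "shi", 0.0), (3, 4, "ki", 0.0)]
D = matching_likelihood(G1_PRIME, G2, 0, 4, 0, 4)
```

For this input the only path on which both graphs agree syllable by syllable is ni, n, shi, ki, so `D` equals the -2.3 carried by the 「しき」 alternative.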

【００４４】In the flowchart of FIG. 9, it is determined whether the matching likelihood D(W1, W2) is higher than a threshold (step ST107). If it is not, steps ST108 to ST110 are skipped. If the determination in step ST107 finds that the matching likelihood D(W1, W2) is higher than the threshold, steps ST108 to ST110 are performed. In step ST108, hypothesis H0 is copied to create hypothesis H1. The end time of hypothesis H1 is updated to T1 (step ST109), and word n is appended to the word string of hypothesis H1 so that the word string grows by one word. The evaluation value of hypothesis H1 is also increased by the matching likelihood D(W1, W2). Hypothesis H1 is then stored in the stack (step ST110). In addition to the matching likelihood, the likelihood of the word string under a language model is computed and added to the evaluation value of hypothesis H1. In this case, the language model likelihood is computed using an N-gram model over word strings.
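
The loop of FIG. 9 can be sketched as a best-first search over a priority queue. This is a hedged illustration, not the patented implementation: the `match` and `lm` callables, the `max_results` cap, and the use of `heapq` are assumptions of the sketch, and after outputting a candidate the sketch re-checks whether the stack is empty before popping again.

```python
import heapq
from itertools import count


def stack_decode(optimal, dictionary, match, threshold,
                 lm=lambda history, word: 0.0, max_results=10):
    """Best-first word string search over an optimal syllable string.

    dictionary: {word: standard syllable list}
    match(w1, w2): matching log-likelihood D(W1, W2) (supplied by the caller)
    lm(history, word): language model log-likelihood increment (assumed form)
    """
    T = len(optimal)
    tie = count()  # tie-breaker so the heap never has to compare word lists
    stack = [(-0.0, next(tie), [], 0)]                  # ST101: empty hypothesis
    results = []
    while stack and len(results) < max_results:         # ST102 / ST103
        neg_score, _, words, t0 = heapq.heappop(stack)  # ST104: best hypothesis
        if t0 == T:                                     # ST111
            results.append((words, -neg_score))         # ST112
            continue
        for word, standard in dictionary.items():       # ST105
            for t1 in range(t0 + 1, T + 1):             # ST106: X(T0+1)..X(T1)
                d = match(optimal[t0:t1], standard)
                if d <= threshold:                      # ST107
                    continue
                score = -neg_score + d + lm(words, word)       # ST108, ST109
                heapq.heappush(stack, (-score, next(tie), words + [word], t1))  # ST110
    return results


def exact(w1, w2):
    """Toy matcher: likelihood 0 for identical strings, -inf otherwise."""
    return 0.0 if list(w1) == list(w2) else float("-inf")


DICTIONARY = {"onsee": ["o", "n", "se", "e"], "ninshiki": ["ni", "n", "shi", "ki"]}
CANDIDATES = stack_decode(["o", "n", "se", "e", "ni", "n", "shi", "ki"],
                          DICTIONARY, exact, threshold=-1.0)
```

In practice `match` would be the graph matching of FIG. 10 rather than the exact comparison used here to keep the example small.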

【００４５】As described above, according to the first embodiment, the difference model is applied to the optimal solution obtained by the optimal solution obtaining means in the first stage of the search, and the second stage of the search is provided with word string search means that takes the first-stage optimal solution as input and searches for word string candidates with reference to the difference model and the word dictionary in which words are described. Dropout of the optimal solution in the first stage can therefore be prevented, and dropout of the correct answer in the second stage can be reduced.

【００４６】Embodiment 2. FIG. 12 is an explanatory diagram showing the word dictionary in the search device for continuous speech recognition according to the second embodiment of the present invention, FIG. 13 is a block diagram showing the difference model in that search device, and FIG. 14 is a table showing an example of the word syllable length conversion likelihood table in that search device. In these figures, the same reference numerals as in the first embodiment denote the same or corresponding parts, and their description is omitted. As shown in FIG. 12, the word dictionary 7 of the second embodiment includes the length J(n) of the standard syllable string that makes up each word n. As shown in FIG. 13, the difference model 6 of the second embodiment comprises the inter-syllable-string conversion likelihood table 601 and a word syllable length conversion likelihood table 602. As shown in FIG. 14, the word syllable length conversion likelihood table 602 describes the likelihood associated with each pair of standard syllable string length and optimal syllable string length for a word.

【００４７】Next, the operation will be described. In the matching of step ST106 in FIG. 9, the likelihood of matching the partial optimal syllable string W1 against the standard syllable string W2 of word n is the matching likelihood D(W1, W2) obtained by the method described in the first embodiment plus a word syllable length likelihood. The word syllable length likelihood is obtained by looking up the word syllable length conversion likelihood table 602 with the length of the optimal syllable string currently being matched (this is T1 - T0) and the length of the standard syllable string that makes up word n (this is J(n), obtained from the word dictionary 7). As a result, when a match would pair an optimal syllable length with a very different standard syllable length, the likelihood becomes small, and the comparison of the matching likelihood with the threshold in step ST107 of FIG. 9 prevents steps ST108 to ST110 from being executed.
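
The added length term amounts to one more table lookup. Below is a minimal sketch, assuming the table is keyed by (standard length, optimal length); the table values, the -inf default for unlisted pairs, and the function name are illustrative assumptions rather than the contents of FIG. 14.

```python
# Hypothetical word syllable length conversion likelihood table:
# (standard syllable length J(n), optimal syllable length T1 - T0) -> log-likelihood.
LENGTH_TABLE = {(4, 4): -0.1, (4, 3): -1.2, (4, 5): -1.5}


def length_aware_likelihood(d, t0, t1, standard_len, length_table=LENGTH_TABLE):
    """Add the word syllable length likelihood to the matching likelihood d.

    Length pairs missing from the table are treated as impossible (-inf),
    which is an assumption of this sketch.
    """
    return d + length_table.get((standard_len, t1 - t0), float("-inf"))
```

A four-syllable word matched against three optimal syllables is penalized by -1.2 in this toy table, and a match against eight optimal syllables falls to -inf and so fails any finite threshold in step ST107.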

【００４８】As described above, according to the second embodiment, extreme matches can be prevented, so that useless hypotheses are generated less often and the amount of search processing is reduced.

【００４９】Embodiment 3. FIG. 15 is a block diagram showing a search device for continuous speech recognition according to the third embodiment of the present invention. In the figure, the same reference numerals as in the first and second embodiments denote the same or corresponding parts, and their description is omitted. Prior to the recognition processing by the word string search means 5, the difference-model-applied word dictionary creating means 14 converts the standard syllable strings of the word dictionary 7 into graphs and stores them as the difference-model-applied word dictionary 13. Then, in the processing of the word string search means 5, when a word n is appended after the end time T0 of hypothesis H0 in step ST106 of FIG. 9 of the first embodiment, the matching likelihood can be computed without the processing of steps ST203 and ST207 of FIG. 10, which converts the standard syllable string into a graph.

【００５０】As described above, according to the third embodiment, the result of converting the standard syllable string of every word into a graph is stored in advance. Although this increases the amount of memory required, the dynamic conversion can be omitted from the processing of the word string search means 5, so the computation becomes faster.

【００５１】Embodiment 4. FIG. 16 is a block diagram showing a search device for continuous speech recognition according to the fourth embodiment of the present invention. In the figure, the same reference numerals as in the first to third embodiments denote the same or corresponding parts, and their description is omitted. The fourth embodiment includes a difference-model-applied syllable graph creating means 15, which applies the difference model 6 to the optimal syllable string 4 to create a difference-model-applied input syllable graph 16. The word string search means 5 takes the difference-model-applied input syllable graph 16 as input, refers to the word dictionary 7, and outputs the word string candidates 8. Because the difference model 6 is applied to the optimal syllable string, it needs to be applied only once per utterance.

【００５２】As described above, according to the fourth embodiment, there is no need, as there is in the first embodiment, to apply the difference model 6 to the optimal syllable string or to the standard syllable string of word n for every word n during the word string search, so the computation can be made faster.

【００５３】Embodiment 5. FIG. 17 is a block diagram showing a search device for continuous speech recognition according to the fifth embodiment of the present invention. In the figure, the same reference numerals as in the first embodiment denote the same or corresponding parts, and their description is omitted. In the first embodiment, the difference model 6 is applied to the optimal syllable string and the correct answer is sought through variations of that string, but there could be cases in which the correct answer cannot be found. The fifth embodiment therefore uses an N-best solution obtaining means 21 in place of the optimal solution obtaining means 2 of the first embodiment to obtain the N best syllable candidates and output an N-best syllable graph 22.

【００５４】As described above, according to the fifth embodiment, the N best candidates for the optimal syllables are obtained and used as the N-best syllable graph 22 in the search for the word string candidates 8 with the difference model 6. Cases in which the correct answer cannot be found therefore become fewer, and the recognition rate can be improved.

【００５５】Embodiment 6. FIG. 18 is a block diagram showing a search device for continuous speech recognition according to the sixth embodiment of the present invention. In the figure, the same reference numerals as in the first to fifth embodiments denote the same or corresponding parts, and their description is omitted. In the sixth embodiment, the optimal solution obtaining means 2 of the third embodiment is replaced by an N-best solution obtaining means 21 that outputs an N-best syllable graph 22 containing the N best candidates. In the third embodiment, word string candidates are searched from the optimal syllable string using the difference-model-applied word dictionary 13, which is created in advance by applying the difference model 6 to the standard syllable strings of the words in the word dictionary 7, so there was only one optimal syllable string. Consequently, the correct word string could in some cases not be found with the difference-model-applied word dictionary 13. In the sixth embodiment, by contrast, the N best candidates for the optimal syllables are obtained and treated as an optimal syllable graph that permits ambiguity, and word string candidates are searched using the difference-model-applied word dictionary 13.

【００５６】As described above, according to the sixth embodiment, cases in which the correct word string cannot be found are reduced and the recognition rate is improved. In addition, compared with the fifth embodiment, which uses the same N-best solution obtaining means 21, this embodiment differs in that the syllable sequences on the dictionary side are transformed, so recognition results with a different tendency can be obtained.

【００５７】Embodiment 7. FIG. 19 is a block diagram showing a search device for continuous speech recognition according to the seventh embodiment of the present invention. In the figure, the same reference numerals as in the first to sixth embodiments denote the same or corresponding parts, and their description is omitted. In the seventh embodiment, the optimal solution obtaining means 2 of the fourth embodiment is replaced by an N-best solution obtaining means 21, which obtains the N best syllable candidates and outputs an N-best syllable graph 22. In the fourth embodiment, the difference model 6 is applied to the single optimal syllable string 4 obtained by the optimal solution obtaining means 2 to create the difference-model-applied input syllable graph 16, and the word string candidates 8 are searched with this graph taken as optimal. Consequently, the correct word string could in some cases not be found with the word dictionary 7 to which the difference model 6 is applied. In the seventh embodiment, by contrast, the N best candidates for the optimal syllables are obtained, the N-best syllable graph 22 is created from them, and the word string candidates 8 are searched after the difference model 6 has been applied.

【００５８】As described above, according to the seventh embodiment, cases in which the correct word string cannot be found are reduced and the recognition rate is improved. In addition, compared with the sixth embodiment, which uses the same N-best solution obtaining means 21, this embodiment differs in that the syllable sequences on the optimal-solution side are transformed, so recognition results with a different tendency can be obtained.

【００５９】Embodiment 8. FIG. 20 is a block diagram showing a search device for continuous speech recognition according to the eighth embodiment of the present invention. In the figure, the same reference numerals as in the first to seventh embodiments denote the same or corresponding parts, and their description is omitted. In the eighth embodiment, the optimal solution obtaining means 2 uses a word network 17 to obtain an optimal word string 18, the syllable string conversion means 19 converts it back into a syllable string to obtain the optimal syllable string 4, and this is input to the word string search means 5. Because the network referred to by the optimal solution obtaining means 2 uses words, which are acoustically longer units, instead of syllables, an optimal word string 18 that is less affected by coarticulation can be obtained.

【００６０】As described above, according to the eighth embodiment, the optimal word string 18, which is less affected by coarticulation, is converted back into the optimal syllable string 4, so the likelihood of finding the correct answer increases.

【００６１】Embodiment 9. FIG. 21 is a block diagram showing a search device for continuous speech recognition according to the ninth embodiment of the present invention, FIG. 22 is a block diagram showing the difference model in that search device, and FIG. 23 is a table showing the word-string-to-word conversion table in that search device. In these figures, the same reference numerals as in the first to eighth embodiments denote the same or corresponding parts, and their description is omitted. In the ninth embodiment, the optimal solution obtaining means 2 uses the word network 17 to obtain the optimal word string 18, which is input to the word string search means 5, and the word string search means 5 searches for the word string candidates 8 with reference to the difference model 6 and the word dictionary 7.

【００６２】In the ninth embodiment, the difference model 6 consists of a word-string-to-word conversion likelihood table 603, as shown in FIG. 22. As shown in FIG. 23, this table has a column for partial word strings of the optimal word string 18, a column for the corresponding correct word, and the likelihood. Given a partial word string of the optimal word string 18 together with a word, the likelihood is obtained by looking up this table.

【００６３】Next, the operation will be described. FIG. 24 is a flowchart showing the word string search procedure of the word string search means in the search device for continuous speech recognition according to the ninth embodiment of the present invention, and FIG. 25 is a flowchart showing the procedure for matching a word against a partial word string of the optimal word string in that search device. The word string search in the word string search means 5 is performed according to the flowchart of FIG. 24. Step ST301 is performed according to the flowchart of FIG. 25. First, the optimal word string 18 and a word n are given (step ST401), the word n is matched against a partial word string of the optimal word string 18, and the matching likelihood is obtained. The matching likelihood is the likelihood that the portion of the optimal word string 18 corresponds to word n, and it is obtained by looking up the word-string-to-word conversion likelihood table 603 of the difference model 6 (step ST402).
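
The table lookup of step ST402 can be sketched as follows. The table contents, the romanized words, the -inf default for unlisted pairs, and the function name are all hypothetical; FIG. 23 is not reproduced in this text, so the entries only show the shape of the data.

```python
# Hypothetical word-string-to-word conversion likelihood table:
# (partial word string of the optimal word string, correct word) -> log-likelihood.
WORD_TABLE = {
    (("ninshiki",), "ninshiki"): -0.1,
    (("nin", "shiki"), "ninshiki"): -1.6,  # two recognized words standing for one
}


def word_matching_likelihood(partial_words, word, table=WORD_TABLE):
    """Look up the likelihood that `partial_words` corresponds to `word`.

    Pairs that are not listed are treated as impossible (-inf); this default
    is an assumption of the sketch.
    """
    return table.get((tuple(partial_words), word), float("-inf"))
```

Because the match is a single dictionary access, no graph construction or recurrence is needed at this step.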

【００６４】As described above, according to the ninth embodiment, the matching of word n against a partial word string of the optimal word string 18 in the word string search means 5 is realized by a table lookup, so the search for the word string candidates 8 becomes easier.

【００６５】[0065]

[Effects of the Invention] As described above, according to the present invention, a difference model expressing the likelihood that the optimal solution obtained in the first stage corresponds to the correct solution is provided, and the second-stage search is performed by applying the difference model to the optimal solution obtained in the first stage. This prevents the optimal solution from being dropped in the first stage and reduces the dropping of the correct solution in the second stage.

[0066] According to the present invention, the analysis result created by the speech analysis means, which analyzes the input speech, is input, and the optimal solution obtaining means obtains an optimal syllable string controlled by an automaton representing connections between syllables. The word string search means receives this optimal syllable string, refers to a difference model describing the likelihood that the optimal syllable string corresponds to the correct syllable string and to a word dictionary describing the standard syllable strings of words, searches for word string candidates, and outputs them. This prevents the optimal solution from being dropped in the first stage and reduces the dropping of the correct solution in the second stage.

[0067] According to the present invention, the difference model is an inter-syllable string conversion likelihood table describing partial syllable strings of the optimal syllable string, partial syllable strings of the correct syllable string, and the likelihoods that they correspond, and the word string search means searches for word string candidates based on the likelihoods described in this table. This prevents the optimal solution from being dropped in the first stage and reduces the dropping of the correct solution in the second stage.
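One way to use such a table is a dynamic-programming alignment between the optimal syllable string and a word's standard syllable string. The sketch below is an assumption-laden illustration rather than the procedure of the patent: the table contents, the floor score, the restriction to partial strings of at most one syllable (`max_span`), and the function name are all hypothetical.

```python
import math

FLOOR = math.log(1e-4)  # assumed score for pairs missing from the table

# Hypothetical inter-syllable string conversion likelihood table:
# (partial optimal syllable string, partial correct syllable string)
# -> log likelihood. Empty tuples model inserted or deleted syllables.
TABLE_601 = {
    (("ka",), ("ka",)): math.log(0.9),
    (("ga",), ("ka",)): math.log(0.1),
    (("i",), ("i",)): math.log(0.95),
    ((), ("i",)): math.log(0.05),  # syllable missing from the optimal string
    (("i",), ()): math.log(0.02),  # extra syllable in the optimal string
}


def align_log_likelihood(optimal, standard, table=TABLE_601, max_span=1):
    """Best total log likelihood of converting `optimal` into `standard`
    by concatenating table entries (Viterbi-style dynamic programming)."""
    n, m = len(optimal), len(standard)
    best = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == -math.inf:
                continue
            for di in range(max_span + 1):
                for dj in range(max_span + 1):
                    if (di == 0 and dj == 0) or i + di > n or j + dj > m:
                        continue
                    key = (tuple(optimal[i:i + di]), tuple(standard[j:j + dj]))
                    score = best[i][j] + table.get(key, FLOOR)
                    if score > best[i + di][j + dj]:
                        best[i + di][j + dj] = score
    return best[n][m]
```

For example, `align_log_likelihood(["ga", "i"], ["ka", "i"])` scores the misrecognized string against the standard string by combining the entries for "ga" to "ka" and "i" to "i".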

[0068] According to the present invention, the difference model comprises an inter-syllable string conversion likelihood table describing partial syllable strings of the optimal syllable string, partial syllable strings of the correct syllable string, and the likelihoods that they correspond, and a word syllable length conversion likelihood table describing the length of the optimal syllable string, the length of the syllable string in the word dictionary, and the likelihood that they correspond. The word string search means searches for word string candidates based on the likelihoods described in both tables. This prevents extreme matchings, reduces the generation of useless hypotheses, and reduces the amount of search processing.
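The pruning effect of a length table can be sketched as follows. This is a minimal illustration under assumptions: the table entries, the threshold, and the idea of treating missing length pairs as impossible are hypothetical choices, not details taken from the patent.

```python
import math

# Hypothetical word syllable length conversion likelihood table:
# (length of the matched part of the optimal syllable string,
#  length of the word's standard syllable string) -> log likelihood.
TABLE_602 = {
    (3, 3): math.log(0.8),
    (2, 3): math.log(0.1),
    (4, 3): math.log(0.1),
}


def length_log_likelihood(opt_len, word_len, table=TABLE_602):
    """Log likelihood of the length pair; missing pairs are treated as
    impossible (an assumption of this sketch)."""
    return table.get((opt_len, word_len), -math.inf)


def plausible_spans(start, total_len, word_len,
                    threshold=math.log(0.05), table=TABLE_602):
    """Return (start, end, log_likelihood) for every span of the optimal
    syllable string whose length is plausible for a word of `word_len`
    syllables. Spans below `threshold` never become hypotheses."""
    spans = []
    for end in range(start + 1, total_len + 1):
        ll = length_log_likelihood(end - start, word_len, table)
        if ll >= threshold:
            spans.append((start, end, ll))
    return spans
```

With these invented entries, a three-syllable word is only ever matched against spans of two to four syllables, so a one-syllable or six-syllable span is discarded before any syllable-level matching is attempted.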

[0069] According to the present invention, the analysis result created by the speech analysis means, which analyzes the input speech, is input, and the optimal solution obtaining means obtains an optimal syllable string controlled by an automaton representing connections between syllables. With reference to a difference model describing the likelihood that this optimal syllable string corresponds to the correct syllable string and to a word dictionary describing the standard syllable strings of words, a difference model applied word dictionary is provided that describes, for each word in the word dictionary, the word and a syllable graph obtained by transforming its standard syllable string based on the description of the difference model. The word string search means receives the optimal syllable string, refers to the difference model applied word dictionary, searches for word string candidates, and outputs them. Although the amount of memory increases, the dynamic conversion computation can be omitted in the processing of the word string search means, so the processing can be sped up.
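The memory-for-speed trade-off of a precomputed, difference-model-applied dictionary can be illustrated with a small sketch. It is a simplification under stated assumptions: the patent describes a syllable graph per word, whereas this sketch enumerates the paths of such a graph as flat variants, and the per-syllable confusion table and all names are hypothetical.

```python
import math
from itertools import product

# Hypothetical per-syllable confusions derived from a difference model:
# correct syllable -> [(syllable as it may appear in the optimal
# syllable string, log likelihood), ...]. Values are invented.
CONFUSIONS = {
    "ka": [("ka", math.log(0.9)), ("ga", math.log(0.1))],
    "i": [("i", math.log(1.0))],
}


def build_applied_dictionary(word_dict, confusions=CONFUSIONS):
    """Expand each word's standard syllable string into its variants once,
    ahead of time. Returns: variant syllable tuple -> [(word, log likelihood)]."""
    applied = {}
    for word, syllables in word_dict.items():
        # Syllables without an entry are assumed to appear unchanged.
        options = [confusions.get(s, [(s, 0.0)]) for s in syllables]
        for combo in product(*options):
            variant = tuple(s for s, _ in combo)
            score = sum(ll for _, ll in combo)
            applied.setdefault(variant, []).append((word, score))
    return applied
```

At search time a span of the optimal syllable string is matched by a single lookup, for example `build_applied_dictionary({"kai": ["ka", "i"]})[("ga", "i")]`. The expanded dictionary is larger than the original, which mirrors the increase in memory noted in the text.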

[0070] According to the present invention, the analysis result created by the speech analysis means, which analyzes the input speech, is input, and the optimal solution obtaining means obtains an optimal syllable string controlled by an automaton representing connections between syllables. The difference model applied syllable graph creating means receives the optimal syllable string and creates a graph by transforming it based on the description of a difference model describing the likelihood that the optimal syllable string corresponds to the correct syllable string. The word string search means receives this graph, refers to a word dictionary describing the standard syllable strings of words, searches for word string candidates, and outputs them. This speeds up the processing.

[0071] According to the present invention, the analysis result created by the speech analysis means, which analyzes the input speech, is input, and the N-best solution obtaining means obtains a syllable string consisting of the optimal top N syllables, controlled by an automaton representing connections between syllables. The word string search means receives this syllable string, refers to a difference model describing the likelihood that it corresponds to the correct syllable string and to a word dictionary describing the standard syllable strings of words, searches for word string candidates, and outputs them. This reduces the number of cases in which the correct answer cannot be obtained and improves the recognition rate.

[0072] According to the present invention, the analysis result created by the speech analysis means, which analyzes the input speech, is input, and the N-best solution obtaining means obtains a syllable string consisting of the optimal N syllables, controlled by an automaton representing connections between syllables. With reference to a difference model describing the likelihood that this syllable string corresponds to the correct syllable string and to a word dictionary describing the standard syllable strings of words, a difference model applied word dictionary describes, for each word in the word dictionary, the word and a syllable graph obtained by transforming its standard syllable string based on the description of the difference model. The word string search means receives the optimal syllable string obtained by the N-best solution obtaining means, refers to the difference model applied word dictionary, searches for word string candidates, and outputs them. This reduces the number of cases in which the correct word string cannot be obtained and improves the recognition rate. In addition, because this configuration differs in that the syllable sequence on the dictionary side is transformed, recognition results with a different tendency can be obtained.

[0073] According to the present invention, the analysis result created by the speech analysis means, which analyzes the input speech, is input, and the N-best solution obtaining means obtains a syllable string consisting of the optimal N syllables, controlled by an automaton representing connections between syllables. The difference model applied syllable graph creating means receives this syllable string and creates a graph by transforming it based on the description of a difference model describing the likelihood that the syllable string corresponds to the correct syllable string. The word string search means receives this graph, refers to a word dictionary describing the standard syllable strings of words, searches for word string candidates, and outputs them. This reduces the number of cases in which the correct word string cannot be obtained and improves the recognition rate. In addition, because this configuration differs in that the syllable sequence on the optimal-solution side is transformed, recognition results with a different tendency can be obtained.

[0074] According to the present invention, the difference model includes a word syllable length conversion likelihood table describing the length of the optimal syllable string, the length of the syllable string in the word dictionary, and the likelihood that they correspond, and the word string search means searches for word string candidates based on the likelihoods in this table. This prevents extreme matchings, reduces the generation of useless hypotheses, and reduces the amount of search processing.

[0075] According to the present invention, the analysis result created by the speech analysis means, which analyzes the input speech, is input, and the optimal solution obtaining means obtains an optimal word string controlled by an automaton representing connections between words. The syllable string conversion means converts this optimal word string into a syllable string. The word string search means receives that syllable string, refers to a difference model describing the likelihood that the syllable string corresponds to the correct syllable string and to a word dictionary describing the standard syllable strings of words, searches for word string candidates, and outputs them. Because the optimal word string, which is less affected by coarticulation, is converted back into an optimal syllable string, the likelihood of obtaining the correct answer increases.

[0076] According to the present invention, the analysis result created by the speech analysis means, which analyzes the input speech, is input, and the optimal solution obtaining means obtains an optimal word string controlled by an automaton representing connections between words. The word string search means receives this optimal word string, refers to a difference model describing the likelihood that the optimal word string corresponds to the correct word string and to a word dictionary describing words, searches for word string candidates, and outputs them. This makes the search for word string candidates easier.

[0077] According to the present invention, the difference model includes a word syllable length conversion likelihood table describing the length of the optimal word string corresponding to a word in the word dictionary and its likelihood, and the word string search means searches for word string candidates based on the likelihoods in this table. This prevents extreme matchings, reduces the generation of useless hypotheses, and reduces the amount of search processing.

[0078] According to the present invention, a difference model expressing the likelihood that the optimal solution obtained in the first stage corresponds to the correct solution is provided, and the second-stage search is performed by applying the difference model to the optimal solution obtained in the first stage. This prevents the optimal solution from being dropped in the first stage and reduces the dropping of the correct solution in the second stage.

[0079] According to the present invention, the analysis result of the input speech is input; an optimal syllable string controlled by an automaton representing connections between syllables is obtained; and word string candidates are searched for and output with reference to a difference model describing the likelihood that the optimal syllable string corresponds to the correct syllable string and to a word dictionary describing the standard syllable strings of words. This prevents extreme matchings, reduces the generation of useless hypotheses, and reduces the amount of search processing.

[0080] According to the present invention, the analysis result of the input speech is input, and an optimal syllable string controlled by an automaton representing connections between syllables is obtained. With reference to a difference model describing the likelihood that the optimal syllable string corresponds to the correct syllable string and to a word dictionary describing the standard syllable strings of words, a difference model applied word dictionary describes, for each word in the word dictionary, the word and a syllable graph obtained by transforming its standard syllable string based on the description of the difference model. Word string candidates are searched for with reference to the difference model applied word dictionary and are output. Although the amount of memory increases, the dynamic conversion computation can be omitted in the word string search processing, so the processing can be sped up.

[0081] According to the present invention, the analysis result of the input speech is input; an optimal syllable string controlled by an automaton representing connections between syllables is obtained; a graph is created by transforming the optimal syllable string based on the description of a difference model describing the likelihood that the optimal syllable string corresponds to the correct syllable string; and the created graph is input, a word dictionary describing the standard syllable strings of words is referred to, and word string candidates are searched for and output. This speeds up the processing.

[0082] According to the present invention, the analysis result of the input speech is input; a syllable string consisting of the optimal top N syllables, controlled by an automaton representing connections between syllables, is obtained; and this syllable string is input, a difference model describing the likelihood that it corresponds to the correct syllable string and a word dictionary describing the standard syllable strings of words are referred to, and word string candidates are searched for and output. This reduces the number of cases in which the correct answer cannot be obtained and improves the recognition rate.

[0083] According to the present invention, the analysis result of the input speech is input, and a syllable string consisting of the optimal N syllables, controlled by an automaton representing connections between syllables, is obtained and input. With reference to a difference model describing the likelihood that this syllable string corresponds to the correct syllable string and to a word dictionary describing the standard syllable strings of words, a difference model applied word dictionary describes, for each word in the word dictionary, the word and a syllable graph obtained by transforming its standard syllable string based on the description of the difference model. Word string candidates are searched for with reference to the difference model applied word dictionary and are output. This reduces the number of cases in which the correct word string cannot be obtained and improves the recognition rate. In addition, because this configuration differs in that the syllable sequence on the dictionary side is transformed, recognition results with a different tendency can be obtained.

[0084] According to the present invention, the analysis result of the input speech is input; a syllable string consisting of the optimal N syllables, controlled by an automaton representing connections between syllables, is obtained; a graph is created by transforming this syllable string based on the description of a difference model describing the likelihood that the syllable string corresponds to the correct syllable string; and the created graph is input, a word dictionary describing the standard syllable strings of words is referred to, and word string candidates are searched for and output. This reduces the number of cases in which the correct word string cannot be obtained and improves the recognition rate. In addition, because this configuration differs in that the syllable sequence on the optimal-solution side is transformed, recognition results with a different tendency can be obtained.

[0085] According to the present invention, the analysis result of the input speech is input; an optimal word string controlled by an automaton representing connections between words is obtained; this optimal word string is converted into a syllable string; and word string candidates are searched for and output with reference to a difference model describing the likelihood that this syllable string corresponds to the correct syllable string and to a word dictionary describing the standard syllable strings of words. Because the optimal word string, which is less affected by coarticulation, is converted back into an optimal syllable string, the likelihood of obtaining the correct answer increases.

[0086] According to the present invention, the analysis result of the input speech is input; an optimal word string controlled by an automaton representing connections between words is obtained; and word string candidates are searched for and output with reference to a difference model describing the likelihood that this optimal word string corresponds to the correct word string and to a word dictionary describing words. This makes the search for word string candidates easier.

[Brief description of the drawings]

FIG. 1 is a configuration diagram showing a search device for continuous speech recognition according to Embodiment 1 of the present invention.

FIG. 2 is an explanatory diagram showing a syllable network in the search device for continuous speech recognition according to Embodiment 1 of the present invention.

FIG. 3 is an explanatory diagram showing a basic HMM in the search device for continuous speech recognition according to Embodiment 1 of the present invention.

FIG. 4 shows an algorithm for automaton control in the search device for continuous speech recognition according to Embodiment 1 of the present invention.

FIG. 5 is an explanatory diagram showing an example of a word dictionary in the search device for continuous speech recognition according to Embodiment 1 of the present invention.

FIG. 6 is a configuration diagram showing a difference model in the search device for continuous speech recognition according to Embodiment 1 of the present invention.

FIG. 7 is a table showing an example of an inter-syllable string conversion likelihood table in the search device for continuous speech recognition according to Embodiment 1 of the present invention.

FIG. 8 is a configuration diagram showing an example of learning means for the difference model in the search device for continuous speech recognition according to Embodiment 1 of the present invention.

FIG. 9 is a flowchart showing the operation procedure of the word string search means in the search device for continuous speech recognition according to Embodiment 1 of the present invention.

FIG. 10 is a flowchart showing the procedure for matching the optimal syllable string against the standard syllable string of a word n in the search device for continuous speech recognition according to Embodiment 1 of the present invention.

FIG. 11 is an explanatory diagram showing the matching operation in the search device for continuous speech recognition according to Embodiment 1 of the present invention.

FIG. 12 is an explanatory diagram showing a word dictionary in the search device for continuous speech recognition according to Embodiment 2 of the present invention.

FIG. 13 is a configuration diagram showing a difference model in the search device for continuous speech recognition according to Embodiment 2 of the present invention.

FIG. 14 is a table showing an example of a word syllable length conversion likelihood table in the search device for continuous speech recognition according to Embodiment 2 of the present invention.

FIG. 15 is a configuration diagram showing a search device for continuous speech recognition according to Embodiment 3 of the present invention.

FIG. 16 is a configuration diagram showing a search device for continuous speech recognition according to Embodiment 4 of the present invention.

FIG. 17 is a configuration diagram showing a search device for continuous speech recognition according to Embodiment 5 of the present invention.

FIG. 18 is a configuration diagram showing a search device for continuous speech recognition according to Embodiment 6 of the present invention.

FIG. 19 is a configuration diagram showing a search device for continuous speech recognition according to Embodiment 7 of the present invention.

FIG. 20 is a configuration diagram showing a search device for continuous speech recognition according to Embodiment 8 of the present invention.

FIG. 21 is a configuration diagram showing a search device for continuous speech recognition according to Embodiment 9 of the present invention.

FIG. 22 is a configuration diagram showing a difference model in the search device for continuous speech recognition according to Embodiment 9 of the present invention.

FIG. 23 is a table showing a word string inter-word conversion table in the search device for continuous speech recognition according to Embodiment 9 of the present invention.

FIG. 24 is a flowchart showing the word string search procedure of the word string search means in the search device for continuous speech recognition according to Embodiment 9 of the present invention.

FIG. 25 is a flowchart showing the procedure for matching a word against a partial word string of the optimal word string in the search device for continuous speech recognition according to Embodiment 9 of the present invention.

[Description of Reference Numerals] 2: optimal solution obtaining means; 4: optimal syllable string; 5: word string search means; 6: difference model; 7: word dictionary; 8: word string candidate; 13: difference model applied word dictionary; 15: difference model applied syllable graph creating means; 21: N-best solution obtaining means; 101: input speech; 102: speech analysis means; 601: inter-syllable string conversion likelihood table; 602: word syllable length conversion likelihood table.

Claims

[Claims]

1. A search device for continuous speech recognition comprising speech analysis means for analyzing input speech, the device searching the analysis result created by the speech analysis means for an optimal solution in a first stage and performing a second-stage search that modifies the first-stage optimal solution, wherein a difference model expressing the likelihood that the optimal solution obtained in the first stage corresponds to the correct solution is provided, and the second-stage search is performed by applying the difference model to the optimal solution obtained in the first stage.

2. The search device for continuous speech recognition according to claim 1, comprising: speech analysis means for analyzing input speech; optimal solution obtaining means for receiving the analysis result created by the speech analysis means and obtaining an optimal syllable string controlled by an automaton representing connections between syllables; a difference model describing the likelihood that the optimal syllable string obtained by the optimal solution obtaining means corresponds to the correct syllable string; a word dictionary describing the standard syllable strings of words; and word string search means for receiving the optimal syllable string obtained by the optimal solution obtaining means, referring to the difference model and the word dictionary, searching for word string candidates, and outputting the word string candidates.

3. The search device for continuous speech recognition according to claim 2, wherein the difference model is a syllable string conversion likelihood table that describes a partial syllable string of the optimal syllable string, a partial syllable string of the correct syllable string, and the corresponding likelihood, and the word string searching means searches for word string candidates based on the likelihood described in the syllable string conversion likelihood table.

4. The search device for continuous speech recognition according to claim 2, wherein the difference model comprises: a syllable string conversion likelihood table that describes a partial syllable string of the optimal syllable string, a partial syllable string of the correct syllable string, and the corresponding likelihood; and a word syllable length conversion likelihood table that describes the lengths of the syllable strings in the word dictionary and the likelihoods corresponding to the word syllable strings, and wherein the word string searching means searches for word string candidates based on the likelihoods described in the syllable string conversion likelihood table and the word syllable length conversion likelihood table.

5. The search device for continuous speech recognition according to claim 1, comprising: speech analysis means for analyzing input speech; optimal solution obtaining means for inputting the analysis result created by the speech analysis means and obtaining an optimal syllable string under the control of an automaton representing connections between syllables; a difference model describing the likelihood that the optimal syllable string obtained by the optimal solution obtaining means corresponds to a correct syllable string; a word dictionary describing the standard syllable strings of words; a difference model applied word dictionary that is created with reference to the difference model and the word dictionary and that describes, for each word in the word dictionary, the word and a syllable graph obtained by transforming the standard syllable string of that word based on the description of the difference model; and word string searching means for inputting the optimal syllable string, referring to the difference model applied word dictionary, searching for word string candidates, and outputting the word string candidates.

6. The search device for continuous speech recognition according to claim 1, comprising: speech analysis means for analyzing input speech; optimal solution obtaining means for inputting the analysis result created by the speech analysis means and obtaining an optimal syllable string under the control of an automaton representing connections between syllables; a difference model describing the likelihood that the optimal syllable string obtained by the optimal solution obtaining means corresponds to a correct syllable string; difference model applied syllable graph creating means for inputting the optimal syllable string and creating a graph by transforming the optimal syllable string based on the description of the difference model; a word dictionary describing the standard syllable strings of words; and word string searching means for inputting the graph created by the difference model applied syllable graph creating means, referring to the word dictionary, searching for word string candidates, and outputting the word string candidates.

7. The search device for continuous speech recognition according to claim 1, comprising: speech analysis means for analyzing input speech; N-best solution obtaining means for inputting the analysis result created by the speech analysis means and obtaining the top N optimal syllable strings under the control of an automaton representing connections between syllables; a difference model describing the likelihood that the top N optimal syllable strings obtained by the N-best solution obtaining means correspond to a correct syllable string; a word dictionary describing the standard syllable strings of words; and word string searching means for inputting the top N optimal syllable strings obtained by the N-best solution obtaining means, referring to the difference model and the word dictionary, searching for word string candidates, and outputting the word string candidates.

8. The search device for continuous speech recognition according to claim 1, comprising: speech analysis means for analyzing input speech; N-best solution obtaining means for inputting the analysis result created by the speech analysis means and obtaining N optimal syllable strings under the control of an automaton representing connections between syllables; a difference model describing the likelihood that the N optimal syllable strings obtained by the N-best solution obtaining means correspond to a correct syllable string; a word dictionary describing the standard syllable strings of words; a difference model applied word dictionary that is created with reference to the difference model and the word dictionary and that describes, for each word in the word dictionary, the word and a syllable graph obtained by transforming the standard syllable string of that word based on the description of the difference model; and word string searching means for inputting the optimal syllable strings obtained by the N-best solution obtaining means, referring to the difference model applied word dictionary, searching for word string candidates, and outputting the word string candidates.

9. The search device for continuous speech recognition according to claim 1, comprising: speech analysis means for analyzing input speech; N-best solution obtaining means for inputting the analysis result created by the speech analysis means and obtaining N optimal syllable strings under the control of an automaton representing connections between syllables; a difference model describing the likelihood that the N optimal syllable strings obtained by the N-best solution obtaining means correspond to a correct syllable string; difference model applied syllable graph creating means for inputting the optimal syllable strings and creating a graph by transforming the N optimal syllable strings based on the description of the difference model; a word dictionary describing the standard syllable strings of words; and word string searching means for inputting the graph created by the difference model applied syllable graph creating means, referring to the word dictionary, searching for word string candidates, and outputting the word string candidates.

10. The search device for continuous speech recognition according to any one of claims 5 to 9, wherein the difference model is a word syllable length conversion likelihood table that describes the length of the optimal syllable string, the length of the syllable string in the word dictionary, and the corresponding likelihood, and word string candidates are searched for based on the likelihood in the word syllable length conversion likelihood table.

11. The search device for continuous speech recognition according to claim 1, comprising: speech analysis means for analyzing input speech; optimal solution obtaining means for inputting the analysis result created by the speech analysis means and obtaining an optimal word string under the control of an automaton representing connections between words; syllable string conversion means for converting the optimal word string obtained by the optimal solution obtaining means into a syllable string; a difference model describing the likelihood that the syllable string obtained by the syllable string conversion means corresponds to a correct syllable string; a word dictionary describing the standard syllable strings of words; and word string searching means for inputting the syllable string obtained by the syllable string conversion means, referring to the difference model and the word dictionary, searching for word string candidates, and outputting the word string candidates.

12. The search device for continuous speech recognition according to claim 1, comprising: speech analysis means for analyzing input speech; optimal solution obtaining means for inputting the analysis result created by the speech analysis means and obtaining an optimal word string under the control of an automaton representing connections between words; a difference model describing the likelihood that the optimal word string obtained by the optimal solution obtaining means corresponds to a correct word string; a word dictionary describing words; and word string searching means for inputting the optimal word string obtained by the optimal solution obtaining means, referring to the difference model and the word dictionary, searching for word string candidates, and outputting the word string candidates.

13. The search device for continuous speech recognition according to claim 11, wherein the difference model is a word syllable length conversion likelihood table that describes the length of the optimal word string corresponding to a word in the word dictionary and its likelihood, and word string candidates are searched for based on the likelihood in that likelihood table.

14. A search method for continuous speech recognition in which the analysis result of input speech is searched for an optimal solution in a first stage and a second-stage search is performed by modifying the first-stage optimal solution, wherein a difference model expressing the likelihood that the optimal solution obtained in the first stage corresponds to the correct solution is provided, and the second-stage search is performed by applying the difference model to the optimal solution obtained in the first stage.

15. The search method for continuous speech recognition according to claim 14, wherein the analysis result of input speech is input; an optimal syllable string is obtained under the control of an automaton representing connections between syllables; and word string candidates are searched for and output with reference to a difference model describing the likelihood that the optimal syllable string corresponds to a correct syllable string and a word dictionary describing the standard syllable strings of words.

16. The search method for continuous speech recognition according to claim 14, wherein the analysis result of input speech is input; an optimal syllable string is obtained under the control of an automaton representing connections between syllables; and word string candidates are searched for and output with reference to a difference model applied word dictionary that is created with reference to a difference model describing the likelihood that the optimal syllable string corresponds to a correct syllable string and a word dictionary describing the standard syllable strings of words, and that describes, for each word in the word dictionary, the word and a syllable graph obtained by transforming the standard syllable string of that word based on the description of the difference model.

17. The search method for continuous speech recognition according to claim 14, wherein the analysis result of input speech is input; an optimal syllable string is obtained under the control of an automaton representing connections between syllables; a graph is created by transforming the optimal syllable string based on the description of a difference model describing the likelihood that the optimal syllable string corresponds to a correct syllable string; and the created graph is input, and word string candidates are searched for and output with reference to a word dictionary describing the standard syllable strings of words.

18. The search method for continuous speech recognition according to claim 14, wherein the analysis result of input speech is input; the top N optimal syllable strings are obtained under the control of an automaton representing connections between syllables; and the top N optimal syllable strings are input, and word string candidates are searched for and output with reference to a difference model describing the likelihood that the top N optimal syllable strings correspond to a correct syllable string and a word dictionary describing the standard syllable strings of words.

19. The search method for continuous speech recognition according to claim 14, wherein the analysis result of input speech is input; N optimal syllable strings are obtained under the control of an automaton representing connections between syllables; and the optimal syllable strings are input, and word string candidates are searched for and output with reference to a difference model applied word dictionary that is created with reference to a difference model describing the likelihood that the N optimal syllable strings correspond to a correct syllable string and a word dictionary describing the standard syllable strings of words, and that describes, for each word in the word dictionary, the word and a syllable graph obtained by transforming the standard syllable string of that word based on the description of the difference model.

20. The search method for continuous speech recognition according to claim 14, wherein the analysis result of input speech is input; N optimal syllable strings are obtained under the control of an automaton representing connections between syllables; a graph is created by transforming the N optimal syllable strings based on the description of a difference model describing the likelihood that the N optimal syllable strings correspond to a correct syllable string; and the created graph is input, and word string candidates are searched for and output with reference to a word dictionary describing the standard syllable strings of words.

21. The search method for continuous speech recognition according to claim 14, wherein the analysis result of input speech is input; an optimal word string is obtained under the control of an automaton representing connections between words; the optimal word string is converted into a syllable string; and word string candidates are searched for and output with reference to a difference model describing the likelihood that the syllable string corresponds to a correct syllable string and a word dictionary describing the standard syllable strings of words.

22. The search method for continuous speech recognition according to claim 14, wherein the analysis result of input speech is input; an optimal word string is obtained under the control of an automaton representing connections between words; and word string candidates are searched for and output with reference to a difference model describing the likelihood that the optimal word string corresponds to a correct word string and a word dictionary describing words.
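
As an informal illustration only, and not part of the claims, the second-stage search of claims 2 and 3 might be sketched in Python as follows. Everything named here is an assumption made for the example: the functions `align_score` and `search_word_string`, the romanized syllables, the toy dictionary, and the likelihood values. The sketch also simplifies by not letting a table entry span a word boundary, which the claims do not require.

```python
import math

def align_score(recognized, standard, table):
    """Best log-likelihood that `recognized` (first-stage syllables) corresponds
    to `standard` (dictionary syllables), using partial-string table entries."""
    n, m = len(recognized), len(standard)
    f = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for p in range(n + 1):
        for q in range(m + 1):
            if f[p][q] == -math.inf:
                continue
            # Try every (recognized partial string, correct partial string) entry.
            for (rec, cor), loglik in table.items():
                p2, q2 = p + len(rec), q + len(cor)
                if (p2 <= n and q2 <= m
                        and tuple(recognized[p:p2]) == rec
                        and tuple(standard[q:q2]) == cor):
                    f[p2][q2] = max(f[p2][q2], f[p][q] + loglik)
    return f[n][m]

def search_word_string(recognized, dictionary, table):
    """Segment the optimal syllable string into dictionary words, returning the
    best (log-likelihood, word list) pair under the difference model."""
    n = len(recognized)
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for j in range(1, n + 1):
        for i in range(j):
            if best[i][0] == -math.inf:
                continue
            for word, standard in dictionary.items():
                total = best[i][0] + align_score(recognized[i:j], standard, table)
                if total > best[j][0]:
                    best[j] = (total, best[i][1] + [word])
    return best[n]

# Hypothetical word dictionary: word -> standard syllable string.
dictionary = {"kyou": ("kyo", "u"), "wa": ("wa",), "hare": ("ha", "re")}

# Hypothetical syllable string conversion likelihood table:
# (recognized partial string, correct partial string) -> log-likelihood.
table = {((s,), (s,)): math.log(0.9) for s in ("kyo", "u", "o", "wa", "ha", "re")}
table[(("o",), ("u",))] = math.log(0.1)  # first stage outputs "o" where "u" is correct

# First-stage optimal syllable string containing one recognition error.
optimal_syllables = ["kyo", "o", "wa", "ha", "re"]
score, words = search_word_string(optimal_syllables, dictionary, table)
print(words, score)
```

In this toy run the erroneous syllable "o" is mapped back to the dictionary syllable "u" through the table, so the word "kyou" can still be recovered even though the first stage did not output its standard syllable string.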