JPH0883091A - Voice recognition device

Info

Publication number: JPH0883091A
Application number: JP6215958A
Authority: JP
Inventors: Kenji Mizutani (水谷 研治); Makoto Hirai (平井 誠)
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1994-09-09
Filing date: 1994-09-09
Publication date: 1996-03-26

Abstract

PURPOSE: To provide a voice recognition device having a high recognition rate. CONSTITUTION: A voice input device 101 converts input voice into an electric signal, and a voice signal recording device 102 records the electric signal output by the device 101 as an input pattern. A matching pattern generation device 103 generates the standard voice patterns used to recognize the input pattern. To reliably determine the end of the voice in the input pattern held by the device 102, a recognition section determining device 104 applies word spotting. A voice signal matching device 105 matches the input pattern between the previous recognition end point and the end point determined by the device 104 against the standard patterns, computes the degree of coincidence, and outputs the standard patterns in descending order of coincidence. To avoid misrecognition, a context management device 106 verifies the pattern recognition results for consistency with the context and changes the order of the recognition candidates; it also predicts the next input voice and controls the standard patterns generated by the device 103.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a device for recognizing human voice, and more particularly to an input device for electronic equipment.

[0002]

[Prior Art] In recent years, voice recognition has been researched and developed as a way to make electronic equipment quicker and easier to operate. Input voice is converted into an electric signal and handled as a change in voltage over time. Recognition is generally performed by comparing the input pattern with standard voice patterns that have been statistically trained in advance, allowing for temporal expansion and contraction, and selecting the standard pattern that differs least from the input (see, for example, Japanese Patent Laid-Open No. 4-362698). To improve recognition accuracy, this method connects a silence pattern to each standard pattern before comparing it with the input pattern.
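Comparison that allows for temporal expansion and contraction, as described above, is commonly realized with dynamic time warping (DTW). The publication does not name the algorithm, so the following is only a minimal sketch under that assumption; the one-dimensional feature values and the names `dtw_distance` and `recognize` are illustrative, not taken from the source.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # A frame may be stretched (repeat a), compressed (repeat b),
            # or matched one-to-one.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(input_pattern, standard_patterns):
    """Return the label of the standard pattern that differs least from the input."""
    return min(
        standard_patterns,
        key=lambda label: dtw_distance(input_pattern, standard_patterns[label]),
    )


# Assumed toy data: the input is a time-stretched copy of pattern "a".
standards = {"a": [1, 2, 3], "b": [3, 2, 1]}
best = recognize([1, 2, 2, 3], standards)
```

Because the warping path may repeat frames, the stretched input aligns with pattern "a" at zero cost, which is the property the prior-art method relies on.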

[0003]

[Problems to Be Solved by the Invention] The accuracy-improving technique described above has two problems. First, because the input pattern is compared with the standard patterns by word spotting, the amount of computation grows explosively in proportion to the length and number of the standard patterns. Second, for an input pattern that contains a geminate consonant (sokuon), the geminate may be mistaken for silence, so the recognition rate may actually fall.

[0004] The present invention focuses on the grammatically governed phonological features that appear when an utterance ends with silence. It solves the above problems by combining word spotting with the conventional voice recognition method, in which recognition is performed after its start and end points have been fixed in advance.

[0005]

[Means for Solving the Problems] A voice input device converts input voice into an electric signal, and a voice signal recording device records the electric signal output by the voice input device as an input pattern. A matching pattern generation device generates the standard voice patterns used to recognize the input pattern. A recognition section determining device determines the end of the voice in the input pattern held by the voice signal recording device. A voice signal matching device matches the input pattern from the previous recognition end point to the end point determined by the recognition section determining device against the standard patterns, calculates the degree of coincidence, and outputs the standard patterns in descending order of coincidence. To prevent misrecognition, a context management device verifies the consistency of the pattern recognition results with the context and reorders the recognition candidates; it also predicts the next input voice and controls the standard patterns generated by the matching pattern generation device.

[0006] The voice signal recording device comprises an information compression device that compresses the amount of information in the electric voice signal output by the voice input device, and an input pattern recording device that records the output of the information compression device.

[0007] The matching pattern generation device comprises a vocabulary information storage device that describes the words making up the voice to be recognized and their pronunciations in phoneme notation; a grammar information storage device that describes the connection rules for words and pronunciations; a phoneme model storage device that holds phoneme models; a silence model storage device that stores a silence model; and a standard pattern generation device that refers to the information output by these devices and generates the standard patterns specified by the context management device.

[0008] The recognition section determining device comprises a terminal pattern generation device that generates a pattern for the portion near the end of each standard pattern, and a word spotting device that detects the presence of such near-end patterns in the input pattern.

[0009] The voice signal matching device comprises a matching pattern narrowing device that selects only the standard patterns containing the near-end pattern recognized by the word spotting device, and a pattern matching device that matches the input pattern from the previous recognition end point to the current recognition end point against those standard patterns and outputs them in descending order of coincidence.

[0010] The context management device comprises an utterance history management device that records the recognition results obtained up to the point at which recognition of the input voice starts; a target world knowledge storage device that stores knowledge about the world in which the voice is uttered; a recognition result correction device that gives higher validity to recognition candidates that are more consistent with these, reorders the recognition results, and updates the contents of the utterance history management device; and a recognition candidate generation device that refers to the contents of the recognition history storage device and of the target world information storage device to predict the next utterance.

[0011]

[Operation] Because word spotting is applied only to short phoneme sequences, the amount of computation grows only slightly even when the number of sequences increases. For voice containing a geminate consonant, a short phoneme sequence is connected to a silence pattern to fix the recognition range before the whole word is recognized, which lowers the rate at which a geminate consonant is mistaken for the end point of recognition.

[0012]

[Embodiment] FIG. 1 is a block diagram showing the overall configuration of a voice recognition device according to one embodiment of the present invention. The voice input device 101 picks up voice and converts it into an electric signal. The voice signal recording device 102 records the voice converted into an electric signal. The matching pattern generation device 103 outputs the standard voice patterns to be matched against the input pattern. The recognition section determining device 104 determines the range of the input pattern to be matched against the standard patterns. The voice signal matching device 105 matches the input pattern from the previous recognition end point to the end point detected by the recognition section determining device 104 against the standard patterns and outputs the recognition results. The context management device 106 verifies the consistency of the obtained recognition results with the context built up by the recognition results so far, reorders the recognition results, predicts the next utterance, and outputs information about the standard patterns that the matching pattern generation device 103 should generate.

[0013]

[Table 1]

[0014] Next, the operation of the present invention is described using an example in which it is applied to a voice dialogue system for in-house company guidance based on (Table 1). When the user says "Eigyōbu no, Hatta-san wa, dochira deshō ka" ("Where is Mr. Hatta of the sales department?"), the voice input device 101 converts the utterance into an electric signal, and the voice signal recording device 102 records its waveform for a finite length of time. FIG. 2 is a block diagram showing the internal configuration of the voice signal recording device 102. The voice converted into an electric signal by the voice input device 101 is input to the information compression device 201, where its information content is compressed to reduce the required recording capacity. The compression uses an approximation method from which the input voice can be restored well enough for a human listener to make it out. The compressed voice signal is recorded in the input pattern recording device 202 as the input pattern.
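The publication does not say which compression method the information compression device 201 uses. As one possible illustration of an approximation that keeps voice intelligible, the sketch below uses mu-law companding, which is a technique swapped in here as an assumption; the constant `MU` and the function names are hypothetical.

```python
import math

MU = 255  # assumed companding constant (the value used in 8-bit mu-law telephony)


def mulaw_encode(x, mu=MU):
    """Map a sample in [-1.0, 1.0] to an 8-bit code in 0..255."""
    x = max(-1.0, min(1.0, x))
    # Logarithmic compression keeps more resolution for quiet samples.
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int(round((y + 1.0) / 2.0 * 255))


def mulaw_decode(code, mu=MU):
    """Approximately restore the sample from its 8-bit code."""
    y = code / 255 * 2.0 - 1.0
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


samples = [-1.0, -0.5, -0.01, 0.0, 0.01, 0.5, 1.0]
codes = [mulaw_encode(s) for s in samples]
restored = [mulaw_decode(c) for c in codes]
```

Each sample is stored in one byte and the restored values only approximate the originals, which matches the lossy but listenable compression the paragraph describes.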

[0015] Under instruction from the context management device 106, the matching pattern generation device 103 generates in advance the standard patterns of the candidates to be recognized. At the start of the dialogue the user is expected to say a department and a name, so it generates patterns such as "shachōshitsu no" ("of the president's office"), "eigyōbu no" ("of the sales department"), and "keiribu no" ("of the accounting department") for departments, and "Kawata-san o" ("Mr. Kawata", followed by the object particle "o") for personal names. FIG. 3 is a block diagram showing the internal configuration of the matching pattern generation device 103. The vocabulary information storage device 301 stores the vocabulary that the user is expected to utter together with the phoneme notation of its pronunciation. For in-house guidance, it holds department names, family names, given names, job titles, and genders as nouns; "ga", "no", "o", "wa", and "niwa" as particles; and phrases such as "onegaishimasu" ("please"), "irasshaimasu ka" ("is ... in?"), and "dochira deshō ka" ("where is ...?") as verb phrases, each described in phoneme notation. The grammar information storage device 302 stores the connection rules for words and the connection rules for phonemes. The phoneme model storage device 303 stores the phonological models of the phonemes; hidden Markov models are used as the phonological models. The silence model storage device 304 stores the phonological model of silence. The standard pattern generation device 305 refers to the vocabulary information storage device 301, the grammar information storage device 302, the phoneme model storage device 303, and the silence model storage device 304, and for each recognition candidate specified by the context management device 106 generates a standard pattern in which the silence model is connected before and after the phoneme model sequence.
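The way the standard pattern generation device 305 wraps each candidate in silence can be sketched as follows. This is a simplified illustration that uses phoneme labels in place of hidden Markov models; the label `sil`, the lexicon entries, and the function names are assumptions made for the example.

```python
SILENCE = "sil"  # hypothetical label standing in for the silence model (304)

# Hypothetical lexicon standing in for the vocabulary information storage
# device 301: phrase -> phoneme notation.
LEXICON = {
    "shachoushitsu no": ["sh", "a", "ch", "o", "u", "sh", "i", "ts", "u", "n", "o"],
    "eigyoubu no": ["e", "i", "g", "y", "o", "u", "b", "u", "n", "o"],
    "keiribu no": ["k", "e", "i", "r", "i", "b", "u", "n", "o"],
}


def build_standard_pattern(phrase, lexicon=LEXICON):
    """Return the phoneme sequence of `phrase` with silence before and after."""
    return [SILENCE] + list(lexicon[phrase]) + [SILENCE]


def build_standard_patterns(candidates, lexicon=LEXICON):
    """Build one standard pattern per candidate named by the context manager."""
    return {phrase: build_standard_pattern(phrase, lexicon) for phrase in candidates}


patterns = build_standard_patterns(["eigyoubu no", "keiribu no"])
```

Only the candidates passed in are expanded, which mirrors how the context management device 106 limits the set of standard patterns that have to be generated.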

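[0016] The recognition section determining device 104 determines the section that the voice signal matching device 105 should recognize within the input pattern held by the voice signal recording device 102, "(silence) eigyōbuno (silence) hat (silence) tasanwa (silence) dochiradeshōka (silence)". FIG. 4 is a block diagram showing the internal configuration of the recognition section determining device 104. The standard patterns generated by the matching pattern generation device 103 are input to the terminal pattern generation device 401, which generates the pattern near the end of each. As the near-end pattern, the portion in which the phonological model of a particle is connected to the silence model is selected. The word spotting device 402 calculates the probability that each generated near-end pattern is present at each point in time of the input pattern and determines the end point of the voice that the voice signal matching device 105 should recognize. In this example, the word spotting device 402 outputs recognition end point information indicating that a standard pattern ending in "no" ends at the position of "eigyōbuno (silence)".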
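The end-point search can be illustrated with a deliberately simplified sketch. The device described above scores near-end patterns probabilistically with phonological models; the code below instead assumes the input has already been reduced to a sequence of phoneme labels and looks for an exact match of "particle + silence". All names and the label sequence are hypothetical.

```python
SILENCE = "sil"  # hypothetical label for a silent stretch of the input


def terminal_pattern(particle_phonemes):
    """Near-end pattern: the particle's phonemes followed by silence."""
    return tuple(particle_phonemes) + (SILENCE,)


def spot_end_point(input_labels, terminals, start=0):
    """Return (end_index, terminal) for the earliest terminal lying at or after `start`."""
    for end in range(start + 1, len(input_labels) + 1):
        for term in terminals:
            k = len(term)
            if end - k >= start and tuple(input_labels[end - k:end]) == term:
                return end, term
    return None


# Assumed labelling of the example utterance; note the silence inside "hatta".
INPUT = ("sil e i g y o u b u n o sil h a sil t a s a n w a sil "
         "d o ch i r a d e sh o u k a sil").split()
TERMINALS = [terminal_pattern(["n", "o"]), terminal_pattern(["w", "a"])]

first = spot_end_point(INPUT, TERMINALS)
# The next search starts at the silence that closed the previous section.
second = spot_end_point(INPUT, TERMINALS, start=first[0] - 1)
```

In this toy input the silence produced by the geminate in "hatta" is preceded by "h a", which matches no particle, so it is not reported as an end point.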
Determines the section that should be recognized. FIG. 4 is a block diagram showing the internal configuration of the recognition section determining device 104. The standard pattern generated by the matching pattern generation device 103 is input to the pattern generation device 401, and a pattern near the end thereof is generated. As the pattern near the terminal end, a part in which the phonological model of the particle and the silent model are connected is selected. The word spotting device 402 calculates the probability that the generated pattern near the end exists at each time point of the input pattern, and determines the end point of the voice to be recognized by the voice signal matching device 105. In this example, the word spotting device 402 outputs the recognition end point information indicating that the standard pattern ending with “no” ends at the position of “no sound”.

[0017] The voice signal matching device 105 takes the span from the previous recognition end point to the recognition end point determined by the recognition section determining device 104 as the recognition target and calculates its degree of coincidence with the standard patterns. FIG. 5 is a block diagram showing the internal configuration of the voice signal matching device 105. The matching pattern narrowing device 501 receives the recognition end point information and selects, from the multiple standard patterns, those for which the degree of coincidence should actually be calculated. In this example, the standard patterns ending in "no" are selected, namely those in which the silence model is connected before and after each of the phonological models of "shachōshitsu no", "eigyōbu no", and "keiribu no". The pattern matching device 502 calculates the degree of coincidence between these standard patterns and the part of the input pattern indicated by the recognition end point information, "(silence) eigyōbuno (silence)", and outputs the standard patterns in descending order of coincidence.
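Narrowing followed by ranking can be sketched as below. The degree of coincidence is not defined numerically in the publication, so this example substitutes the similarity ratio from Python's standard `difflib.SequenceMatcher` as a stand-in; the pattern data and the names `narrow` and `rank` are assumptions.

```python
from difflib import SequenceMatcher


def narrow(standard_patterns, terminal):
    """Keep only the patterns whose tail equals the spotted near-end pattern."""
    k = len(terminal)
    return {
        name: pat
        for name, pat in standard_patterns.items()
        if tuple(pat[-k:]) == tuple(terminal)
    }


def rank(segment, patterns):
    """Order pattern names by similarity to the input segment, best first."""
    scored = [
        (SequenceMatcher(None, segment, pat).ratio(), name)
        for name, pat in patterns.items()
    ]
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored]


# Assumed phoneme-label patterns, each wrapped in silence.
STANDARD = {
    "eigyoubu no": "sil e i g y o u b u n o sil".split(),
    "keiribu no": "sil k e i r i b u n o sil".split(),
    "hatta-san wa": "sil h a t t a s a n w a sil".split(),
}
segment = "sil e i g y o u b u n o sil".split()

selected = narrow(STANDARD, ("n", "o", "sil"))
ranking = rank(segment, selected)
```

Narrowing first means the costlier whole-pattern comparison runs only on candidates that share the spotted ending.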

[0018] The context management device 106 reorders the recognition results with reference to the context and instructs the matching pattern generation device 103 which phrases to generate for use in the next recognition. FIG. 6 is a block diagram showing the configuration of one embodiment of the context management device 106. The recognition result correction device 603 compares the pattern recognition results ranked by the voice signal matching device 105 with the contents held by the recognition history storage device 601 and the target world information storage device 602, and raises the rank of recognition candidates that are more consistent with them. In this example, the target world information storage device 602 holds the personnel knowledge shown in (Table 1) and knowledge of the standard dialogue procedure at a reception desk. The recognition result correction device 603 refers to the recognition history storage device 601 and, for example, lowers the rank of department candidates when a department has already been recognized. The corrected recognition results are recorded in the recognition history storage device 601. The recognition candidate generation device 604 then refers to the recognition history storage device 601 and the target world information storage device 602, predicts what will be uttered next, and outputs recognition candidate generation information. For example, when "eigyōbu no" ("of the sales department") is obtained as the recognized department name, it instructs the matching pattern generation device 103 to generate "Ogawa-san o", "Fuchi-san o", "Hatta-san o", "Tagami-san o", "Kawata-san o", "Misawa-san o", and "Tsuruta-san o" as standard patterns for the name candidates.
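The two roles of the context management device 106, reordering candidates and predicting the next utterance, can be sketched as follows. The contents of Table 1 are not reproduced in this text, so the staff list contains only the sales department names quoted above; the data structures and function names are hypothetical.

```python
# Hypothetical stand-ins for the target world information storage device 602.
DEPARTMENTS = {"shachoushitsu no", "eigyoubu no", "keiribu no"}
STAFF = {
    # Only the names listed for the sales department in the paragraph above.
    "eigyoubu no": ["Ogawa", "Fuchi", "Hatta", "Tagami", "Kawata", "Misawa", "Tsuruta"],
}


def rerank(ranked, history):
    """Demote department candidates once a department is already in the history."""
    if not any(item in DEPARTMENTS for item in history):
        return list(ranked)
    # Stable sort: non-department candidates first, original order kept otherwise.
    return sorted(ranked, key=lambda cand: cand in DEPARTMENTS)


def predict_next(history):
    """Return the phrases to generate as standard patterns for the next utterance."""
    for item in reversed(history):
        if item in STAFF:
            return [f"{name}-san o" for name in STAFF[item]]
    # Nothing recognized yet: expect a department first.
    return sorted(DEPARTMENTS)


history = ["eigyoubu no"]
reordered = rerank(["keiribu no", "Hatta-san o"], history)
next_candidates = predict_next(history)
```

Returning only the names that belong to the recognized department keeps the next set of standard patterns small, which is what limits the matching work in the following recognition step.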

[0019] By appropriately choosing, for the field in which voice recognition is used, the short phoneme sequences on which word spotting is performed, the present invention can provide highly accurate voice recognition in any field.

[0020]

[Effects of the Invention] With this device, even when many words must be recognized, the amount of computation grows only gradually and voice can be recognized with high accuracy. In addition, because recognition ranges that are grammatically meaningful are determined, the vocabulary information and grammar information are highly regular, and the amount of description they require also grows only gradually.

[Brief description of drawings]

FIG. 1 is a block diagram showing the overall configuration of a voice interactive information retrieval device according to one embodiment of the present invention.

FIG. 2 is a block diagram showing the internal configuration of the voice signal recording device 102 of FIG. 1.

FIG. 3 is a block diagram showing the internal configuration of the matching pattern generation device 103 of FIG. 1.

FIG. 4 is a block diagram showing the internal configuration of the recognition section determining device 104 of FIG. 1.

FIG. 5 is a block diagram showing the internal configuration of the voice signal matching device 105 of FIG. 1.

FIG. 6 is a block diagram showing the internal configuration of the context management device 106 of FIG. 1.

[Explanation of symbols]

101 Voice input device
102 Voice signal recording device
103 Matching pattern generation device
104 Recognition section determining device
105 Voice signal matching device
106 Context management device
201 Information compression device
202 Input pattern recording device
301 Vocabulary information storage device
302 Grammar information storage device
303 Phoneme model storage device
304 Silence model storage device
305 Standard pattern generation device
401 Terminal pattern generation device
402 Word spotting device
501 Matching pattern narrowing device
502 Pattern matching device
601 Recognition history storage device
602 Target world information storage device
603 Recognition result correction device
604 Recognition candidate generation device

Claims

1. A voice recognition device comprising: a voice input device that receives voice and outputs an electric signal; a voice signal recording device that records the electric signal as an input pattern; a matching pattern generation device that generates standard voice patterns to be matched against the input pattern; a recognition section determining device that receives the standard patterns and the input pattern and determines the section of the input pattern to be matched against the standard patterns; a voice signal matching device that calculates the degree of coincidence between the standard patterns and the input pattern within the range designated by the recognition section determining device and outputs the standard patterns arranged in descending order of coincidence; and a context management device that rearranges the order of the standard patterns so arranged according to their consistency with the context of the dialogue and outputs the result, and that predicts the voice to be input next and controls the standard patterns generated by the matching pattern generation device.

2. The voice recognition device according to claim 1, wherein the voice signal recording device comprises an information compression device that compresses the amount of information in the electric voice signal output by the voice input device, and an input pattern recording device that records the output of the information compression device.

3. The voice recognition device according to claim 1, wherein the matching pattern generation device comprises: a vocabulary information storage device that holds the words making up the voice to be recognized and the phoneme notation of those words; a grammar information storage device that holds the connection rules for the words and the connection rules for the phonemes; a phoneme model storage device that holds phonological models of the phonemes; a silence model storage device that holds a phonological model of silence; and a standard pattern generation device that refers to the information output by the vocabulary information storage device, the grammar information storage device, the phoneme model storage device, and the silence model storage device and outputs the standard patterns specified by the context management device.

4. The voice recognition device according to claim 1, wherein the standard pattern generation device outputs a standard pattern in which a silence model is connected before and after the phonological model of the phrase that the context management device instructs it to generate.

5. The voice recognition device according to claim 1, wherein the recognition section determining device comprises: a terminal pattern generation device that receives a standard pattern and outputs the pattern near the end of that standard pattern; and a word spotting device that detects the presence of the near-end pattern in the input pattern and outputs the type of the near-end pattern and its position in the input pattern as recognition end point information.

6. The voice recognition device according to claim 1, wherein the terminal pattern generation device generates, as the pattern near the end of the standard pattern, a pattern in which the phonological model of a particle is connected to the phonological model of silence.

7. The voice recognition device according to claim 1, wherein the voice signal matching device comprises: a matching pattern narrowing device that receives the recognition end point information output by the recognition section determining device and the standard patterns output by the matching pattern generation device, and selects and outputs the standard patterns that have the near-end pattern indicated by the recognition end point information; and a pattern matching device that receives the standard patterns output by the matching pattern narrowing device and the input pattern output by the voice signal recording device, calculates the degree of coincidence between each of the standard patterns and the input pattern from the most recent recognition end position up to the position of the near-end pattern indicated by the recognition end point information, and outputs the standard patterns in descending order of coincidence as the pattern recognition result.

8. The voice recognition device according to claim 1, wherein the context management device comprises: a recognition history management device that records the recognition results obtained up to the point at which recognition of the input voice starts; a target world knowledge storage device that stores knowledge about the world in which the voice is uttered; a pattern recognition result correction device that receives the output of the recognition history management device, the output of the target world knowledge storage device, and the pattern recognition result output by the voice signal matching device, outputs the pattern recognition result with the recognition candidates that better match the context of the dialogue and the world the dialogue concerns ranked higher, and adds the corrected pattern recognition result to the contents held by the recognition history storage device; and a recognition candidate generation device that receives the output of the recognition history storage device and the output of the target world information storage device, predicts the voice to be input next, and outputs recognition candidate generation information.