JP3058125B2

JP3058125B2 - Voice recognition device

Info

Publication number: JP3058125B2
Application number: JP9172067A
Authority: JP
Inventors: 正江森
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1997-06-27
Filing date: 1997-06-27
Publication date: 2000-07-04
Anticipated expiration: 2017-06-27
Also published as: JPH1124693A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech recognition device that can be used, for example, to search for place names by voice in a car navigation system.

[0002]

2. Description of the Related Art

Speech recognition technology makes it possible to build a machine interface that accepts speech, a more natural and easier means of input. During voice input, speakers often insert short silent intervals (hereinafter called pauses) in the middle of an utterance, for semantic or physiological reasons. Speech recognition devices have therefore been proposed, as disclosed in JP-A-6-259090 and JP-A-6-202689, that output the recognition result at each pause. This lets the speaker input speech more naturally and confirm the recognition result easily.

[0003] For example, the speech recognition device disclosed in JP-A-6-259090 recognizes continuous speech by treating each pause as a sentence boundary. When a pause is detected, the device displays the recognition result up to that point and executes the corresponding application. If the utterance continues, recognition continues as well, and when the next pause is detected the device displays the new recognition result up to that point.

[0004] The speech recognition device disclosed in JP-A-6-202689 works from the information that defines the word order used for recognition. It assigns a pause duration to each position in that information where a pause may appear. When a pause during an utterance exceeds the assigned duration, the device presents the recognition result up to that point. Depending on that result, the device can also indicate which information is still missing and prompt the user to input it again.

[0005]

PROBLEMS TO BE SOLVED BY THE INVENTION

The speech recognition device disclosed in JP-A-6-259090 recognizes continuous speech by connecting word patterns according to a grammar expressed as a finite state automaton. It uses a method such as the one described by Sakoe et al., "Speeding up DP matching by integrating frame synchronization, beam search, and vector quantization," IEICE Transactions D, Vol. J71, No. 9, pp. 1650-1659, published by IEICE in July 1988 (hereinafter Reference 1). In this method, speech is recognized by searching for the optimal word string, with all word patterns treated as recognition candidates.

After a pause is detected and a recognition result is output, the next utterance is still searched against the entire grammar, including the words that were already recognized. The word search range therefore cannot be narrowed. It stays large even after the recognition result has been obtained and confirmed, so the confirmed result cannot be used to reduce computation or to improve recognition performance.

Moreover, if the user is allowed to restate part of the recognition result after a pause, every combination of words, including the restatements, becomes a search target. The search range grows, a large amount of computation is required, and recognition performance cannot be made high.

Finally, when the overall optimal word string is taken as the recognition result, as in Reference 1, a word that was output at a pause and already confirmed by the user may still change. Once the whole utterance has ended, it may be replaced by a different word that yields a better overall word string.
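The last drawback can be shown with a toy example. The Python sketch below uses invented scores, which are not data from this document or from Reference 1, for the place names used later in this description. It also replaces the DP search of Reference 1 with a brute-force comparison of whole hypotheses. The sketch shows how choosing the overall best word string after the utterance ends can overturn a first segment that was already shown at the pause. Keeping that segment fixed and searching only its continuation cannot overturn it.

```python
# Hypothetical log-likelihood scores (higher is better), invented for illustration only.
FIRST_SEGMENT = {"Kanagawa-ken Yokohama-shi": -10.0, "Kanagawa-ken Kawasaki-shi": -10.5}
SECOND_SEGMENT = {"Kohoku-ku": -12.0, "Nakahara-ku": -11.0}
# Second-segment words that the (hypothetical) grammar allows after each first segment.
ALLOWED = {
    "Kanagawa-ken Yokohama-shi": ["Kohoku-ku"],
    "Kanagawa-ken Kawasaki-shi": ["Nakahara-ku"],
}


def best_first_segment():
    """Result shown at the pause: the best-scoring first segment on its own."""
    return max(FIRST_SEGMENT, key=FIRST_SEGMENT.get)


def global_best():
    """Conventional approach: re-decide the whole utterance after it ends."""
    candidates = [
        (FIRST_SEGMENT[prefix] + SECOND_SEGMENT[suffix], prefix, suffix)
        for prefix, suffixes in ALLOWED.items()
        for suffix in suffixes
    ]
    _, prefix, suffix = max(candidates)
    return prefix, suffix


def committed_prefix():
    """Keep the confirmed first segment fixed and search only its continuation."""
    prefix = best_first_segment()
    suffix = max(ALLOWED[prefix], key=SECOND_SEGMENT.get)
    return prefix, suffix


shown = best_first_segment()
print("shown at pause:", shown)
print("global re-decoding:", global_best())
print("committed prefix:", committed_prefix())
```

With these scores, "Kanagawa-ken Yokohama-shi" wins at the pause. The whole-utterance decision still switches to the Kawasaki-shi path, because that path has the higher combined score (-21.5 against -22.0).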

[0006] The speech recognition device disclosed in JP-A-6-202689 prompts the user to re-enter input according to the recognition result. However, it requires, for example, a separate grammar to be prepared for the re-entered input.

[0007] An object of the present invention is to provide an easy-to-use speech recognition device for the case where the user continues speaking after a recognition result output at a sentence break, or restates part of that result. In that case, the device should require less computation, achieve higher recognition accuracy, and leave results that have already been confirmed unchanged.

[0008]

MEANS FOR SOLVING THE PROBLEMS

A speech recognition device according to the first invention comprises the following units:

- A dictionary recording unit that holds dictionary information. This information includes word information on the words to be recognized, information on the connection relationships between the words, one or more start points (connection points at which a sentence may start), and one or more pause points (connection points at which a sentence may be divided).
- A speech recognition unit that performs recognition processing on input speech, starting from one or more start points designated in advance on the basis of the dictionary information, and outputs the recognition result up to a pause point.
- A start point control unit that, according to the recognition result, designates the start point of the next recognition process that corresponds to that pause point.

[0009] A speech recognition device according to the second invention is the speech recognition device according to the first invention, comprising a start point control unit that, according to the recognition result, designates the pause point as the start point for the next recognition process.

[0010] A speech recognition device according to the third invention is the speech recognition device according to the first invention, comprising a start point control unit. According to the recognition result, this unit designates two kinds of start points for the next recognition process. The first is the start point that corresponds to the pause point. The second is the set of start points contained in the connection points of the words leading from the pause point back to the beginning of the sentence.

[0011]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The speech recognition device of the present invention applies to a method, such as that of Reference 1, that recognizes continuous speech by connecting standard patterns according to a grammar expressed as a finite state automaton. The device controls the words from which recognition starts. This lets speech be input efficiently when the user continues an utterance or restates part of it. It is assumed here that, within the finite state automaton, recognition can be designated to start from any connected word and to end at any word. The unit of recognition can be, for example, a section delimited by pauses, where a pause is a silent interval of 300 milliseconds or longer as described in JP-A-6-259090.

FIG. 3 shows an example of standard patterns connected according to a grammar expressed as a finite state automaton. Solid lines represent word-level standard patterns. The head of each standard pattern is one of the start points 401 to 404, where recognition can start. The tail is one of the pause points 405 to 411, where recognition can end. The words are joined by connection information, shown as broken lines.
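The grammar of FIG. 3 can be held as a small data structure. Each word pattern has a start point at its head and a pause point at its tail, and the connection information maps a pause point to the start point it leads to. The Python sketch below assumes a simplified topology reconstructed from the place-name examples in this description. The actual FIG. 3 has pause points 405 to 411 and may be connected differently. The class and variable names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordPattern:
    label: str        # word whose standard pattern is recorded in the dictionary
    start_point: int  # head of the pattern: recognition may start here
    pause_point: int  # tail of the pattern: recognition may end here


# Hypothetical simplification of FIG. 3 (the actual figure may differ).
WORDS = [
    WordPattern("Kanagawa-ken", 401, 405),
    WordPattern("Yokohama-shi", 402, 406),
    WordPattern("Kawasaki-shi", 402, 407),
    WordPattern("Kohoku-ku", 403, 408),
    WordPattern("Kanazawa-ku", 403, 409),
    WordPattern("Nakahara-ku", 404, 410),
]
# Connection information (broken lines): pause point -> start point it leads to.
CONNECTIONS = {405: 402, 406: 403, 407: 404}
SENTENCE_START = 401


def words_from(start_point):
    """Words whose standard pattern begins at start_point."""
    return [w.label for w in WORDS if w.start_point == start_point]


# Sanity check: every connection leads to a start point that some word uses.
assert all(words_from(sp) for sp in CONNECTIONS.values())

for sp in sorted({w.start_point for w in WORDS}):
    print(sp, words_from(sp))
```

Storing the connections as a mapping keyed by pause point makes the lookup that the start point control unit needs (pause point to next start point) a single dictionary access.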

[0012] Next, with reference to FIG. 3, this section describes how the speech recognition device of the present invention recognizes an utterance after a pause that continues the utterance before the pause, and compares this with the conventional method.

Suppose the input is divided by a pause into a first utterance, "Kanagawa-ken Yokohama-shi" (Yokohama City, Kanagawa Prefecture), and a second utterance, "Kohoku-ku". First, recognition processing outputs the recognition result of the first utterance, "Kanagawa-ken Yokohama-shi", together with the corresponding pause point 406.

Now consider the processing when the second utterance is input. A conventional device that recognizes according to a finite state automaton can designate all of the start points 401 to 404 as recognition start points, as it did for the first utterance, and start recognition again. This lets it recognize the second utterance. In this case, however, all words must always be searched, so the amount of computation does not decrease.

[0013] In contrast, the start point control unit of the present invention first designates start point 401 at the beginning of the sentence. After the speech recognition unit outputs the recognition result of the first utterance as described above, the control unit uses the output information about pause point 406. It designates start point 403, to which pause point 406 connects, as the new start point. Only the part of the grammar after start point 403 is then searched. The search range is therefore smaller than in the conventional method, and the amount of computation is reduced. Because the number of recognition candidates also decreases, recognition performance may improve as well.

Furthermore, in the present invention the recognition result of the first utterance, "Kanagawa-ken Yokohama-shi", is stored unchanged whether the second utterance is recognized as "Kohoku-ku" or as "Kanazawa-ku". A conventional device that recognizes according to a finite state automaton behaves differently. If the whole utterance "Kanagawa-ken Yokohama-shi Kohoku-ku" is closer to "Kanagawa-ken Kawasaki-shi Nakahara-ku", the first result changes to "Kanagawa-ken Kawasaki-shi". The confirmed result "Kanagawa-ken Yokohama-shi" is then not put to use.
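The difference in search range can be sketched by counting the words a search may visit from a given set of start points, following the connection information transitively. The Python sketch below uses a hypothetical simplification of FIG. 3, not the actual figure. A plain graph traversal stands in for the DP search of Reference 1.

```python
# Hypothetical simplification of FIG. 3: word -> (start point, pause point).
WORDS = {
    "Kanagawa-ken": (401, 405),
    "Yokohama-shi": (402, 406),
    "Kawasaki-shi": (402, 407),
    "Kohoku-ku": (403, 408),
    "Kanazawa-ku": (403, 409),
    "Nakahara-ku": (404, 410),
}
CONNECTIONS = {405: 402, 406: 403, 407: 404}  # pause point -> next start point


def search_space(start_points):
    """All words the search may visit when it may begin at any of start_points."""
    frontier, seen_starts, visited = list(start_points), set(), set()
    while frontier:
        sp = frontier.pop()
        if sp in seen_starts:
            continue
        seen_starts.add(sp)
        for label, (start, pause) in WORDS.items():
            if start == sp:
                visited.add(label)
                if pause in CONNECTIONS:  # follow the broken-line connection
                    frontier.append(CONNECTIONS[pause])
    return visited


conventional = search_space({401, 402, 403, 404})  # all start points, as in [0012]
proposed = search_space({CONNECTIONS[406]})         # only start point 403, as in [0013]
print(len(conventional), len(proposed))  # 6 2
```

In this toy grammar, the second utterance's candidates shrink from six words to the two wards reachable from start point 403.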

[0014] Next, with reference to FIG. 3, this section describes how the speech recognition device of the third invention recognizes an utterance after a pause that restates the utterance before the pause, and compares this with the conventional method.

Suppose that the first utterance, "Kanagawa-ken Yokohama-shi", is followed by a pause. The second utterance is then "Kawasaki-shi", a restatement of "Yokohama-shi". First, pause point 406, which corresponds to the recognition result "Kanagawa-ken Yokohama-shi" of the first utterance, is output.

When the second utterance is then input, the conventional method designates all of the start points 401 to 404 as recognition start points. It can recognize the restatement by searching all words, but the amount of computation does not decrease.

[0015] In contrast, the second start point control unit of the present invention first designates start point 401, at the beginning of the sentence, as the recognition start point. After the speech recognition unit outputs the recognition result of the first utterance as described above, the control unit designates three new recognition start points:

- start point 403, to which pause point 406 connects;
- start point 402, the start of the word in the word string that runs from the beginning of the sentence to pause point 406;
- start point 401, the start of the word that connects into start point 402.

Words that do not begin at a designated start point, such as words beginning at start point 404, then need not be searched. The search range is therefore smaller than in the conventional method. As a result, the amount of computation is reduced and recognition performance improves.
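One way to compute the start points designated above is to walk back from the pause point through the grammar. The walk takes the start point the pause point connects to, then the start point of each word ending at that pause point, then the pause points that connect into that start point, and so on until the beginning of the sentence. The Python sketch below does this on a hypothetical simplification of FIG. 3. When several paths lead to the pause point, it collects start points from all of them, which is an assumption about how such ambiguity would be handled.

```python
# Hypothetical simplification of FIG. 3: word -> (start point, pause point).
WORDS = {
    "Kanagawa-ken": (401, 405),
    "Yokohama-shi": (402, 406),
    "Kawasaki-shi": (402, 407),
    "Kohoku-ku": (403, 408),
    "Kanazawa-ku": (403, 409),
    "Nakahara-ku": (404, 410),
}
CONNECTIONS = {405: 402, 406: 403, 407: 404}  # pause point -> next start point
SENTENCE_START = 401


def traceback_start_points(pause_point):
    """Start points for the next utterance after a pause at pause_point (sketch).

    Includes the start point the pause point connects to (for a continuation)
    and the start points of every word on the way back to the sentence start
    (for a restatement).
    """
    starts = set()
    if pause_point in CONNECTIONS:
        starts.add(CONNECTIONS[pause_point])
    frontier, seen = [pause_point], set()
    while frontier:
        pause = frontier.pop()
        if pause in seen:
            continue
        seen.add(pause)
        for start, end in WORDS.values():
            if end != pause:
                continue
            starts.add(start)  # start point of a word ending at this pause point
            if start != SENTENCE_START:
                # Keep walking back through pause points that lead into this start.
                frontier.extend(p for p, s in CONNECTIONS.items() if s == start)
    return starts


print(sorted(traceback_start_points(406)))  # [401, 402, 403]
```

Start point 404 is never reached from pause point 406, which matches the statement that words beginning at 404 need not be searched.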

[0016]

EMBODIMENTS

Embodiments of the present invention will now be described with reference to the drawings. FIG. 1 is a block diagram showing a first embodiment of the speech recognition device of the present invention. The speech recognition device of the first embodiment comprises a speech recognition unit 101, a dictionary recording unit 102, and a first start point control unit 103.

The dictionary recording unit 102 records standard patterns that represent the acoustic features of a plurality of registered words to be recognized. It also records connection information, start point information, and pause point information for each word. As an example of these standard patterns and this connection, pause, and start point information, the grammar expressed as the finite state automaton of FIG. 3, described above, is used.

The speech recognition unit 101 detects speech sections and pauses in the input signal S using power information, as described in JP-A-6-259090. It then performs continuous speech recognition on each speech section, following the grammar described by the finite state automaton as in Reference 1 and using the information recorded in the dictionary recording unit 102. When a pause is detected, it outputs the recognition result R, which consists of the word string, the corresponding pause point, and the start point to which that pause point connects.

Based on the pause point in the recognition result R output from the speech recognition unit 101, the first start point control unit 103 sets the start point to which that pause point connects. It outputs this start point as start point information A for recognizing the next utterance.
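The pause detection step can be sketched with a fixed power threshold. Frames below the threshold count as silence, and a silent run of at least 300 ms (the pause length given in [0011]) closes the current speech section. This is only an assumed stand-in for the power-based method of JP-A-6-259090, which is not reproduced here. The 10 ms frame length, the threshold value, and the function name are hypothetical.

```python
def detect_segments(frame_power, threshold, frame_ms=10, min_pause_ms=300):
    """Split a frame-power sequence into speech sections separated by pauses.

    A pause is a run of frames with power below threshold that lasts at least
    min_pause_ms; shorter dips stay inside the surrounding speech section.
    Returns (first_frame, end_frame_exclusive) pairs, one per speech section.
    """
    min_pause_frames = -(-min_pause_ms // frame_ms)  # ceiling division
    segments, seg_start, silent_run = [], None, 0
    for i, power in enumerate(frame_power):
        if power >= threshold:
            if seg_start is None:
                seg_start = i  # speech section begins
            silent_run = 0
        elif seg_start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                # Pause found: the section ends where the silent run began.
                segments.append((seg_start, i - silent_run + 1))
                seg_start, silent_run = None, 0
    if seg_start is not None:  # input ended during (or shortly after) speech
        segments.append((seg_start, len(frame_power) - silent_run))
    return segments


# 100 ms of speech, a 400 ms pause, then 100 ms of speech (10 ms frames).
power = [5.0] * 10 + [0.1] * 40 + [5.0] * 10
print(detect_segments(power, threshold=1.0))  # [(0, 10), (50, 60)]
```

Each returned section would be handed to the recognizer as one unit of recognition. A 50 ms dip, for example, is shorter than the pause length and does not split a section.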

[0017] Next, the operation is described. Suppose that the first utterance, "Kanagawa-ken Yokohama-shi", and the second utterance, "Kohoku-ku", are spoken with a pause between them.

Initially, the start point information A designates start point 401 at the beginning of the sentence. The speech recognition unit 101 receives the input signal S of the first utterance and searches for words from start point 401, as given by the start point information A. When a pause is detected, it outputs the recognition result R. This consists of the word string with the highest similarity, "Kanagawa-ken Yokohama-shi", the corresponding pause point 406, and start point 403, to which pause point 406 connects.

The first start point control unit 103 receives the recognition result R. It designates start point 403, the connection destination of pause point 406 in R, as the start point of the word search for the second utterance, and outputs it as start point information A. The speech recognition unit 101 then searches for the words of the second utterance from start point 403. This yields the word "Kohoku-ku" as the recognition result R.
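The first embodiment's exchange of recognition result R and start point information A can be simulated at the text level. In the Python sketch below, a hypothetical `recognize` function replaces the speech recognition unit. It matches already-segmented word labels against the grammar instead of matching acoustic standard patterns. R is reduced to the word string and its pause point, and the grammar is a hypothetical simplification of FIG. 3.

```python
# Hypothetical simplification of FIG. 3: word -> (start point, pause point).
WORDS = {
    "Kanagawa-ken": (401, 405),
    "Yokohama-shi": (402, 406),
    "Kawasaki-shi": (402, 407),
    "Kohoku-ku": (403, 408),
    "Kanazawa-ku": (403, 409),
    "Nakahara-ku": (404, 410),
}
CONNECTIONS = {405: 402, 406: 403, 407: 404}  # pause point -> next start point
SENTENCE_START = 401


def recognize(tokens, start_points):
    """Text-level stand-in for speech recognition unit 101 (hypothetical).

    Looks for a word path that begins at one of start_points and spells out
    tokens exactly. Returns a simplified recognition result R as
    (word string, pause point), or None if no path matches.
    """
    def walk(start, remaining):
        for label, (sp, pause) in WORDS.items():
            if sp != start or label != remaining[0]:
                continue
            if len(remaining) == 1:
                return [label], pause
            if pause in CONNECTIONS:
                rest = walk(CONNECTIONS[pause], remaining[1:])
                if rest is not None:
                    return [label] + rest[0], rest[1]
        return None

    if not tokens:
        return None
    for start in sorted(start_points):
        result = walk(start, tokens)
        if result is not None:
            return result
    return None


def first_start_point_control(result):
    """Like unit 103: designate the start point the pause point connects to."""
    _, pause = result
    return {CONNECTIONS[pause]} if pause in CONNECTIONS else set()


A = {SENTENCE_START}                                 # initial start point information
R1 = recognize(["Kanagawa-ken", "Yokohama-shi"], A)  # first utterance
A = first_start_point_control(R1)                    # start point information for utterance 2
R2 = recognize(["Kohoku-ku"], A)                     # second utterance
print(R1, A, R2)  # (['Kanagawa-ken', 'Yokohama-shi'], 406) {403} (['Kohoku-ku'], 408)
```

With the same sketch, `recognize(["Kawasaki-shi"], {403})` returns `None`. The first start point control unit alone cannot handle a restatement, which is the case the second embodiment addresses.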

[0018] A second embodiment of the present invention will now be described with reference to the drawings. FIG. 2 is a block diagram showing the speech recognition device of the second embodiment. This device comprises a speech recognition unit 101, a dictionary recording unit 102, and a second start point control unit 201. The speech recognition unit 101 and the dictionary recording unit 102 are the same as in the first embodiment.

The second start point control unit 201 refers to the dictionary recording unit 102 and outputs start point information A for recognizing the next utterance. This information contains every start point traced back to the beginning of the sentence:

- the start point to which the pause point in the recognition result R connects;
- the start point of the word that ends at that pause point;
- the start point of the word that connects into that word, and so on back to the beginning of the sentence.

[0019] Next, the operation is described. Suppose that the first utterance, "Kanagawa-ken Yokohama-shi", and the second utterance, "Kawasaki-shi", are spoken with a pause between them.

Initially, the start point information A designates start point 401 at the beginning of the sentence. The speech recognition unit 101 receives the input signal S of the first utterance and searches for words from start point 401, as given by the start point information A. When a pause is detected, it outputs the recognition result R. This consists of the word string with the highest similarity, "Kanagawa-ken Yokohama-shi", and the corresponding pause point 406.

The second start point control unit 201 refers to the dictionary recording unit 102. It designates three start points for the word search of the second utterance:

- start point 403, to which pause point 406 in R connects;
- start point 402 of the word "Yokohama-shi", which ends at pause point 406;
- start point 401 of the word "Kanagawa-ken", which connects into "Yokohama-shi".

It outputs these as start point information A. The speech recognition unit 101 then searches for the words of the second utterance from start points 401 to 403. This yields the word "Kawasaki-shi" as the recognition result R.
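The two start point control units can be compared on this restatement example by listing the words the search may begin with for the second utterance. The Python sketch below derives the start points from the recognized word string in R. This is an assumption: the second start point control unit 201 is described as consulting the dictionary recording unit 102 instead. The grammar is a hypothetical simplification of FIG. 3.

```python
# Hypothetical simplification of FIG. 3: word -> (start point, pause point).
WORDS = {
    "Kanagawa-ken": (401, 405),
    "Yokohama-shi": (402, 406),
    "Kawasaki-shi": (402, 407),
    "Kohoku-ku": (403, 408),
    "Kanazawa-ku": (403, 409),
    "Nakahara-ku": (404, 410),
}
CONNECTIONS = {405: 402, 406: 403, 407: 404}  # pause point -> next start point


def first_controller(recognized_words):
    """Like unit 103: only the start point the final pause point connects to."""
    pause = WORDS[recognized_words[-1]][1]
    return {CONNECTIONS[pause]} if pause in CONNECTIONS else set()


def second_controller(recognized_words):
    """Like unit 201: additionally, the start point of every recognized word."""
    return first_controller(recognized_words) | {WORDS[w][0] for w in recognized_words}


def first_word_candidates(start_points):
    """Words the search may begin with for the next utterance."""
    return {w for w, (start, _) in WORDS.items() if start in start_points}


R = ["Kanagawa-ken", "Yokohama-shi"]  # word string of recognition result R
for name, control in [("unit 103", first_controller), ("unit 201", second_controller)]:
    candidates = first_word_candidates(control(R))
    print(name, sorted(candidates), "Kawasaki-shi" in candidates)
```

Only the second controller makes "Kawasaki-shi" a candidate. Both controllers still leave out "Nakahara-ku", which begins at start point 404.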

[0020] Embodiments of the present invention have been described above. The speech recognition method used with them is not limited to the DP matching of Reference 1. For example, an HMM-based method such as the one shown on pages 20 to 38 of "ATR Advanced Technology Series: Automatic Translation Telephone" can also be used. That book was edited by ATR (Advanced Telecommunications Research Institute International) and published by Ohmsha (hereinafter Reference 2).

The word connection information is likewise not limited to the form above. It can also be expressed by another grammar, such as the context-free grammar shown on pages 49 to 68 of Reference 2.

In the description above, a section delimited by pauses is used as the unit of recognition. A word string extracted by a method such as word spotting can be used as the unit of recognition instead.

Finally, the second start point control unit 201 of the second embodiment selects, as start point information A, every start point from the pause point of the recognition result R back to the beginning of the sentence. The selection can also be limited to a predetermined part, for example only back to the preceding word.
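The limited variant at the end of the preceding paragraph can be sketched with a parameter that sets how many recognized words to trace back. The parameter name `words_back` and the use of the recognized word string are hypothetical, and the grammar is a hypothetical simplification of FIG. 3. With `words_back=1`, only the word immediately before the pause can be restated.

```python
# Hypothetical simplification of FIG. 3: word -> (start point, pause point).
WORDS = {
    "Kanagawa-ken": (401, 405),
    "Yokohama-shi": (402, 406),
    "Kawasaki-shi": (402, 407),
    "Kohoku-ku": (403, 408),
    "Kanazawa-ku": (403, 409),
    "Nakahara-ku": (404, 410),
}
CONNECTIONS = {405: 402, 406: 403, 407: 404}  # pause point -> next start point


def limited_start_points(recognized_words, words_back):
    """Start points for the next utterance, tracing back at most words_back words.

    words_back=0 allows only a continuation; words_back=len(recognized_words)
    gives the full traceback to the beginning of the sentence.
    """
    pause = WORDS[recognized_words[-1]][1]
    starts = {CONNECTIONS[pause]} if pause in CONNECTIONS else set()
    if words_back > 0:  # guard: the slice [-0:] would select the whole list
        for label in recognized_words[-words_back:]:
            starts.add(WORDS[label][0])
    return starts


R = ["Kanagawa-ken", "Yokohama-shi"]
for k in range(len(R) + 1):
    print(k, sorted(limited_start_points(R, k)))
```

Smaller values of `words_back` trade the ability to restate earlier words for a smaller search range.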

[0021]

EFFECTS OF THE INVENTION

As described above, the present invention designates specific start points when recognizing the next utterance after a pause or similar break. These are the start points needed to recognize an utterance that continues the recognition result before the pause, or one that restates that result. Designating them narrows the word search range for the utterance after the pause, which reduces the amount of computation and improves recognition performance. In addition, when a continuation is recognized, recognition results that have already been confirmed do not change, so an easy-to-use speech recognition device can be provided.

[Brief description of the drawings]

FIG. 1 is a block diagram for explaining a first embodiment of the present invention.

FIG. 2 is a block diagram for explaining a second embodiment of the present invention.

FIG. 3 is a diagram for explaining an example of a finite state automaton.

[Explanation of symbols]

Reference Signs List
101 Speech recognition unit
102 Dictionary recording unit
103 First start point control unit
201 Second start point control unit
S Input signal
R Recognition result
A Start point information

Continuation of front page

(56) References cited:
JP-A-6-202689; JP-A-59-62899; JP-A-61-240296; JP-A-61-245198; JP-A-4-310000; JP-A-7-104782; JP-A-7-121192; JP-A-10-78961; JP-B2-7-1437; JP-B2-7-43599.
Information Processing Society of Japan Research Report [Natural Language Processing], Vol. 91, No. 80, NL-85-7, "Use of speech information in sequential analysis," pp. 49-56 (1991-09-20).
Information Processing Society of Japan Research Report [Spoken Language Information Processing], Vol. 95, No. 51, SLP-6-5, "Language phenomena of natural language utterances and Japanese grammar for speech recognition," pp. 27-34 (1995-05-25).
ICOT Research Bulletin TM-0489, "Method of estimating the sentence structure of spoken dialogue sentences using prosodic information," pp. 1-33, April 1988.
ICOT Research Bulletin TM-1123, "Use of speech information in sequential analysis," pp. 1-10, October 1991.

(58) Fields searched (Int. Cl.7, DB name): G10L 15/18; G10L 15/00; G10L 15/28; JICST file (JOIS)

Claims

(57) [Claims]

1. A speech recognition device comprising:
a dictionary recording unit that holds dictionary information including word information on words to be recognized, information on the connection relationships between the words, information on one or more start points, which are connection points at which a sentence may start, and information on one or more pause points, which are connection points at which a sentence may be divided;
a speech recognition unit that performs recognition processing on input speech from one or more start points designated in advance on the basis of the dictionary information and outputs a recognition result up to a pause point; and
a start point control unit that, according to the recognition result, designates the start point of the next recognition process corresponding to the pause point.

2. The speech recognition device according to claim 1, further comprising a start point control unit that, according to the recognition result, designates the pause point as a start point for the next recognition process.

3. The speech recognition device according to claim 1, further comprising a start point control unit that, according to the recognition result, designates as start points for the next recognition process both the start point of the next recognition process corresponding to the pause point and the start points included in the connection points of the words from the pause point to the beginning of a sentence.

4. The speech recognition device according to claim 2, wherein the start point control unit designates as start points for the next recognition process both the start point coinciding with the pause point and the start points included in the connection points of the words from the pause point to the beginning of a sentence.

5. The speech recognition device according to claim 1, wherein the dictionary information includes word information including a speech pattern of each word to be recognized, position information indicating the connection positions of the words, information on one or more start points, which are connection points at which a sentence may start, and information on one or more pause points, which are connection points at which a sentence may be divided.

6. The speech recognition device according to claim 1, wherein the start point corresponding to the pause point is a start point of a plurality of pieces of the word information that may be connected to one pause point of arbitrary word information.

7. The speech recognition device according to claim 2, wherein the start point coinciding with the pause point is a start point selected from a plurality of pieces of word information connected to one pause point of arbitrary word information.