JP4680714B2

JP4680714B2 - Speech recognition apparatus and speech recognition method

Info

Publication number: JP4680714B2
Application number: JP2005225877A
Authority: JP
Inventors: 剛井上; 純幸沖本; 洋九津見; 貴史續木
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2005-08-03
Filing date: 2005-08-03
Publication date: 2011-05-11
Anticipated expiration: 2025-08-03
Also published as: JP2007041319A

Description

The present invention relates to a speech recognition apparatus that recognizes input speech using a speech recognition dictionary and conducts a dialog by transitioning a system state according to the recognition result.

One common speech recognition method computes a score by comparing the speech signal input by the user against various acoustic patterns that represent the vocabulary registered in advance in a recognition dictionary, and takes the dictionary word with the most similar pattern as the recognition result candidate.

Because such a general method performs recognition with an acoustic model built to achieve high accuracy for many users, the model is poorly matched to some users; for them, recognition accuracy drops and misrecognitions become frequent.

Conventionally, there is also a technique that computes a recognition confidence, for example from the recognition score, and, even when a most similar candidate has been obtained, rejects the result on the basis of that confidence and prompts the user to re-enter the input. Such a reject function prevents the system from malfunctioning through misrecognition when, for example, noise other than speech is input.

In such a method, functions such as rejection prevent misrecognition and erroneous system operation when the recognition confidence falls below a predetermined value. On the other hand, ordinary speech input is also rejected when its confidence is low, so for some users a particular recognizable word may be rejected in error.

As countermeasures against such misrecognition and false rejection, two kinds of methods have been proposed: methods that improve accuracy by retraining the general speaker-independent acoustic model on the user's own utterances so that it adapts to the current user (speaker adaptation, speaker training), and methods that improve accuracy for the user's repeated utterance after a rejection.

For example, as a speaker adaptation and training method, a method has been disclosed that trains the acoustic model from a small number of utterances and additionally performs speaker training on words that are misrecognized (see, for example, Patent Document 1). As methods for improving accuracy on a repeated utterance, a technique has been disclosed that, when the input is judged to be a restatement, decides the recognition result using both the previous and the current recognition candidates (see, for example, Patent Document 2), as has a technique that, for a restated utterance, uses the top candidates of the previous recognition result as the recognition vocabulary (see, for example, Patent Document 3).
Patent Document 1: JP 2003-162292 A
Patent Document 2: JP H11-149294 A
Patent Document 3: Japanese Patent No. 3112037

These conventional methods are designed either to let the user's acoustic model be trained from a small number of training utterances, or to improve accuracy by changing the recognition candidates or the recognition vocabulary for a repeated utterance.

However, speaker adaptation through such training makes the user utter a certain number of words that have nothing to do with the operation itself; even if the number is small, the burden on the user is not negligible. The methods for improving accuracy on re-entry do raise accuracy for the repeated utterance, but when the user later produces the same utterance that was rejected before, it is rejected again, and the user must repeat it every time.

For example, Patent Document 1 first prompts the user for training utterances for speaker adaptation, albeit a small number, and, for words that are frequently misrecognized, further prompts the speaker to utter the misrecognized portion and performs speaker training on that input; prompting these extra utterances increases the user's burden. Patent Document 2 adjusts the recognition candidates to be output, taking the previous recognition result into account, when a restatement is detected; however, when the same utterance that was rejected before is input again on a later occasion, it is rejected, and the user must repeat it every time. Patent Document 3 recognizes the input that follows a rejection using only the previous top candidates as the recognition vocabulary; as with Patent Document 2, it cannot correctly recognize the same utterance as the previously rejected one when that utterance is input again.

The present invention has been made to solve these conventional problems. Its object is to provide a speech recognition apparatus and a speech recognition method that adapt speech recognition to the individual naturally and with little burden on the user, without requiring special training utterances, and that reduce misrecognition.

To achieve the above object, a speech recognition apparatus according to the present invention recognizes input speech and conducts a dialog by transitioning, according to the recognition result, a system state, which is the state of the system with respect to the dialog with the user. The apparatus comprises: speech recognition means for recognizing the input speech using a speech recognition dictionary and outputting a recognition result; dialog control means for transitioning the system state according to the recognition result of the speech recognition means and responding; stagnation escape determination means for determining whether the current recognition result has brought the system out of a stagnation state, that is, a state in which the system state is stalled without advancing, and, when it determines that the system has escaped from the stagnation state, determining whether the current recognition result is at least one of a restatement and a paraphrase; and change control means for performing, when the result is determined to be a restatement or a paraphrase, at least one of a change of the reject threshold, as a setting related to dialog control, and a new addition to or modification of the speech recognition dictionary, as a change of a setting related to speech recognition.

With the speech recognition apparatus and the speech recognition method according to the present invention, the mismatch between the characteristics of the user's utterances and the system's speech recognition parameters or speech recognition dictionary is resolved not by requesting special training utterances but by learning from a result that was correctly recognized after a single restatement or paraphrase. Speech recognition can therefore be adapted to the individual in a way that is natural and places little burden on the user. Moreover, because of this adaptation, an utterance like the one that was previously misrecognized can be recognized correctly from then on, so misrecognitions decrease and smooth voice operation can be achieved.

A speech recognition apparatus according to an embodiment of the present invention recognizes input speech and conducts a dialog according to the recognition result. It comprises: speech recognition means for recognizing the input speech using a speech recognition dictionary and outputting a recognition result; dialog control means for transitioning a system state according to the recognition result of the speech recognition means and responding; stagnation escape determination means for determining whether the current recognition result has brought the system out of a stagnation state, that is, a state in which the system state is stalled without advancing, and, when it determines that the system has escaped from the stagnation state, determining whether the current recognition result is at least one of a restatement and a paraphrase; and change control means for changing, when the result is determined to be a restatement or a paraphrase, at least one of a setting related to dialog control and a setting related to speech recognition.

User adaptation thus proceeds continually during ordinary voice operation, so no special utterances are needed for adaptation, and speech recognition can be adapted to the individual naturally and with little burden on the user. Moreover, because of this adaptation, an utterance like the one that was previously misrecognized can be recognized correctly from then on, so misrecognitions decrease and smooth voice operation can be achieved.

The stagnation state may be a state in which the same system state persists because speech recognition results are rejected. In that case, the stagnation escape determination means may determine that the current recognition result is a restatement when it is the same word as the previous recognition result, and that it is a paraphrase when it is not the same word as the previous recognition result but is a recognition word that executes the same predetermined system operation.

Alternatively, the stagnation state may be a state in which the system keeps going back and forth between two system states. In that case, the stagnation escape determination means may determine that the current recognition result is a restatement when it is the same word as the recognition result two turns earlier, and that it is a paraphrase when it is not the same word as the recognition result two turns earlier but is a recognition word that executes the same predetermined system operation.
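The round-trip case can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the four-state window used to detect a round trip, the function names, the state names, and the `operations` table are all assumptions made for the example. Only the comparison against the recognition result two turns earlier comes from the text.

```python
from typing import Dict, List


def is_round_trip(states: List[str]) -> bool:
    """True when the last four visited states alternate as A, B, A, B.

    The four-state window is an assumption; the text only says the system
    keeps going back and forth between two states.
    """
    if len(states) < 4:
        return False
    a, b, c, d = states[-4:]
    return a == c and b == d and a != b


def classify_against_two_turns_ago(words: List[str],
                                   operations: Dict[str, str]) -> str:
    """Compare the current word with the word recognized two turns earlier."""
    if len(words) < 3:
        return "unknown"
    current, two_turns_ago = words[-1], words[-3]
    if current == two_turns_ago:
        return "restatement"
    operation = operations.get(current)
    if operation is not None and operation == operations.get(two_turns_ago):
        return "paraphrase"
    return "other"


# Hypothetical table: recognized word -> system operation it triggers.
OPERATIONS = {
    "program name search": "open_program_name_search",
    "No. 1": "open_program_name_search",
    "back": "return_to_menu",
}
```

For instance, with the word history `["program name search", "back", "No. 1"]` the current word differs from the one two turns earlier but triggers the same operation, so the sketch classifies it as a paraphrase.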

This reduces stalls in voice operation caused by false rejection and stalls caused by misrecognition, so smooth voice operation can be achieved.

The change control means may change the reject threshold as the change to the dialog control setting, and may make a new addition to or modification of the speech recognition dictionary as the change to the speech recognition setting. This improves rejection accuracy and recognition accuracy, so that individual adaptation of speech recognition with little burden on the user and smooth voice operation can be achieved.

The change control means may also set and change the reject threshold separately for each recognition target word. This allows adaptation on a per-word basis, so that individual adaptation of speech recognition with even less burden on the user and smooth voice operation can be achieved.
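A per-word reject threshold can be pictured as a lookup table with a shared fallback value. The Python sketch below is illustrative only: the class name `RejectThresholds`, its methods, and the value 3.0 are assumptions, and the text here does not specify how or by how much a threshold is changed.

```python
from typing import Dict


class RejectThresholds:
    """Reject thresholds held per recognition target word (hypothetical layout)."""

    def __init__(self, default: float) -> None:
        self.default = default                 # used for words with no own entry
        self._per_word: Dict[str, float] = {}  # word -> individually set threshold

    def get(self, word: str) -> float:
        """Return the word's own threshold, or the shared default."""
        return self._per_word.get(word, self.default)

    def set(self, word: str, value: float) -> None:
        """Set or change the threshold for a single word."""
        self._per_word[word] = value


thresholds = RejectThresholds(default=3.5)
thresholds.set("program name search", 3.0)  # illustrative value only
```

Keeping the default separate from the per-word entries means that a change for one word leaves every other word's behavior untouched.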

The change control means may also set the reject threshold, and the new additions to or modifications of the speech recognition dictionary, separately for each user. This allows appropriate adaptation even when several users share the apparatus, so that individual adaptation of speech recognition with even less burden on the user and smooth voice operation can be achieved.

The speech recognition apparatus may further comprise abbreviation creation means for creating an abbreviation of the current recognition result when, on escape from the stagnation state, the current recognition result is neither the same word as the previous recognition result nor a recognition word that executes the same predetermined system operation. In that case, the speech recognition means re-recognizes the previous recognition result using the abbreviation, and the change control means may newly add the abbreviation to the speech recognition dictionary according to the re-recognition result of the speech recognition means. This allows appropriate adaptation even when the user uses abbreviations, so that individual adaptation of speech recognition with even less burden on the user and smooth voice operation can be achieved.

The present invention can be realized not only as such a speech recognition apparatus but also as a speech recognition method whose steps correspond to the characteristic means of the apparatus, or as a program that causes a computer to execute those steps. Such a program can, of course, be distributed via a recording medium such as a CD-ROM or a transmission medium such as the Internet.

Embodiments of the present invention are described below with reference to the drawings.

(Embodiment 1)
FIG. 1 is a block diagram showing the configuration of a voice-interactive information retrieval system that includes a speech recognition apparatus according to Embodiment 1 of the present invention.

The voice-interactive information retrieval system is a system for retrieving information through a spoken dialog. As shown in FIG. 1, it includes a speech recognition unit 101, a speech recognition dictionary 102, a speech recognition parameter storage unit 103, a stagnation escape determination unit 104, a dialog control unit 105, a dialog history storage unit 106, a system specification storage unit 107, a database search unit 108, a database storage unit 109, a response speech/screen output unit 110, and a timer 111.

The speech recognition unit 101 recognizes the speech input by the user using the speech recognition dictionary 102 and the speech recognition parameter storage unit 103, and outputs a recognition result. The speech recognition dictionary 102 is a dictionary in which the recognition vocabulary is registered. The speech recognition parameter storage unit 103 stores the parameters used for speech recognition.

The dialog control unit 105 controls the dialog according to operating specifications decided in advance by the system developer, and determines the next system state in response to input from the user. Specifically, it determines the next system state from the system specification storage unit 107 using the speech recognition result input from the speech recognition unit 101, the stagnation escape determination result input from the stagnation escape determination unit 104, and the current and past dialog history input from the dialog history storage unit 106. When necessary, the dialog control unit 105 also changes the speech recognition dictionary 102 and the parameters in the speech recognition parameter storage unit 103, and requests a database search from the database search unit 108. Here, a system state is one state of the system within the operating specifications decided by the system developer.

The stagnation escape determination unit 104 determines whether the system's state transitions have escaped from a stagnation state, using information such as the user's current and past recognition results input from the dialog control unit 105. The dialog history storage unit 106 stores various information for each system state, such as the speech recognition results and the system output results (output screen information and output response information) input from the dialog control unit 105. The system specification storage unit 107 stores the operating specifications of the system decided in advance by the developer.

The database search unit 108 searches the database in the database storage unit 109 in response to an information search request from the dialog control unit 105. The database storage unit 109 holds the database searched by the database search unit 108. The response speech/screen output unit 110 outputs the screen and the response speech that correspond to the system state input from the dialog control unit 105. The timer 111 outputs the current time to the dialog control unit 105 at the request of the dialog control unit 105.

Next, the specific operation of the voice-interactive information retrieval system configured as above is described for the case of searching for program information. FIG. 2 is a flowchart showing the flow of the overall dialog in the voice-interactive information retrieval system.

The dialog control unit 105 determines the system state at the start of the dialog, determines the screen and the response speech for that state, and outputs them from the response speech/screen output unit 110, thereby requesting input from the user (step S201). FIG. 3 shows a specific example of an output screen. Here, for example, the menu screen for searching program information shown in FIG. 3 is output, and the content 301 of the agent's speech balloon is output as response speech. The balloon itself may also be displayed on the screen. In this example, the only recognizable vocabulary in FIG. 3 is the vocabulary enclosed in boxes. For example, the vocabulary recognizable for selecting the box "1. program name search" 302 is "No. 1", "1", "program name search", and "1. program name search".

The speech recognition unit 101 performs recognition processing on the speech that the user inputs in response to the system's input request, which is made by response speech and screen (step S202). In more detail, the dialog control unit 105 first notifies the speech recognition unit 101 of the vocabulary recognizable in the current system state and requests it to execute speech recognition processing. In the system state shown in FIG. 3, for example, the speech recognition unit 101 starts speech recognition processing with the boxed vocabulary as the recognition vocabulary. The speech recognition unit 101 then performs recognition processing on the user's input speech and outputs the recognition result to the dialog control unit 105. The output includes not only the recognition vocabulary word closest to the user's utterance but also detailed information about the recognition, described below.

FIGS. 4 and 5 show specific examples of the output recognition result. FIG. 4 shows overall information about the recognition, centered on the top-ranked result, and FIG. 5 shows recognition result information that includes the other candidates. Item 401 is the date and time at which the recognition result was output. Item 402 is the interval of the input speech, that is, the part of the interval processed by the speech recognition unit 101 that was judged to be speech. Item 403 is the word judged closest among the recognition vocabulary, that is, the top-ranked recognition candidate. Item 404 is a character string of the acoustically closest kana characters, obtained without reference to the speech recognition dictionary; this is generally called a phonetic typewriter result. Item 405 is the interval within the input speech interval that matched the recognized word. Item 406 is the recognition score, which indicates the degree of match; a higher score indicates a better match. Item 407 is the recognition confidence, which indicates how plausible the recognition result is. The recognition confidence is commonly computed from, for example, the score differences among recognition candidates or the difference between the phonetic typewriter result and the recognition candidate. Item 408 is the reject threshold, a variable stored in the speech recognition parameter storage unit 103.
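The top-ranked result of FIG. 4 can be pictured as a simple record. The Python sketch below is an illustration only: the class name, field names, and field types are assumptions, and apart from the word "program name search", the confidence 4.5, and the threshold 3.5 (which the description gives for FIG. 4), the sample values are invented placeholders.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass
class RecognitionResult:
    """Top-ranked recognition result, mirroring items 401 to 408 of FIG. 4."""
    output_time: datetime                   # item 401: when the result was output
    speech_interval: Tuple[float, float]    # item 402: part judged to be speech
    word: str                               # item 403: top-ranked vocabulary word
    phonetic_transcript: str                # item 404: phonetic typewriter result
    matched_interval: Tuple[float, float]   # item 405: part matching the word
    score: float                            # item 406: recognition score
    confidence: float                       # item 407: recognition confidence
    reject_threshold: float                 # item 408: reject threshold


result = RecognitionResult(
    output_time=datetime(2005, 8, 3, 12, 0, 0),      # placeholder
    speech_interval=(0.20, 1.45),                    # placeholder, seconds
    word="program name search",
    phonetic_transcript="bangumi mei kensaku",       # placeholder
    matched_interval=(0.25, 1.40),                   # placeholder, seconds
    score=-1234.5,                                   # placeholder
    confidence=4.5,
    reject_threshold=3.5,
)
```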

The dialog control unit 105 compares this reject threshold with the recognition confidence to decide whether the system accepts the recognition result. Specifically, when the recognition confidence is lower than the reject threshold, the dialog control unit 105 rejects the recognition result, that is, it does not process the result as input and prompts for input again in the same system state. In the example of FIG. 4, the recognition confidence is 4.5 and the reject threshold is 3.5, so the dialog control unit 105 accepts the recognition result "program name search" as input to the system and carries out dialog control. The reject threshold may be decided in advance by the system developer or determined by an evaluation experiment; for example, several subjects may be asked to utter the words in the dictionary set and the threshold decided from the results.
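The accept-or-reject decision amounts to one comparison, sketched here in Python. The function name is hypothetical, and treating a confidence exactly equal to the threshold as accepted is an assumption, since the text only states that a confidence lower than the threshold is rejected.

```python
def accept_result(confidence: float, reject_threshold: float) -> bool:
    """Return True when the recognition result is accepted as system input.

    A confidence lower than the threshold is rejected, as in the text.
    Equality is treated as acceptance here, which is an assumption.
    """
    return confidence >= reject_threshold


# FIG. 4 example: confidence 4.5 against threshold 3.5 is accepted.
fig4_accepted = accept_result(4.5, 3.5)
```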

In FIG. 5, item 501 is the rank of each recognition candidate after sorting by recognition score, and items 502 to 505 give the information for each candidate; their content is the same as the corresponding information for the top-ranked result described for FIG. 4.

The dialog control unit 105 outputs to the stagnation escape determination unit 104 the speech recognition result input from the speech recognition unit 101 in step S202 and the previous recognition result stored in the dialog history storage unit 106.

Next, the stagnation escape determination unit 104 determines whether the current input is an escape from stagnation and outputs the result to the dialog control unit 105 (step S203). The dialog control unit 105 writes this result to the dialog history storage unit 106.

The stagnation escape determination operation of the stagnation escape determination unit 104 is now described in detail, taking stagnation caused by false rejection in speech recognition as an example. FIG. 6 is a flowchart showing the flow of the stagnation escape determination operation in the stagnation escape determination unit 104.

First, the stagnation escape determination unit 104 acquires the current speech recognition result and the previous recognition result (step S601). It then determines from the current speech recognition result whether the input was rejected (step S602). If it was rejected (YES in step S602), the stagnation escape determination unit 104 outputs the determination that this is not an escape from stagnation. A rejection means that the recognition result was not adopted because its confidence was low, so the system does not advance to the next system state and has not escaped from the stagnation.

If the input was not rejected (NO in step S602), the unit determines from the dialog history whether the previous utterance was rejected (step S603). If the previous utterance was not rejected (NO in step S603), no stagnation occurred at the previous utterance, so the stagnation escape determination unit 104 outputs the determination that the current utterance is not an escape from stagnation.

If the previous utterance was rejected (YES in step S603), the system was in a stagnation state because of the previous utterance. The stagnation escape determination unit 104 therefore determines that the current utterance has brought the system out of the stagnation state, and then determines whether the utterance is a restatement (step S604). Here, a restatement means that the previous utterance and the current utterance are the same: for example, the user says "program name search" on an output screen like that of FIG. 3, is rejected and prompted to input again, and says "program name search" once more. The restatement determination is made by comparing the previous recognition result with the current one. If the utterance is determined to be a restatement (YES in step S604), the stagnation escape determination unit 104 outputs the determination that this is an escape from stagnation by restatement.

一方、言い直しでないと判定した場合（ステップＳ６０４でＮＯ）は、停滞脱出判定部１０４は、言い換えであるか否かの判定を行う（ステップＳ６０５）。ここでの言い換えとは、前回の発声と今回の発声が発声語彙は異なるが、発声内容が同じ、即ち発声によるシステム動作が同じ発声を意味する。例えば、ユーザが図３のような出力画面において「番組名検索」と発声し、リジェクトされて再入力を促されたときに「１番」と発声する場合などである。この言い換えの判定は、言い直しの判定と同様に前回の認識結果と今回の認識結果の比較を行うことで判定を行う。より具体的には、前回の認識結果と今回の認識結果との語彙が異なり、且つシステム仕様として認識結果が同じ動作を実行する語彙であれば言い換えであると判定する。システム仕様として認識結果が同じ動作か否かの判定は、システム仕様記憶部１０７に定義される各システム仕様により判定する。具体的には、システム仕様記憶部１０７には図７に示されるように、認識結果として受け付ける語彙とその語彙を受け付けたときどの状態に遷移するかが記憶されており、ここで一つの選択可能項目に対応する単語を言い換え対象語として扱う。 On the other hand, if the utterance is determined not to be a rephrasing (NO in step S604), the stagnation escape determination unit 104 determines whether it is a paraphrase (step S605). A paraphrase here means an utterance whose vocabulary differs from that of the previous utterance but whose content is the same, that is, an utterance that causes the same system operation. For example, the user utters “program name search” on an output screen such as that of FIG. 3, is rejected and prompted to input again, and then utters “No. 1”. Like the rephrasing determination, the paraphrase determination is made by comparing the previous recognition result with the current recognition result. More specifically, the utterance is determined to be a paraphrase if the vocabulary of the previous and current recognition results differs and, under the system specification, both recognition results execute the same operation. Whether the recognition results execute the same operation is determined from the system specifications defined in the system specification storage unit 107. Specifically, as shown in FIG. 7, the system specification storage unit 107 stores the vocabulary accepted as recognition results and the state to which the system transitions when each word is accepted, and the words that correspond to a single selectable item are treated as paraphrases of one another.

この判定の結果、言い換えであると判定した場合（ステップＳ６０５でＹＥＳ）、停滞脱出判定部１０４は、言い換えによる停滞脱出であるという判定結果を出力する。一方、言い換えでないと判定した場合（ステップＳ６０５でＮＯ）、停滞脱出判定部１０４は停滞脱出ではないという判定結果を出力する。 If, as a result of this determination, the utterance is determined to be a paraphrase (YES in step S605), the stagnation escape determination unit 104 outputs a determination result that the stagnation was escaped by paraphrase. If it is determined not to be a paraphrase (NO in step S605), the stagnation escape determination unit 104 outputs a determination result that the utterance is not an escape from stagnation.
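
The determination flow of steps S602 to S605 described above can be sketched as follows. This is a minimal illustration and not the patented implementation: the names `Escape`, `TRANSITIONS`, and `judge_escape` are hypothetical, and the small transition table only stands in for the FIG. 7 system specification (it is flattened across screens for brevity).

```python
from enum import Enum


class Escape(Enum):
    NONE = "not a stagnation escape"
    REPHRASING = "stagnation escape by rephrasing"
    PARAPHRASE = "stagnation escape by paraphrase"


# Hypothetical stand-in for the FIG. 7 system specification:
# accepted word -> state the system transitions to.
# (Assumption: flattened across screens; a real table would be per state.)
TRANSITIONS = {
    "番組名検索": "program_name_search",
    "１番": "program_name_search",
    "次の画面": "next_page",
    "５番": "next_page",
}


def judge_escape(cur_word, cur_rejected, prev_word, prev_rejected,
                 transitions=TRANSITIONS):
    """Follow steps S602 to S605 for one utterance."""
    if cur_rejected:                      # S602 YES: still stagnating
        return Escape.NONE
    if not prev_rejected:                 # S603 NO: there was no stagnation
        return Escape.NONE
    if cur_word == prev_word:             # S604 YES: same words again
        return Escape.REPHRASING
    target = transitions.get(prev_word)
    if target is not None and target == transitions.get(cur_word):
        return Escape.PARAPHRASE          # S605 YES: same system operation
    return Escape.NONE                    # S605 NO


if __name__ == "__main__":
    print(judge_escape("次の画面", False, "次の画面", True))
    print(judge_escape("５番", False, "次の画面", True))
```

The previous word is the top candidate stored in the dialogue history even when that utterance was rejected, which is why it can still be compared with the current result.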

以上のように、停滞脱出判定部１０４は停滞脱出判定の動作を行う。 As described above, the stagnation escape determination unit 104 performs the stagnation escape determination operation.

次に、対話制御部１０５は、停滞脱出判定処理（ステップＳ２０３）までに得られている音声認識結果および停滞脱出判定結果に基づいて、音声認識辞書やリジェクト閾値、音響モデルといった音声認識パラメータの変更を行う（ステップＳ２０４）。 Next, based on the speech recognition result and the stagnation escape determination result obtained up to the stagnation escape determination process (step S203), the dialogue control unit 105 changes speech recognition parameters such as the speech recognition dictionary, the rejection threshold, and the acoustic model (step S204).

次に、対話制御部１０５は、認識結果に基づいて次のシステム状態と、このシステム状態における応答音声および画面の出力について決定し、応答音声・画面出力部１１０に出力する（ステップＳ２０５）。ここで必要であれば、対話制御部１０５は、データベース検索部１０８に対しデータベース記憶部１０９からのデータの検索を要求した結果を応答音声・画面出力部１１０に出力する。 Next, the dialogue control unit 105 determines the next system state and the response voice and screen output in this system state based on the recognition result, and outputs them to the response voice / screen output unit 110 (step S205). If necessary, the dialog control unit 105 outputs a result of requesting the database search unit 108 to search for data from the database storage unit 109 to the response voice / screen output unit 110.

そして、対話制御部１０５は、システム仕様記憶部１０７に定義されているシステム仕様に従い、対話の終了か否かを判定する。この結果、対話の終了でない場合（ステップＳ２０６でＮＯ）には、再び入力音声の認識処理（ステップＳ２０２）より上記ステップを繰り返し、対話の終了である場合（ステップＳ２０６でＹＥＳ）には、対話を終了する。 Then, the dialogue control unit 105 determines whether the dialogue has ended, according to the system specification defined in the system specification storage unit 107. If the dialogue has not ended (NO in step S206), the above steps are repeated, starting again from the input speech recognition process (step S202). If the dialogue has ended (YES in step S206), the dialogue is terminated.

次に、システムの具体動作例をシステムのシステム出力画面と対話履歴記憶部１０６に保存される対話履歴データの具体例を用いて説明する。 Next, a specific operation example of the system will be described with reference to a specific example of the system output screen of the system and dialog history data stored in the dialog history storage unit 106.

図８は、動作例で対象とする対話履歴データの具体例を示す図である。項目８０１はシステム状態の変化を一元管理するために振られているステップ番号、項目８０２はシステム状態の種類を示すシステム状態、項目８０３はシステムが応答を出力した日時を示す応答出力開始時刻、項目８０４は音声認識部１０１から得られる音声認識結果の１位候補の単語、項目８０５も同様に音声認識結果から得られる認識信頼度、項目８０６は音声認識部１０１からの音声認識結果に基づいて対話制御部１０５が判定したリジェクト判定結果、項目８０７は対話履歴記憶部１０６に保存される前回の認識結果と今回音声認識部１０１が出力した認識結果に基づいて停滞脱出判定部１０４が判定した言い直しによる停滞脱出の判定結果、項目８０８は項目８０７と同様にして停滞脱出判定部１０４が判定した言い換えによる停滞脱出の判定結果、項目８０９は音声認識パラメータ記憶部１０３に保存されており、認識結果からも取得できるリジェクト閾値である。なお、この図には示していないが、各ステップにおける図４で示されるような認識結果の詳細情報や図５に示されるような表示画面についての情報、具体的には表示されている単語やシステムがどのような応答文を出力したかを示す出力応答文など他の情報も対話履歴記憶部１０６には保存してもよい。 FIG. 8 is a diagram showing a specific example of the dialogue history data used in the operation example. Item 801 is a step number assigned to manage changes in the system state in a unified way. Item 802 is the system state, indicating the type of system state. Item 803 is the response output start time, indicating the date and time at which the system output a response. Item 804 is the top-candidate word of the speech recognition result obtained from the speech recognition unit 101, and item 805 is the recognition reliability, likewise obtained from the speech recognition result. Item 806 is the rejection determination result that the dialogue control unit 105 determined from the speech recognition result of the speech recognition unit 101. Item 807 is the result of determining stagnation escape by rephrasing, which the stagnation escape determination unit 104 determined from the previous recognition result stored in the dialogue history storage unit 106 and the recognition result output this time by the speech recognition unit 101. Item 808 is the result of determining stagnation escape by paraphrase, determined by the stagnation escape determination unit 104 in the same way as item 807. Item 809 is the rejection threshold, which is stored in the speech recognition parameter storage unit 103 and can also be obtained from the recognition result. Although not shown in this figure, other information may also be stored in the dialogue history storage unit 106, such as the detailed recognition result information of each step as shown in FIG. 4 and information about the display screen as shown in FIG. 5, specifically the displayed words and the output response sentence indicating what response the system output.

例えば、ユーザが、図３に示すメニュー画面で「番組名検索」と発声したとする。この認識結果の認識信頼度（０．４７）がリジェクト閾値（０．３５）より高いので、対話制御部１０５は、次のシステム状態を決定し、画面遷移と応答文の出力を行う（図８のステップ＝１）。具体的には、システムからは応答音声・画面出力部１１０によって図９に示されるような画面と「番組名の頭文字を指定してください」という応答音声が出力される。次に、ユーザは「あ行」と発声し、これも先の発声と同様に、認識確信度（０．３６）がリジェクト閾値（０．３５）より高いため、正しく受け付けられる（図８のステップ＝２）。システムからは応答音声・画面出力部１１０によって図１０のような画面と「どの番組ですか？」という応答が出力される。次に、ユーザはそのリストには見たい番組が無く「次の画面」と発声するが、この発声に対する認識結果では、認識信頼度（０．３３）がリジェクト閾値（０．３５）より低いためリジェクトであると判定される（図８のステップ＝３）。リジェクトと判定された場合、対話制御部１０５は再度そのシステム状態で（今の場合、対話＝状態３）再度入力を促す。なお、この動作はユーザが正しく発声しているのに対し、対話制御部１０５が誤ってリジェクトしてしまったシステムの誤動作であり、リジェクト閾値がユーザにとって正しく設定されていないため生じる動作である。 For example, assume that the user utters “program name search” on the menu screen shown in FIG. 3. Since the recognition reliability of this recognition result (0.47) is higher than the rejection threshold (0.35), the dialogue control unit 105 determines the next system state, makes the screen transition, and outputs a response sentence (step = 1 in FIG. 8). Specifically, the response voice / screen output unit 110 outputs a screen as shown in FIG. 9 and the response voice “Please specify the initial of the program name”. Next, the user utters “a-row”. As with the previous utterance, the recognition reliability (0.36) is higher than the rejection threshold (0.35), so the utterance is accepted correctly (step = 2 in FIG. 8). The response voice / screen output unit 110 outputs a screen as shown in FIG. 10 and the response “Which program?”. Next, because the list does not contain the program the user wants to watch, the user utters “next screen”, but the recognition reliability of this utterance (0.33) is lower than the rejection threshold (0.35), so it is determined to be a rejection (step = 3 in FIG. 8). When an utterance is rejected, the dialogue control unit 105 prompts for input again in the same system state (in this case, dialogue = state 3). Note that this behavior is a system malfunction: the user uttered the command correctly, yet the dialogue control unit 105 rejected it by mistake. It occurs because the rejection threshold is not set appropriately for this user.

再度同じシステム状態で、システムより入力を促されたユーザは再び「次の画面」と発声し、その音声認識の結果における認識信頼度（０．３８）はリジェクト閾値（０．３５）より高いので、対話制御部１０５はその結果を受け付ける（図８のステップ＝４）。ここで、このステップでは停滞脱出判定部１０４が「前回の発声はリジェクト」かつ「今回の発声は言い直し」であるので「言い直しによる停滞脱出」と判定し、項目８０７にその情報が記憶される。更にこのステップでは、対話制御部１０５は検出した誤動作と正しい動作を用いて、誤動作したはじめの発声を次からは受け付けるよう個人適応を行う。即ち、音声認識パラメータ、今回の例ではリジェクト閾値を変更し、次のステップからこの値を利用して音声認識を行う。具体的には、現在のリジェクト閾値「０．３５」を前回の誤ってリジェクトされた発声における信頼度でも正しく認識できるように「０．３０」に変更する。この閾値の変更は、システム開発者が予め設定した、決まった割合で変更を行っても良い。また、現在のリジェクト閾値と誤ってリジェクトされたときの認識信頼度を利用した計算により閾値の変更を行ってもよい。より具体的には、現在のリジェクト閾値と誤ってリジェクトされたときの認識信頼度の差分が一定値以内であれば、リジェクト閾値を誤ってリジェクトされたときの認識信頼度に設定し、差分が一定値以上であれば、現在のリジェクト閾値と誤ってリジェクトされたときの認識信頼度の間の重み付き平均値を利用してリジェクト閾値を設定しても良い。また、リジェクトされた単語と正しく認識された単語の認識信頼度を用いて閾値の変更を行ってもよい。具体的には、現在のリジェクト閾値と誤ってリジェクトされたときの認識信頼度を用いた計算方法と同様の方法で決定する。 Prompted by the system to input again in the same system state, the user utters “next screen” once more. The recognition reliability of this speech recognition result (0.38) is higher than the rejection threshold (0.35), so the dialogue control unit 105 accepts the result (step = 4 in FIG. 8). In this step, because “the previous utterance was rejected” and “the current utterance is a rephrasing”, the stagnation escape determination unit 104 determines “stagnation escape by rephrasing”, and this information is stored in item 807. Furthermore, in this step the dialogue control unit 105 uses the detected malfunction and the correct operation to perform personal adaptation, so that the first utterance, which caused the malfunction, will be accepted from the next time on. That is, it changes a speech recognition parameter, in this example the rejection threshold, and speech recognition uses the new value from the next step. Specifically, the current rejection threshold “0.35” is changed to “0.30” so that an utterance with the reliability of the previous, erroneously rejected utterance is recognized correctly. The threshold may be changed at a fixed rate preset by the system developer. Alternatively, the threshold may be changed by a calculation that uses the current rejection threshold and the recognition reliability of the erroneously rejected utterance. More specifically, if the difference between the current rejection threshold and the recognition reliability of the erroneously rejected utterance is within a certain value, the rejection threshold may be set to that recognition reliability; if the difference is at or above the certain value, the rejection threshold may be set using a weighted average of the current rejection threshold and that recognition reliability. The threshold may also be changed using the recognition reliabilities of the rejected word and the correctly recognized word. Specifically, it is determined by a method similar to the calculation that uses the current rejection threshold and the recognition reliability of the erroneously rejected utterance.
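
The calculation-based update described above can be sketched as follows. This is only one possible reading under stated assumptions: the function names, the gap limit `max_gap`, and the averaging weight `weight` are hypothetical values that do not appear in the source, and the sketch assumes that a confidence equal to the threshold is accepted (the text only specifies "higher" and "lower").

```python
def adapt_threshold(current, rejected_conf, max_gap=0.05, weight=0.5):
    """Return a lowered rejection threshold after a wrongly rejected utterance.

    max_gap and weight are illustrative values, not taken from the source.
    """
    gap = current - rejected_conf
    if gap <= 0:
        # The utterance was not below the threshold; nothing to adapt.
        return current
    if gap <= max_gap:
        # Small gap: adopt the rejected confidence itself as the threshold.
        return rejected_conf
    # Large gap: move only part of the way, via a weighted average.
    return weight * current + (1.0 - weight) * rejected_conf


def is_rejected(confidence, threshold):
    # Assumption: a confidence equal to the threshold is accepted.
    return confidence < threshold


if __name__ == "__main__":
    new_threshold = adapt_threshold(0.35, 0.33)
    print(new_threshold, is_rejected(0.33, new_threshold))
```

The worked example in the text instead lowers the threshold from 0.35 to 0.30, which corresponds to the fixed-rate option rather than to this calculation.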

言い直しの結果を受け付けた対話制御部１０５は、次のシステム状態を決定し、画面遷移と応答文の出力を行う。具体的には、システムからは応答音声・画面出力部１１０によって、図１１に示されるような画面と「どの番組ですか？」という応答音声が出力される。ユーザはこの画面にも見たい番組が無いので、さらに「次の画面」と発声する。この発声の認識結果における認識信頼度はステップ３の時と同じ「０．３３」である。この認識信頼度はステップ３ではリジェクトされた値であるが、対話制御部１０５はこの認識信頼度「０．３３」と前ステップで適応させたリジェクト閾値「０．３」とを比較した結果、本ステップではこの発声をリジェクトせず、次のシステム状態を決定し、画面遷移と応答文の出力を行う。具体的には、システムからは応答音声・画面出力部１１０によって、図１２に示されるような画面と「どの番組ですか？」という応答音声が出力される（図８のステップ＝５）。ユーザはこの画面の中では見たい番組を見つけ、「ｉしたい」と番組を選択する発声を行う（図８のステップ＝６）。図１３は、以上の一連の動作をまとめた図であり、上から順に図８のステップ＝１からステップ＝６に対応する。 Upon receiving the restatement result, the dialogue control unit 105 determines the next system state, and outputs a screen transition and a response sentence. Specifically, the response voice / screen output unit 110 outputs a screen as shown in FIG. 11 and a response voice “Which program?” From the system. Since there is no program that the user wants to watch on this screen, the user further says “next screen”. The recognition reliability in the recognition result of this utterance is “0.33”, which is the same as in step 3. This recognition reliability is the value rejected in step 3, but the dialogue control unit 105 compares the recognition reliability “0.33” with the rejection threshold “0.3” adapted in the previous step. In this step, this utterance is not rejected, the next system state is determined, and screen transitions and response sentences are output. Specifically, the response voice / screen output unit 110 outputs a screen as shown in FIG. 12 and a response voice “Which program?” From the system (step = 5 in FIG. 8). The user finds the program he / she wants to see on this screen, and utters “I want to do” to select the program (step = 6 in FIG. 8). FIG. 13 is a diagram summarizing the series of operations described above, and corresponds to step = 1 to step = 6 in FIG. 8 in order from the top.
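
The sequence of steps 1 to 5 in FIG. 8 can be replayed with a short simulation. The words, confidences, and thresholds below are the ones quoted in the text; the function `replay` and its structure are hypothetical, and only the rephrasing case is modelled (step 6 is omitted because its confidence is not given).

```python
def replay(utterances, threshold=0.35, adapted=0.30):
    """Replay (word, confidence) pairs and log each rejection decision."""
    log = []
    prev_word, prev_rejected = None, False
    for word, confidence in utterances:
        rejected = confidence < threshold
        log.append((word, confidence, threshold, rejected))
        # Stagnation escape by rephrasing: accepted now, rejected before,
        # and the same word as before. Adapt the threshold for later steps.
        if not rejected and prev_rejected and word == prev_word:
            threshold = adapted
        prev_word, prev_rejected = word, rejected
    return log


# Steps 1 to 5 of FIG. 8 as described in the text.
FIG8_STEPS = [
    ("番組名検索", 0.47),
    ("あ行", 0.36),
    ("次の画面", 0.33),   # rejected at threshold 0.35
    ("次の画面", 0.38),   # accepted; triggers adaptation
    ("次の画面", 0.33),   # accepted at the adapted threshold 0.30
]

if __name__ == "__main__":
    for row in replay(FIG8_STEPS):
        print(row)
```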

次に、言い換えを利用したリジェクト閾値の変更動作例について、対話履歴データの具体例を用いて説明する。図１４は、動作例で対象とする対話履歴データの具体例を示す図である。なお、対話履歴データの項目は図８と同じであるので、説明は省略する。更に、上記言い直しによるリジェクト閾値の変更動作例との発声の違いはステップ３〜ステップ５のみであるので、図１４のステップ３からステップ５の動作例についてのみ説明する。 Next, an example of changing the rejection threshold using a paraphrase is described, using a specific example of dialogue history data. FIG. 14 is a diagram showing a specific example of the dialogue history data used in this operation example. The items of the dialogue history data are the same as those in FIG. 8, so their description is omitted. Furthermore, the utterances differ from those in the rephrasing example above only in steps 3 to 5, so only the operation of steps 3 to 5 in FIG. 14 is described.

システムから応答音声・画面出力部１１０によって図１０のような画面と「どの番組ですか？」という応答が出力される。ユーザはそのリストには見たい番組が無いため「次の画面」と発声するが、この発声に対する認識結果では、認識信頼度（０．３３）はリジェクト閾値（０．３５）より低いためリジェクトであると判定される（図１４のステップ＝３）。リジェクトと判定された場合、対話制御部１０５は再度そのシステム状態で（今の場合対話＝状態３）再度入力を促す。 A response voice / screen output unit 110 outputs a screen as shown in FIG. 10 and a response “Which program?” From the system. The user utters “next screen” because there is no program to view in the list. However, in the recognition result for this utterance, the recognition reliability (0.33) is lower than the rejection threshold (0.35), so the rejection is made. It is determined that there is (step = 3 in FIG. 14). If it is determined to be rejected, the dialogue control unit 105 prompts input again in the system state (in this case, dialogue = state 3).

再度同じシステム状態で、システムより入力を促されたユーザは「次の画面」と同じシステム動作を行うコマンドである「５番」と発声する。この音声認識の結果における認識信頼度（０．３８）はリジェクト閾値（０．３５）より高いので、対話制御部１０５はその結果を受け付ける（図１４のステップ＝４）。ここで、このステップでは停滞脱出判定部１０４が「前回の発声はリジェクト」かつ「今回の発声は言い換え」であるので「言い換えによる停滞脱出」と判定し、項目１４０８にその情報が記憶される。さらに、このステップでは、対話制御部１０５は検出した誤動作と正しい動作を用いて、誤動作したはじめの発声を次からは受け付けるよう個人適応を行う。即ち、音声認識パラメータ、今回の例ではリジェクト閾値を変更し、次のステップからこの値を利用して音声認識を行う。具体的には現在のリジェクト閾値「０．３５」を前回の誤ってリジェクトされた発声における信頼度でも正しく認識できるように「０．３」に変更する。以降の動作は上記言い直しによるリジェクト閾値の変更動作例と同じであるので省略する。 Again, in the same system state, the user who is prompted to input by the system utters “No. 5”, which is a command for performing the same system operation as the “next screen”. Since the recognition reliability (0.38) in the voice recognition result is higher than the reject threshold (0.35), the dialogue control unit 105 accepts the result (step = 4 in FIG. 14). Here, in this step, the stagnation / escape determination unit 104 determines that “the previous utterance is rejected” and “the current utterance is paraphrased”, so it is determined as “stagnation escape due to paraphrasing”, and the information is stored in the item 1408. Further, in this step, the dialogue control unit 105 performs personal adaptation using the detected malfunction and the correct action to accept the first utterance that has malfunctioned from the next. That is, the speech recognition parameter, in this example, the rejection threshold is changed, and speech recognition is performed using this value from the next step. Specifically, the current rejection threshold “0.35” is changed to “0.3” so that the reliability can be correctly recognized even in the previous erroneously rejected utterance. Subsequent operations are the same as in the example of changing the reject threshold by the above-mentioned rephrasing, and are therefore omitted.

なお、上記具体例の中では「言い直しまたは言い換えによる停滞脱出」を１回検出した段階でリジェクト閾値を変更したが、音声認識パラメータの変更を行う基準としての停滞脱出検出の回数は可変に設定できるようにしてもよい。例えば３回に設定すると、「言い直しまたは言い換えによる停滞脱出」が３回検出されたらリジェクト閾値の変更を行うことになる。この場合、例えば３回分の認識結果における認識信頼度を用いてリジェクト閾値を変更してもよい。より具体的には、３回分の認識結果における信頼度の最低値や平均値、重み付け平均値により決定する。 In the specific examples above, the rejection threshold was changed as soon as “stagnation escape by rephrasing or paraphrase” was detected once, but the number of stagnation escape detections required before the speech recognition parameters are changed may be made configurable. For example, if it is set to three, the rejection threshold is changed when “stagnation escape by rephrasing or paraphrase” has been detected three times. In this case, the rejection threshold may be changed using, for example, the recognition reliabilities of the three recognition results. More specifically, it is determined from the minimum, the average, or the weighted average of the reliabilities of the three recognition results.
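
The configurable detection count can be sketched as follows. The class and function names are hypothetical. The text does not say which recognition results supply the reliabilities, so this sketch assumes they are the confidences of the erroneously rejected utterances.

```python
from statistics import mean


class EscapeAccumulator:
    """Collect confidences until the required number of escapes is reached."""

    def __init__(self, required=3):
        self.required = required
        self._confidences = []

    def record(self, rejected_confidence):
        """Store one confidence; return the full batch once it is complete."""
        self._confidences.append(rejected_confidence)
        if len(self._confidences) < self.required:
            return None
        batch, self._confidences = self._confidences, []
        return batch


def threshold_from_batch(confidences, mode="min", weights=None):
    """Derive a new threshold as the minimum, mean, or weighted mean."""
    if mode == "min":
        return min(confidences)
    if mode == "mean":
        return mean(confidences)
    if mode == "weighted":
        if weights is None or len(weights) != len(confidences):
            raise ValueError("weights must match the number of confidences")
        total = sum(weights)
        return sum(w * c for w, c in zip(weights, confidences)) / total
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    acc = EscapeAccumulator(required=3)
    for conf in (0.33, 0.31, 0.34):
        batch = acc.record(conf)
    print(threshold_from_batch(batch, mode="min"))
```

Taking the minimum is the most permissive choice, since every utterance in the batch would then be accepted; the mean and weighted mean are more conservative.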

また、上記具体例ではリジェクト閾値を１つしか持たない例について述べたが、単語ごとにリジェクト閾値を持ち、「言い直しまたは言い換えによる停滞脱出」を単語ごとに検出し、閾値を変更してもよい。具体的には、例えば図１５のようなデータを音声認識パラメータ記憶部１０３に保存する。ここで、項目１５０１は停滞脱出をしたことによりリジェクト閾値が変更された単語であり、項目１５０２はその単語のリジェクト閾値である。なお、このリストに無い単語はデフォルト値、例えば上記具体例では「０．３５」を利用する。 The specific examples above have a single rejection threshold, but a rejection threshold may instead be held for each word, with “stagnation escape by rephrasing or paraphrase” detected and the threshold changed word by word. Specifically, data such as that of FIG. 15 is stored in the speech recognition parameter storage unit 103. Here, item 1501 is a word whose rejection threshold has been changed as a result of a stagnation escape, and item 1502 is the rejection threshold of that word. Words not in this list use the default value, for example “0.35” in the specific examples above.
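
A per-word threshold table with a default, as described above, might look like the following sketch. The class name and methods are hypothetical, and the value 0.30 for 「次の画面」 is only illustrative because the contents of FIG. 15 are not reproduced in the text.

```python
DEFAULT_THRESHOLD = 0.35  # default quoted in the text


class WordThresholds:
    """Per-word rejection thresholds; unlisted words use the default."""

    def __init__(self, default=DEFAULT_THRESHOLD):
        self.default = default
        self._by_word = {}

    def get(self, word):
        return self._by_word.get(word, self.default)

    def lower(self, word, new_threshold):
        # Only ever lower a threshold in this sketch (an assumption).
        self._by_word[word] = min(self.get(word), new_threshold)

    def is_rejected(self, word, confidence):
        return confidence < self.get(word)


if __name__ == "__main__":
    table = WordThresholds()
    table.lower("次の画面", 0.30)  # illustrative value
    print(table.is_rejected("次の画面", 0.33), table.is_rejected("あ行", 0.33))
```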

図１６は本実施の形態を利用した場合と利用しない場合の対話シーケンスの例を示す図である。この図１６に示す例では、本実施の形態を利用した場合の方がユーザの発声が１回少なくて済む。この例では、ユーザは２ページ目で番組の選択を決定しているが、より多くのページを見ていく場合のように多くのステップを有する対話では本実施の形態の有効性は顕著に現れることになることは容易に理解できる。また、一度検索が終わり、再び同じ番組をはじめから選択する場合も本実施の形態を用いれば前回リジェクトされた発声方法でも初めから認識されることになる。 FIG. 16 shows examples of the dialogue sequence with and without the present embodiment. In the example of FIG. 16, the user needs one fewer utterance when the present embodiment is used. In this example the user settles on a program on the second page, but it is easy to see that the benefit of the present embodiment becomes more pronounced in dialogues with many steps, such as when the user browses more pages. In addition, when a search has finished and the user selects the same program again from the beginning, the present embodiment allows an utterance style that was previously rejected to be recognized from the start.

このように本実施の形態によると、一連の対話シーケンスの中で、誤動作と正しい動作を検出することで音声認識パラメータを適切に変更することが可能となる。この結果、次に前回誤動作をした発声を行ってもシステムは正しい動作が可能となるため、何度も繰り返し言い直しをする必要が無く、スムーズでユーザに負担の掛からない対話が実現できる。また、本実施の形態による音声認識パラメータの変更は、変更のために特別な発声を促すわけでは無いので、ユーザの負担も少ない。 Thus, according to the present embodiment, it is possible to appropriately change the speech recognition parameter by detecting a malfunction and a correct operation in a series of dialogue sequences. As a result, the system can operate correctly even if the next malfunctioning utterance is performed, so that it is not necessary to repeat it again and again, and a smooth conversation that does not burden the user can be realized. Moreover, since the change of the speech recognition parameter according to the present embodiment does not prompt special utterance for the change, the burden on the user is small.

なお、本実施の形態は、図１７に示すように上記構成に加えてＥＰＧ（Electronic Program Guide）を受信するＥＰＧ受信部２０１を備え、ＥＰＧを対象として音声認識を行って情報を検索する音声対話型情報検索システムにおいても適用することが可能である。この場合、ＥＰＧ受信部２０１で受信されたＥＰＧは、データベース記憶部１０９に記憶される。対話制御部１０５は、データベース記憶部１０９に記憶されているＥＰＧを用いて音声認識辞書１０２を作成する。そして、音声認識部１０１は、ＥＰＧを用いて作成された音声認識辞書１０２を用いて、ユーザより入力された音声の音声認識を行う。また、データベース検索部１０８は、データベース記憶部１０９に記憶されているＥＰＧ等を対象として検索を行うことになる。 As shown in FIG. 17, the present embodiment can also be applied to a speech-interactive information retrieval system that includes, in addition to the above configuration, an EPG receiving unit 201 that receives an EPG (Electronic Program Guide), and that retrieves information by performing speech recognition on the EPG. In this case, the EPG received by the EPG receiving unit 201 is stored in the database storage unit 109. The dialogue control unit 105 creates the speech recognition dictionary 102 using the EPG stored in the database storage unit 109. The speech recognition unit 101 then recognizes the speech input by the user using the speech recognition dictionary 102 created from the EPG. The database search unit 108 searches the EPG and other data stored in the database storage unit 109.

（実施の形態２） (Embodiment 2)
上記実施の形態１によれば、誤動作と正しい動作を検出することで音声認識パラメータの個人適応が可能となり、ユーザに負担の少ない個人適応が実現できるが、同様の適応を音声認識辞書の追加という形でも行える。本実施の形態では、誤動作と正しい動作の検出を行い、音声認識辞書の変更または新たに登録を行う方法について述べる。 According to Embodiment 1 above, detecting a malfunction and a correct operation makes it possible to personally adapt the speech recognition parameters with little burden on the user. The same adaptation can also be achieved by adding entries to the speech recognition dictionary. The present embodiment describes a method of detecting a malfunction and a correct operation and then changing the speech recognition dictionary or registering new entries in it.

本実施の形態は、上記実施の形態１とは図１における対話制御部１０５における停滞脱出判定結果に基づいて個人適応する対象が異なるものであり、他は実施の形態１と同様である。従って、基本的には図１から図１２を参照することとする。以下、本実施の形態における対話制御部１０５の動作と、前実施の形態では述べていない音声認識辞書の変更処理について説明する。 The present embodiment differs from Embodiment 1 in the target that the dialogue control unit 105 of FIG. 1 personally adapts based on the stagnation escape determination result; it is otherwise the same as Embodiment 1. Therefore, FIGS. 1 to 12 are basically referred to. The following describes the operation of the dialogue control unit 105 in the present embodiment and the process of changing the speech recognition dictionary, which was not described in the previous embodiment.

本実施の形態における辞書変更・登録による個人適応の動作例について、対話履歴記憶部１０６に記憶されている対話履歴データの具体例を用いて説明する。 An operation example of personal adaptation by dictionary change / registration in the present embodiment will be described using a specific example of dialogue history data stored in the dialogue history storage unit 106.

図１８は、対話履歴記憶部１０６に記憶されている対話履歴データの具体例を示す図である。図１８に示される対話履歴データの例は実施の形態１での対話シーケンスにおける対話履歴データの例（図８）と同様の履歴であり、図８には示されていなかった項目「認識結果２」が示されている点、および図８に示されていた項目「応答出力開始時刻」が省略されている点を除いては図８と同じものである。なお、既に述べたが認識結果２は音声認識辞書を使わず、音響的に近いかな文字列を音声認識結果として出力されたものであり、認識結果の一例を示した図４における音声認識結果２と同一のものである。 FIG. 18 is a diagram showing a specific example of the dialogue history data stored in the dialogue history storage unit 106. The dialogue history data of FIG. 18 is the same history as the example for the dialogue sequence of Embodiment 1 (FIG. 8). It is identical to FIG. 8 except that it shows the item “recognition result 2”, which was not shown in FIG. 8, and omits the item “response output start time”, which was shown in FIG. 8. As already described, recognition result 2 is an acoustically close kana character string output as the speech recognition result without using the speech recognition dictionary, and is the same as speech recognition result 2 in FIG. 4, which shows an example of a recognition result.

以下、図１８の項目「ステップ」を用い、順に具体的動作を説明する。 The specific operations are described below in order, using the item “step” in FIG. 18.
ステップ３では、ユーザの発声「次の画面」に対し、音声認識部１０１は認識結果２「スイノダメン」、認識信頼度「０．３３」、リジェクト閾値「０．３５」を出力する。対話制御部１０５は、認識信頼度がリジェクト閾値より低いため、リジェクトと判定し、再度そのシステム状態での再度入力を促す。ステップ４では、ユーザの再発声「次の画面」に対し、音声認識部１０１は認識結果２「ツリノガメン」、認識信頼度「０．３８」、リジェクト閾値「０．３５」を出力し、停滞脱出判定部１０４は「言い直しによる停滞脱出」との判定を出力する。対話制御部１０５は、これらの結果を受けて、誤動作したステップ３における発声が次回からは正しく認識されるように、個人適応を行う。即ち、ステップ３でリジェクトされた発声に対する音声認識結果２の「スイノダメン」をステップ４で正しく認識されたコマンド「次の画面」に対応させて音声認識辞書１０２に新規に登録を行う。 In step 3, for the user's utterance “next screen”, the speech recognition unit 101 outputs recognition result 2 “suinodamen”, a recognition reliability of “0.33”, and a rejection threshold of “0.35”. Because the recognition reliability is lower than the rejection threshold, the dialogue control unit 105 determines a rejection and prompts for input again in the same system state. In step 4, for the user's repeated utterance “next screen”, the speech recognition unit 101 outputs recognition result 2 “tsurinogamen”, a recognition reliability of “0.38”, and a rejection threshold of “0.35”, and the stagnation escape determination unit 104 outputs the determination “stagnation escape by rephrasing”. On receiving these results, the dialogue control unit 105 performs personal adaptation so that the utterance of step 3, which caused the malfunction, is recognized correctly from the next time on. That is, “suinodamen”, speech recognition result 2 for the utterance rejected in step 3, is newly registered in the speech recognition dictionary 102 in association with the command “next screen” that was correctly recognized in step 4.

図１９は音声認識辞書の具体例を示す図である。項目１８０１は単語ごとにユニークに付与される単語番号、項目１８０２はシステム仕様で同じ意味として扱われる番号を同一番号として付与された意味番号、項目１８０３は単語の表記、項目１８０４は単語の読みである。ここで、上記例においては、図１９の単語番号１３０が新規登録されたことになる。 FIG. 19 is a diagram showing a specific example of the speech recognition dictionary. Item 1801 is a word number assigned uniquely to each word. Item 1802 is a meaning number, where words treated as having the same meaning in the system specification are given the same number. Item 1803 is the written form of the word, and item 1804 is its reading. In the example above, word number 130 in FIG. 19 is the newly registered entry.
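
The dictionary layout of FIG. 19 and the registration of a new reading can be sketched as follows. The four fields mirror items 1801 to 1804; everything else is hypothetical, including the names `Entry` and `register_reading`, the existing entries, and the meaning numbers. The starting word numbers are chosen only so that the new entry receives number 130, as in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    word_no: int      # item 1801: unique per entry
    meaning_no: int   # item 1802: shared by entries with the same meaning
    notation: str     # item 1803: written form
    reading: str      # item 1804: kana reading


def register_reading(dictionary, command, new_reading):
    """Add new_reading as an extra reading of an already known command."""
    base = next(e for e in dictionary if e.notation == command)
    for entry in dictionary:
        if entry.meaning_no == base.meaning_no and entry.reading == new_reading:
            return entry  # already registered
    entry = Entry(
        word_no=max(e.word_no for e in dictionary) + 1,
        meaning_no=base.meaning_no,
        notation=command,
        reading=new_reading,
    )
    dictionary.append(entry)
    return entry


if __name__ == "__main__":
    # Hypothetical existing entries; numbers are illustrative only.
    dictionary = [
        Entry(128, 7, "前の画面", "マエノガメン"),
        Entry(129, 8, "次の画面", "ツギノガメン"),
    ]
    print(register_reading(dictionary, "次の画面", "スイノダメン"))
```

Sharing the meaning number lets the misrecognized reading trigger the same system operation as the original command.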

ステップ５では、ユーザが「次の画面」と発声する。音声認識部１０１からはステップ３の時と同様に音声認識結果２として「スイノダメン」という結果が出力されるが、このときの音声認識時には音声認識辞書１０２に「スイノダメン」が「次の画面」と対応された状態で登録されているため、高い確信度（今の場合０．４５）が結果として出力される。このように、ステップ５における発声は個人適応されたことに伴い、ステップ３と同様の発声であるにもかかわらず正しく認識が行われる。 In step 5, the user utters “next screen”. As in step 3, the speech recognition unit 101 outputs “suinodamen” as speech recognition result 2. This time, however, “suinodamen” is registered in the speech recognition dictionary 102 in association with “next screen”, so a high recognition reliability (0.45 in this case) is output. Thus, thanks to the personal adaptation, the utterance in step 5 is recognized correctly even though it is the same as the utterance in step 3.

なお、上記具体例の中では言い直しの「言い直しによる停滞脱出」を１回検出した段階で認識辞書の変更を行ったが、認識辞書変更を行う基準としての停滞脱出検出の回数は可変に設定できるようにしてもよい。例えば３回に設定すると、「言い直しによる停滞脱出」が３回検出されたら認識辞書の変更を行うことになる。ここで、３回分の認識結果における認識結果を全て登録しても良いが、組み合わせて作成した文字列を登録してもよい。具体的には「ツギノガメン」に対して「スイノダメン」「ツイノダメン」「スギノダメン」に対して、全てが共通している「ダ」の部分だけを変更した「ツギノダメン」を登録してもよい。さらに、変更されたかな文字を記憶し、このユーザは「ガ」を「ダ」とよく間違えると判定した場合、他の単語についても「ガ」を「ダ」に変更してもよい。具体的には「前の画面」に対し「マエノダメン」という読みを付与し、音声認識辞書に追加登録しても良い。 In the specific example above, the recognition dictionary was changed as soon as “stagnation escape by rephrasing” was detected once, but the number of stagnation escape detections required before the recognition dictionary is changed may be made configurable. For example, if it is set to three, the recognition dictionary is changed when “stagnation escape by rephrasing” has been detected three times. In that case, all three recognition results may be registered, or a character string created by combining them may be registered. Specifically, when “suinodamen”, “tsuinodamen”, and “suginodamen” have been recognized for “tsuginogamen”, the string “tsuginodamen”, in which only the “da” portion common to all three is changed, may be registered. Furthermore, the changed kana may be stored, and if it is determined that this user is often recognized as saying “da” for “ga”, “ga” may be changed to “da” in other words as well. Specifically, the reading “maenodamen” may be assigned to “previous screen” and additionally registered in the speech recognition dictionary.

また、本実施の形態では言い直しの停滞の判定により音声認識辞書の追加・変更の例についてのみ述べたが、実施の形態１と同様にすれば言い換えの場合も音声認識辞書の追加・変更を行うことができる。 In the present embodiment, only the example of adding / changing the speech recognition dictionary based on the determination of rephrasing stagnation has been described. However, in the same manner as in the first embodiment, addition / change of the speech recognition dictionary is also performed in the case of paraphrasing. It can be carried out.

このように本実施の形態によると、一連の対話シーケンスの中で、誤動作と正しい動作を検出することで音声認識パラメータだけでなく、音声認識辞書についても適切に変更することが可能となる。この結果、次に前回誤動作をした発声を行ってもシステムは正しい動作が可能となるため、何度も繰り返し言い直しをする必要が無く、スムーズでユーザに負担の掛からない対話が実現できる。また、本実施の形態による音声認識辞書の変更は、認識率を上げるために特別な発声を促すわけでは無く自然な対話から認識率を上げるため、ユーザの負担も少ない。 As described above, according to the present embodiment, it is possible to appropriately change not only the speech recognition parameters but also the speech recognition dictionary by detecting a malfunction and a correct operation in a series of dialogue sequences. As a result, the system can operate correctly even if the next malfunctioning utterance is performed, so that it is not necessary to repeat it again and again, and a smooth conversation that does not burden the user can be realized. In addition, the change of the speech recognition dictionary according to the present embodiment does not prompt special utterances to increase the recognition rate, and raises the recognition rate from natural conversation, so the burden on the user is small.

なお、本実施の形態における音声認識辞書への追加・変更と上記実施の形態１における音声認識パラメータの変更とを組み合わせて実施することも可能である。 It should be noted that the addition / change to the voice recognition dictionary in the present embodiment and the change in the voice recognition parameter in the first embodiment may be combined.

（実施の形態３） (Embodiment 3)
上記実施の形態１および実施の形態２によれば、一連の対話シーケンスの中で、誤動作と正しい動作を検出することで音声認識パラメータおよび認識辞書をユーザに適したものに変更しているが、上記実施の形態１および実施の形態２においては、「言い換え」を前回リジェクトされた単語が、今回正しく認識された単語と同一のシステム動作を行う単語であるかをシステム仕様記憶部にある図７のようなデータを用い判定している。しかし、「言い換え」には様々な形があり、事前にシステム仕様に登録できない場合がある。特にＥＰＧを用いた番組検索システムにおいては、日々更新される番組名を認識対象とする必要があり、予め言い換えについてシステム開発者が登録しておくことができない。本実施の形態は、このような場合に対処するものである。 According to Embodiments 1 and 2 above, the speech recognition parameters and the recognition dictionary are changed to suit the user by detecting malfunctions and correct operations within a series of dialogue sequences. In Embodiments 1 and 2, a “paraphrase” is determined by using data such as that of FIG. 7 in the system specification storage unit to check whether the previously rejected word performs the same system operation as the word correctly recognized this time. However, paraphrases take various forms, and in some cases they cannot be registered in the system specification in advance. In particular, a program search system using an EPG must recognize program names that are updated daily, so the system developer cannot register paraphrases beforehand. The present embodiment addresses such cases.

FIG. 20 is a block diagram showing the configuration of a voice-interactive information retrieval system including a speech recognition apparatus according to Embodiment 3 of the present invention.

Embodiment 3 differs from Embodiments 1 and 2 in that an abbreviation creation unit 301 and a user utterance storage unit 304 are added, and the stagnation escape determination unit 302 operates differently as a result. All other operations are the same as in Embodiments 1 and 2. Accordingly, only the paraphrase determination operation of the stagnation escape determination unit 302, which is the part that differs, is described here.

As in Embodiments 1 and 2, the stagnation escape determination unit 302 determines a stagnation escape by restatement (repeating the same word) or by paraphrase according to the flowchart of FIG. 6. What differs is the processing in step S605 of FIG. 6, that is, the determination of whether the current utterance is a paraphrase. FIG. 21 is a flowchart showing the flow of the paraphrase determination operation in the present embodiment.

First, it is determined whether the current utterance is a recognition word that performs the same system operation as the previous utterance (step S2001). If it is (YES in step S2001), the operation is the same as in the preceding embodiments, and a stagnation escape by paraphrase is determined (YES in step S605 of FIG. 6). If it is not (NO in step S2001), abbreviations are created from the current recognition target vocabulary (step S2002). The abbreviations are created by the abbreviation creation unit 301 using the current recognition target vocabulary.

The abbreviation creation unit 301 receives the current recognition target vocabulary and creates abbreviations according to predefined rules. To create them, it decomposes the current recognition target vocabulary into morphemes, for example with a morphological analysis tool, and builds abbreviations from the result. A single morpheme may serve as an abbreviation, or several morphemes may be joined to form one. More specifically, for the term "Hakkutsu Aruaru Kojien" (発掘あるある広辞苑) it creates abbreviations such as "Hakkutsu" (発掘), "Aruaru" (あるある), "Kojien" (広辞苑), and "Aruaru Kojien" (あるある広辞苑), and for the term "Fuyu no Rekuiemu" (冬のレクイエム, "Winter Requiem") it creates an abbreviation such as "Fuyu-Reku" (冬レク). The abbreviations created by the abbreviation creation unit 301 are passed through the stagnation escape determination unit 302 and held in the dialogue control unit 303.
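The morpheme-joining rule described above can be illustrated with a minimal Python sketch. It assumes the morphemes have already been produced by a morphological analyzer (they are supplied by hand here), and the function name `contiguous_abbreviations` is hypothetical. It covers only the "one morpheme or several joined morphemes" rule; a truncating abbreviation such as 冬レク would need an additional rule that the text does not spell out.

```python
from typing import List


def contiguous_abbreviations(morphemes: List[str]) -> List[str]:
    """Return every contiguous run of morphemes that is shorter than the full term."""
    n = len(morphemes)
    candidates: List[str] = []
    for length in range(1, n):  # stop before n so the full term itself is excluded
        for start in range(0, n - length + 1):
            candidates.append("".join(morphemes[start:start + length]))
    # Remove duplicates while keeping the first-seen order.
    return list(dict.fromkeys(candidates))


# Morphemes written out by hand; a real system would obtain them from an analyzer.
title_morphemes = ["発掘", "あるある", "広辞苑"]
abbreviations = contiguous_abbreviations(title_morphemes)
```

For this title the sketch yields the four abbreviations named in the text plus 発掘あるある, which the text does not list; whether such a candidate is kept would depend on the predefined rules.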

Next, the speech recognition unit 101 uses the abbreviations created by the abbreviation creation unit 301 and held in the dialogue control unit 303 to recognize again the previously rejected utterance stored in the user utterance storage unit 304 (step S2003).

The stagnation escape determination unit 302 then compares the confidence of the re-recognition result with the rejection threshold (step S2004). If the confidence of the re-recognition result is higher than the rejection threshold (YES in step S2004), the dialogue control unit 303 registers the top-ranked abbreviation candidate in the system specification storage unit 107 and the speech recognition dictionary 102 as a word that performs the same operation as the word recognized this time (step S2005), and a stagnation escape by paraphrase is determined (YES in step S605 of FIG. 6). If the confidence of the re-recognition result is lower than the rejection threshold (NO in step S2004), the stagnation escape determination unit 302 determines that this is not a stagnation escape by paraphrase (NO in step S605 of FIG. 6).
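Steps S2001 to S2005 can be summarized in a short Python sketch. Everything named here is an assumption made for illustration: the `AdaptationState` container, the `Recognizer` callable signature, the operation strings, and the threshold value 3.0 do not come from the patent. The handling of a confidence exactly equal to the threshold is also a guess, since the text only covers "higher" and "lower".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical recognizer interface: (stored audio, candidate words) -> (best word, confidence)
Recognizer = Callable[[bytes, List[str]], Tuple[str, float]]


@dataclass
class AdaptationState:
    system_spec: Dict[str, str] = field(default_factory=dict)  # word -> system operation
    dictionary: Set[str] = field(default_factory=set)          # words added to the dictionary
    reject_threshold: float = 3.0                               # placeholder value


def is_paraphrase_escape(
    state: AdaptationState,
    current_word: str,
    previous_word: Optional[str],
    rejected_audio: bytes,
    abbreviations: List[str],
    recognize: Recognizer,
) -> bool:
    operation = state.system_spec.get(current_word, current_word)
    # S2001: does the previous word already map to the same system operation?
    if previous_word is not None and state.system_spec.get(previous_word) == operation:
        return True
    # S2002 (abbreviation creation) is assumed to have been done by the caller.
    if not abbreviations:
        return False
    # S2003: re-recognize the stored, previously rejected utterance against the abbreviations.
    best, confidence = recognize(rejected_audio, abbreviations)
    # S2004: compare with the rejection threshold (equality is treated as NO here).
    if confidence > state.reject_threshold:
        # S2005: register the top candidate as performing the same operation.
        state.system_spec[best] = operation
        state.dictionary.add(best)
        return True
    return False


state = AdaptationState(system_spec={"冬のレクイエム": "search:冬のレクイエム"})
accepted = is_paraphrase_escape(
    state, "冬のレクイエム", None, b"", ["冬レク"],
    recognize=lambda audio, words: (words[0], 4.2),  # stub standing in for unit 101
)
```

Passing the recognizer in as a callable keeps the control flow testable without a real speech engine; the stub above simply returns the first candidate with a fixed confidence.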

Through the above operation, even if the user utters an abbreviation that the system specification has not registered in the speech recognition dictionary 102 and the utterance is rejected, the abbreviation is newly registered once the user's next utterance uses the correct expression and is recognized, so the abbreviation can be recognized from then on. The system therefore does not reject the same user's abbreviation again and again, and a smooth dialogue that places little burden on the user is achieved. In addition, creating the abbreviations does not require prompting the user for any special utterance, so the burden on the user is small.

(Embodiment 4)
In Embodiments 1 to 3 described above, personal adaptation is made possible by changing the speech recognition parameters and the recognition dictionary in response to an erroneous operation and a correct operation detected within a single dialogue sequence. Those embodiments do not assume use by more than one user, however, so personal adaptation does not work correctly when several users share the system. The present embodiment addresses such cases.

FIG. 22 is a block diagram showing the configuration of a voice-interactive information retrieval system including a speech recognition apparatus according to Embodiment 4 of the present invention.

The present embodiment differs from Embodiment 3 in that a user input unit 401 and a user information storage unit 402 are added, and the personal adaptation processing in the dialogue control unit 403 differs as a result. Everything else is the same as in Embodiments 1 to 3. Accordingly, this section describes the operation of the dialogue control unit 403 when the system is used by multiple users.

When a user name is entered from the user input unit 401, the dialogue control unit 403 checks the user information storage unit 402 to see whether speech recognition parameters and a recognition target dictionary adapted to that user name are registered. If no adapted speech recognition parameters or recognition target vocabulary exist for the entered user name, the system operates with the initial values of the speech recognition parameters and the speech recognition dictionary. If, while such a not-yet-adapted user is using the system, the stagnation escape determination unit 302 detects a sequence of an erroneous operation followed by a correct operation, and the speech recognition parameters or recognition target vocabulary need to be changed as described in Embodiments 1 to 3, the dialogue control unit 403 stores in the user information storage unit 402 the new user's name together with the relevant information on the words whose speech recognition parameters or dictionary entries were changed.

If, on the other hand, speech recognition parameters and a recognition target dictionary adapted to the user name entered from the user input unit 401 are registered in the user information storage unit 402, the dialogue control unit 403 extracts from the user information storage unit 402 the personally adapted speech recognition parameters and the newly registered dictionary words for that user name, and registers this information in the speech recognition parameter storage unit 103 and the speech recognition dictionary 102.

FIG. 23 shows a specific example of the information stored in the user information storage unit 402, for the case where each word has its own rejection threshold. Item 2201 is the user name. Item 2202 is the stagnation word, that is, a word for which the speech recognition parameters or the dictionary were changed. Item 2203 is the stagnation escape count, that is, the number of times a stagnation escape was detected. Item 2204 is the rejection threshold, which is the changed speech recognition parameter. Item 2205 is the newly registered reading added to the speech recognition dictionary.

Suppose the data shown in FIG. 23 is stored in the user information storage unit 402. When user name A is entered from the user input unit 401, the dialogue control unit 403 registers the following personal adaptation information for user A in the speech recognition parameter storage unit 103 and the speech recognition dictionary 102: for the word "next screen" (次の画面), the rejection threshold 3.4 and the new reading "Tsurinodamen" (ツリノダメン); and for the word "previous screen" (前の画面), the rejection threshold 3.5 and the new reading "Maenodamen" (マエノダメン).
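The per-user record of FIG. 23 and the loading step can be pictured with a minimal Python sketch. The class and function names, and the use of plain dictionaries for the parameter store and the recognition dictionary, are assumptions made for illustration. The thresholds and readings are the ones quoted in the text; the escape counts are placeholders, since the text does not give the figure's values.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class WordAdaptation:
    escape_count: int                  # item 2203 (placeholder values below)
    reject_threshold: float            # item 2204
    new_reading: Optional[str] = None  # item 2205


# user name (item 2201) -> stagnation word (item 2202) -> adaptation data
user_info: Dict[str, Dict[str, WordAdaptation]] = {
    "A": {
        "次の画面": WordAdaptation(escape_count=1, reject_threshold=3.4, new_reading="ツリノダメン"),
        "前の画面": WordAdaptation(escape_count=1, reject_threshold=3.5, new_reading="マエノダメン"),
    }
}


def load_profile(
    user_name: str,
    store: Dict[str, Dict[str, WordAdaptation]],
    thresholds: Dict[str, float],
    readings: Dict[str, Set[str]],
) -> bool:
    """Copy a user's adapted settings into the live stores.

    Returns False for an unknown user so the caller keeps the initial values.
    """
    profile = store.get(user_name)
    if profile is None:
        return False
    for word, adaptation in profile.items():
        thresholds[word] = adaptation.reject_threshold
        if adaptation.new_reading:
            readings.setdefault(word, set()).add(adaptation.new_reading)
    return True


thresholds: Dict[str, float] = {}    # stands in for parameter storage unit 103
readings: Dict[str, Set[str]] = {}   # stands in for speech recognition dictionary 102
known = load_profile("A", user_info, thresholds, readings)
unknown = load_profile("B", user_info, thresholds, readings)
```

Keying the store by user name first and word second mirrors the per-word threshold layout of FIG. 23 and makes the "no profile yet" case a simple lookup miss.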

Through the above operation, detecting an erroneous operation and a correct operation within a single dialogue sequence not only enables personal adaptation of the speech recognition parameters and the speech recognition dictionary, but also allows that adaptation to work correctly when several users share the system. Personal adaptation that places little burden on the user and a smooth dialogue are thereby achieved.

In the present embodiment, the user is identified from the input to the user input unit, and personal adaptation is performed for each of multiple users. Speaker identification and speaker discrimination techniques are now widely available, however, and the user may instead be identified with those techniques.

(Embodiment 5)
Embodiments 1 to 4 described above deal with stagnation of the system caused by erroneous rejection. They do not cover the stagnation that occurs when misrecognition causes a transition to the wrong system state. The present embodiment addresses that kind of stagnation.

A specific example of stagnation caused by a transition to the wrong system state is the following. The user says "time search" (時間検索), but the system recognizes it as "genre search" (ジャンル検索) and moves to a system state other than the one the user intended. To undo this erroneous transition, the user utters a command that returns the system to its original state, such as "back" (戻る). Once the system state has returned, the user says "time search" again. This sequence is a repeated round trip between two system states, and it can be regarded as one form of stagnation.

The system configuration of the present embodiment is unchanged from Embodiment 4. The only difference is the stagnation determination processing in the stagnation escape determination unit 302 (the flowchart of FIG. 6). Everything else is the same as in Embodiment 4.

The processing of the stagnation escape determination unit 302 in the present embodiment is described next. FIG. 24 is a flowchart showing the flow of operation of the stagnation escape determination unit 302 in the present embodiment. In the description below, past recognition results are obtained by referring to the data stored in the dialogue history storage unit 106, and restatements and paraphrases are determined by the same methods as described in Embodiments 1 to 4.

First, the stagnation escape determination unit 302 acquires the current speech recognition result (step S2301). Next, it determines whether this speech recognition result is a rejection (step S2302). If it is a rejection (YES in step S2302), the unit determines that this is not a stagnation escape and ends the processing. If it is not a rejection (NO in step S2302), the unit determines whether the previous utterance was an utterance that returns the state ("back" in the example above) (step S2303). If the previous utterance was not such an utterance (NO in step S2303), the unit determines that this is not a stagnation escape and ends the processing. If the previous utterance was an utterance that returns the state (YES in step S2303), the unit determines whether the current utterance is a restatement of the utterance two turns back (step S2304). If it is a restatement (YES in step S2304), the unit determines a stagnation escape by restatement and ends the processing.
If it is not a restatement (NO in step S2304), the unit determines whether the current utterance is a paraphrase of the utterance two turns back (step S2305). If it is a paraphrase (YES in step S2305), the unit determines a stagnation escape by paraphrase and ends the processing. If it is not a paraphrase (NO in step S2305), the unit determines that this is not a stagnation escape and ends the processing. A stagnation escape by restatement or paraphrase detected in this way is stored in the dialogue history storage unit 106 separately from the escapes from stagnation caused by erroneous rejection described in Embodiments 1 to 4.
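The decision flow of steps S2301 to S2305 can be sketched in Python as follows. The `Turn` record, its N-best `candidates` field, and the three predicates passed in are all assumptions made for illustration. In particular, the restatement predicate used in the usage lines is only a stand-in; the text defers to the determination methods of Embodiments 1 to 4, which are not reproduced here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class Escape(Enum):
    NONE = "not a stagnation escape"
    RESTATEMENT = "escape by restatement"
    PARAPHRASE = "escape by paraphrase"


@dataclass
class Turn:
    word: Optional[str]                                   # top result; None if rejected
    candidates: List[str] = field(default_factory=list)  # hypothetical N-best list

    @property
    def rejected(self) -> bool:
        return self.word is None


def judge_round_trip_escape(
    history: List[Turn],
    is_back: Callable[[str], bool],
    is_restatement: Callable[[Turn, Turn], bool],
    is_paraphrase: Callable[[Turn, Turn], bool],
) -> Escape:
    """history is ordered oldest first; the last element is the current turn."""
    if not history:
        return Escape.NONE
    current = history[-1]                    # S2301: acquire the current result
    if current.rejected:                     # S2302: a rejection is never an escape
        return Escape.NONE
    if len(history) < 3:                     # need the previous turn and the one before it
        return Escape.NONE
    previous, two_back = history[-2], history[-3]
    if previous.rejected or not is_back(previous.word):  # S2303
        return Escape.NONE
    if is_restatement(current, two_back):    # S2304
        return Escape.RESTATEMENT
    if is_paraphrase(current, two_back):     # S2305
        return Escape.PARAPHRASE
    return Escape.NONE


history = [
    Turn("ジャンル検索", ["ジャンル検索", "時間検索"]),  # misrecognized "時間検索"
    Turn("戻る"),
    Turn("時間検索"),
]
result = judge_round_trip_escape(
    history,
    is_back=lambda w: w == "戻る",
    is_restatement=lambda cur, old: cur.word in old.candidates,  # stand-in predicate
    is_paraphrase=lambda cur, old: False,                         # stand-in predicate
)
```

Injecting the predicates keeps the flow itself independent of how restatements and paraphrases are actually detected.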

The escape from stagnation caused by misrecognition is determined in this way, and the speech recognition parameters and the recognition dictionary are then changed. Specifically, for example, the recognition dictionary is changed as described in Embodiment 2. More specifically, among the speech recognition results for the misrecognized utterance two turns back, the result that was output as the acoustically closest kana character string without using the speech recognition dictionary (for example, recognition result 2 in FIG. 4) is added to the speech recognition dictionary in association with the correctly recognized word obtained this time.
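As a rough illustration of this dictionary update, the Python sketch below models the speech recognition dictionary as a mapping from each word to its set of readings. That structure and the function name `add_reading` are assumptions, not the patent's data format. The example pairing reuses the word and reading quoted for user A in Embodiment 4, because the text gives no concrete kana string for this embodiment.

```python
from collections import defaultdict
from typing import DefaultDict, Set


def add_reading(dictionary: DefaultDict[str, Set[str]], word: str, kana: str) -> bool:
    """Associate a dictionary-free kana result with the correctly recognized word.

    Returns True if the reading was new, False if it was already registered.
    """
    readings = dictionary[word]
    if kana in readings:
        return False
    readings.add(kana)
    return True


# word -> readings; stands in for speech recognition dictionary 102
recognition_dictionary: DefaultDict[str, Set[str]] = defaultdict(set)
first = add_reading(recognition_dictionary, "次の画面", "ツリノダメン")
second = add_reading(recognition_dictionary, "次の画面", "ツリノダメン")
```

Returning a flag for "already registered" is a design convenience so that a caller could avoid counting the same adaptation twice.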

Through the above operation, personal adaptation makes use not only of system-state stagnation caused by erroneous rejection but also of system-state stagnation caused by misrecognition. Consequently, when the user next produces the utterance that previously caused the erroneous operation, stagnation due to misrecognition no longer occurs, and a smooth dialogue that places little burden on the user is achieved. Moreover, because the speech recognition parameters and the speech recognition dictionary are changed on the basis of natural dialogue rather than by prompting the user for dedicated special utterances, the burden on the user is small.

In each of the above embodiments, the speech recognition unit corresponds to the speech recognition means, the stagnation escape determination unit to the stagnation escape determination means, the dialogue control unit to the dialogue control means and the change control means, and the abbreviation creation unit to the abbreviation creation means.

The speech recognition apparatus and speech recognition method according to the present invention can be used in many systems that have a voice-interactive interface. They are useful, for example, in home information retrieval systems, car navigation systems, and information retrieval from mobile terminals, and their applicability is very broad.

FIG. 1 is a block diagram showing the configuration of a voice-interactive information retrieval system including a speech recognition apparatus according to Embodiment 1 of the present invention.
FIG. 2 is a flowchart showing the overall flow of dialogue processing in the present invention.
FIG. 3 is a diagram showing an example of an output screen of the voice-interactive information retrieval system of the present invention.
FIG. 4 is a diagram showing an example of recognition results output from the speech recognition unit and stored in Embodiment 1 of the present invention.
FIG. 5 is a diagram showing an example of recognition results output from the speech recognition unit and stored in Embodiment 1 of the present invention.
FIG. 6 is a flowchart showing the flow of processing in the stagnation escape determination unit in Embodiment 1 of the present invention.
FIG. 7 is a diagram showing an example of system operation specifications in the system specification storage unit in Embodiment 1 of the present invention.
FIG. 8 is a diagram showing an example of dialogue history data stored in the dialogue history storage unit in Embodiment 1 of the present invention.
FIG. 9 is a diagram showing an example of an output screen of the voice-interactive information retrieval system of the present invention.
FIG. 10 is a diagram showing an example of an output screen of the voice-interactive information retrieval system of the present invention.
FIG. 11 is a diagram showing an example of an output screen of the voice-interactive information retrieval system of the present invention.
FIG. 12 is a diagram showing an example of an output screen of the voice-interactive information retrieval system of the present invention.
FIG. 13 is a diagram showing an outline of system operation in the dialogue example of Embodiment 1 of the present invention.
FIG. 14 is a diagram showing an example of dialogue history data stored in the dialogue history storage unit in Embodiment 1 of the present invention.
FIG. 15 is a diagram showing an example of data in which a rejection threshold is set for each word in Embodiment 1 of the present invention.
FIG. 16 is a diagram comparing the dialogue sequence in Embodiment 1 of the present invention with and without the present technique.
FIG. 17 is a block diagram showing another configuration of the voice-interactive information retrieval system according to Embodiment 1 of the present invention.
FIG. 18 is a diagram showing an example of dialogue history data stored in the dialogue history storage unit in Embodiment 2 of the present invention.
FIG. 19 is a diagram showing an example of the recognition target vocabulary stored in the speech recognition dictionary in Embodiment 2 of the present invention.
FIG. 20 is a block diagram showing the configuration of a voice-interactive information retrieval system including a speech recognition apparatus according to Embodiment 3 of the present invention.
FIG. 21 is a flowchart showing the flow of the paraphrase determination operation in Embodiment 3 of the present invention.
FIG. 22 is a block diagram showing the configuration of a voice-interactive information retrieval system including a speech recognition apparatus according to Embodiment 4 of the present invention.
FIG. 23 is a diagram showing an example of user information data stored in the user information storage unit in Embodiment 4 of the present invention.
FIG. 24 is a flowchart showing the processing in the stagnation escape determination unit in Embodiment 5 of the present invention.

Explanation of symbols

101 Speech recognition unit
102 Speech recognition dictionary unit
103 Speech recognition parameter storage unit
104, 302 Stagnation escape determination unit
105, 303, 403 Dialogue control unit
106 Dialogue history storage unit
107 System specification storage unit
108 Database search unit
109 Database storage unit
110 Response voice and screen output unit
111 Timer
201 EPG reception unit
301 Abbreviation creation unit
304 User utterance storage unit
401 User input unit
402 User information storage unit

Claims

A speech recognition apparatus that recognizes input speech and conducts a dialogue by changing a system state, which is the state of the system with respect to the dialogue with a user, according to a recognition result, the apparatus comprising:
speech recognition means for recognizing the input speech using a speech recognition dictionary and outputting a recognition result;
dialogue control means for responding by changing the system state according to the recognition result of the speech recognition means;
stagnation escape determination means for determining, from the current recognition result, whether the system state has escaped from a stagnation state, which is a state in which the system state is stalled and does not advance, and, when it is determined that the system state has escaped from the stagnation state, determining whether the current recognition result is at least one of a restatement and a paraphrase; and
change control means for changing, when the current recognition result is determined to be the restatement or the paraphrase, at least one of a rejection threshold, which is a setting related to dialogue control, and a new addition or change to the speech recognition dictionary, as a change of settings related to speech recognition.

A speech recognition apparatus that recognizes input speech and conducts a dialogue by changing a system state, which is the state of the system with respect to the dialogue with a user, according to a recognition result, the apparatus comprising:
speech recognition means for recognizing the input speech using a speech recognition dictionary and outputting a recognition result;
dialogue control means for responding by changing the system state according to the recognition result of the speech recognition means;
stagnation escape determination means for determining, from the current recognition result, whether the system state has escaped from a stagnation state in which the system state is the same as the system state resulting from the previous recognition result, and, when it is determined that the system state has escaped from the stagnation state, determining whether the current recognition result is at least one of a restatement and a paraphrase; and
change control means for changing, when the current recognition result is determined to be the restatement or the paraphrase, at least one of a rejection threshold, which is a setting related to dialogue control, and a new addition or change to the speech recognition dictionary, as a change of settings related to speech recognition.

The speech recognition apparatus according to claim 1, wherein the stagnation state of the system state is a state in which the same system state continues because speech recognition results are rejected, and
the stagnation escape determination means determines a restatement when the current recognition result is the same word as the previous recognition result, and determines a paraphrase when the current recognition result is not the same word as the previous recognition result but is a recognition word that executes the same predetermined system operation.

The speech recognition apparatus according to claim 1, wherein the stagnation state of the system state is a state in which a round trip between two system states is repeated, and
the stagnation escape determination means determines a restatement when the current recognition result is the same word as the previous recognition result, and determines a paraphrase when the current recognition result is not the same word as the previous recognition result but is a recognition word that executes the same predetermined system operation.

The speech recognition apparatus according to claim 1 or 2, wherein the change control means changes and sets the rejection threshold for each recognition target word.

The speech recognition apparatus according to claim 1 or 2, wherein the change control means sets, for each user, the rejection threshold and the new additions or changes to the speech recognition dictionary.

The speech recognition apparatus according to claim 1, further comprising:
abbreviation creation means for creating an abbreviation of the current recognition target vocabulary when, upon escape from the stagnation state, the current recognition result is neither the same word as the previous recognition result nor a recognition word that performs the same predetermined system operation,
wherein the speech recognition means re-recognizes the previous recognition result using the abbreviation, and
the change control means newly adds the abbreviation to the speech recognition dictionary according to the re-recognition result of the speech recognition means.

An electronic program guide speech recognition apparatus that recognizes input speech related to an electronic program guide and conducts a dialogue by changing a system state, which is the state of the system with respect to the dialogue with a user, according to a recognition result, the apparatus comprising:
speech recognition means for recognizing the input speech related to the electronic program guide using a speech recognition dictionary corresponding to the electronic program guide, and outputting a recognition result;
dialogue control means for responding by changing the system state according to the recognition result of the speech recognition means;
stagnation escape determination means for determining, from the current recognition result, whether the system state has escaped from a stagnation state, which is a state in which the system state is stalled and does not advance, and, when it is determined that the system state has escaped from the stagnation state, determining whether the current recognition result is at least one of a restatement and a paraphrase; and
change control means for changing, when the current recognition result is determined to be the restatement or the paraphrase, at least one of a rejection threshold, which is a setting related to dialogue control, and a new addition or change to the speech recognition dictionary, as a change of settings related to speech recognition.

A speech recognition method for recognizing input speech and performing a dialog by changing, according to a recognition result, a system state that is the state of a system in a dialog with a user, the method comprising:
A speech recognition step of recognizing the input speech using a speech recognition dictionary and outputting a recognition result;
A dialog control step of making a response by changing the system state according to the recognition result of the speech recognition step;
A stagnation escape determination step of determining whether or not the system state has escaped from a stagnation state, which is a state in which the system state is stalled and does not proceed, and, when it is determined that the system state has escaped from the stagnation state, determining whether or not the current recognition result is at least one of a rephrasing and a paraphrasing; and
A change control step of changing, when the current recognition result is determined to be the rephrasing or the paraphrasing, at least one of a rejection threshold, as a change of a setting related to dialog control, and a new addition or change to the speech recognition dictionary, as a change of a setting related to speech recognition.

A program for recognizing input speech and performing a dialog by changing, according to a recognition result, a system state that is the state of a system in a dialog with a user, the program causing a computer to execute:
A speech recognition step of recognizing the input speech using a speech recognition dictionary and outputting a recognition result;
A dialog control step of making a response by changing the system state according to the recognition result of the speech recognition step;
A stagnation escape determination step of determining whether or not the system state has escaped from a stagnation state, which is a state in which the system state is stalled and does not proceed, and, when it is determined that the system state has escaped from the stagnation state, determining whether or not the current recognition result is at least one of a rephrasing and a paraphrasing; and
A change control step of changing, when the current recognition result is determined to be the rephrasing or the paraphrasing, at least one of a rejection threshold, as a change of a setting related to dialog control, and a new addition or change to the speech recognition dictionary, as a change of a setting related to speech recognition.
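
For illustration only, the dialog control, stagnation escape determination, and change control steps recited above might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `Result` and `Dialog`, the numeric values, and the rule that stagnation means a fixed number of consecutive rejections are all hypothetical. A rephrasing is taken here to be the same word as the previous recognition result, and a paraphrasing a different word that triggers the same system operation. Of the two changes named in the claims, the sketch performs only the rejection threshold change and omits the dictionary change.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Result:
    word: str     # best-matching word from the recognition dictionary
    score: float  # recognition confidence in [0, 1] (assumed scale)


class Dialog:
    """Hypothetical dialog controller; all defaults are illustrative."""

    def __init__(self, dictionary: Dict[str, str], reject_threshold: float = 0.6,
                 stagnation_turns: int = 2, step: float = 0.1) -> None:
        self.dictionary = dictionary            # word -> system operation
        self.reject_threshold = reject_threshold
        self.stagnation_turns = stagnation_turns
        self.step = step
        self.rejected_in_a_row = 0
        self.previous: Optional[Result] = None
        self.state = "idle"

    def classify(self, current: Result) -> str:
        """Compare the current result with the previous one."""
        prev = self.previous
        if prev is None:
            return "other"
        if current.word == prev.word:
            return "rephrasing"
        if self.dictionary.get(current.word) == self.dictionary.get(prev.word):
            return "paraphrasing"
        return "other"

    def handle(self, current: Result) -> Optional[str]:
        """Process one result; return the classification on escape, else None."""
        if current.score < self.reject_threshold:
            # Dialog control: reject, so the system state does not advance.
            self.rejected_in_a_row += 1
            self.previous = current
            return None
        # Assumed stagnation rule: enough consecutive rejections before this turn.
        was_stagnant = self.rejected_in_a_row >= self.stagnation_turns
        self.state = self.dictionary[current.word]  # state advances
        kind = None
        if was_stagnant:
            kind = self.classify(current)
            if kind in ("rephrasing", "paraphrasing"):
                # Change control: relax the rejection threshold.
                self.reject_threshold = max(0.0, self.reject_threshold - self.step)
        self.rejected_in_a_row = 0
        self.previous = current
        return kind


dialog = Dialog({"news": "show_news", "headlines": "show_news"})
dialog.handle(Result("news", 0.50))         # rejected
dialog.handle(Result("news", 0.55))         # rejected again: stagnation
print(dialog.handle(Result("news", 0.70)))  # prints rephrasing
```

In this sketch the threshold is lowered only after an escape that is classified as a rephrasing or a paraphrasing, so an isolated rejection followed by an accepted utterance leaves the settings unchanged.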