JP2003177788A

JP2003177788A - Audio interactive system and its method

Info

Publication number: JP2003177788A
Application number: JP2001377982A
Authority: JP
Inventors: Takahiro Kii; 隆弘紀伊; Tomonori Iketani; 智則池谷; Tatsuro Matsumoto; 達郎松本; Shigeru Yamada; 茂山田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2001-12-12
Filing date: 2001-12-12
Publication date: 2003-06-27

Abstract

PROBLEM TO BE SOLVED: To improve the recognition rate of vocabulary included in a user's speech and to enable recognition of a key word corresponding to a response with an indication word in a barge-in audio interactive system allowing the user to respond in the middle of a system speech.

SOLUTION: A scenario of the system speech is monitored, and weights to be added to key words included in individual parts of the scenario are determined on the basis of the time of the user's speech, the speech time of each scenario part, etc., and key words included in the user's speech are recognized. Further, a key word which the indication word means is recognized in accordance with the correspondence relation between the time of the user's speech with the indication word and a corresponding system speech.

COPYRIGHT: (C)2003,JPO

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a voice dialogue system that provides voice information to a user and carries out processing based on the user's response. More particularly, it relates to a voice dialogue system that can recognize the content of the user's response even when the user responds while sequentially presented voice information is still being uttered.

[0002]

[Prior Art] In a voice dialogue system, the system utters options or the like, and the user responds by voice with the option he or she wishes to select. The system converts this voice into a voice signal, compares the signal with the voice information stored in a recognition dictionary, and recognizes which option the user selected.

[0003] When such a voice dialogue system is used for program guidance or the like, the system utters the program categories one after another, and the user specifies by voice the category for which guidance is desired. A concrete dialogue example follows.

[0004] System utterance: "Please select the service you want to use. 1 weather forecast, 2 sports news, 3 today's fortune." User utterance: "Well then, weather forecast."

[0005] With this method, in which the user makes a selection only after the system has uttered all the options in turn (program category names in the dialogue example above), a user who wants the program of the first-uttered category must still wait until the system utterance ends. The user must also remember the desired option until the system utterance ends, which raises the likelihood of choosing the wrong option. For the system, no user response can be obtained until the utterance ends, so it takes time to proceed to the next step of the scenario. Here, the "next scenario" means, in the above example, the presentation of finer-grained options under the "sports news" selected by the user, such as baseball, soccer, and tennis, or the presentation of content such as the results of today's baseball games.

[0006] To eliminate these drawbacks, barge-in dialogue systems are known, in which the user can respond even while the voice dialogue system is speaking. An example of a conversation in which barge-in responses are possible follows.

[0007] (Dialogue example with barge-in) System utterance: "Please select the service you want to use.

[0008] 1 weather forecast, 2 sports news, ..." User utterance: "Sports news" or "2".

[0009] In a voice dialogue system with barge-in, the user is allowed to respond even while the system is speaking, and can respond while an option is being uttered or immediately after it has been uttered. The user therefore does not have to wait until the utterance of the options ends and is less likely to select an option by mistake, while the voice dialogue system can also proceed to the next scenario sooner. A barge-in voice dialogue system can thus be called a user-friendly voice dialogue system that allows the dialogue with the user to proceed smoothly.

[0010] However, because the user can speak while the system is speaking, audio in which the system utterance is superimposed on the user utterance is input to the system, so the content of the user utterance may be misrecognized and the recognition rate may fall. For this reason, barge-in voice dialogue systems employ techniques such as echo cancellation, which cancels the system utterance even when it is fed back into the system. In addition, a recognition dictionary in which the vocabulary that the user is expected to utter is weighted is used to make the utterance content easier to recognize.

[0011]

[Problems to Be Solved by the Invention] Even with a barge-in voice dialogue system equipped with the echo cancellation and weighted recognition dictionary described above, it has been difficult to always recognize the user's speech correctly. Moreover, in conventional voice dialogue systems, when the system utters several options and the user responds with a demonstrative (indication word) such as "that", it has been difficult to recognize which option the user selected. A concrete example of a dialogue using a demonstrative follows.

[0012] (Dialogue example using a demonstrative) System utterance: "Please select the service you want to use.

[0013] 1 weather forecast, 2 sports news, ..." User utterance: "That."

[0014] Accordingly, a first object of the present invention is to provide a barge-in-capable voice dialogue system, and a method therefor, that recognize user utterances more accurately.

[0015] A second object of the present invention is to provide a voice dialogue system, and a method therefor, that can recognize which option the user selected even when the user's response contains a demonstrative.

[0016]

[Means for Solving the Problems] In the following description, an utterance by the system is called a system utterance and an utterance by the user is called a user utterance. An utterance with which the user responds during a system utterance is called a barge-in utterance. A barge-in voice dialogue system is characterized in that the user can respond even while the system is still speaking in accordance with the scenario.

[0017] In such a dialogue system, it can be predicted that once the desired option is uttered by the system, the user will in many cases respond while that option is being uttered or immediately afterwards.

[0018] The present invention focuses on this characteristic of user responses. It weights the vocabulary in the recognition dictionary according to the barge-in situation, namely the time interval between a word or sequence of consecutive words in the scenario being uttered by the system and the user utterance responding to it, and further the loudness and speed of the user's utterance. This improves recognition of which vocabulary entry in the recognition dictionary the user uttered and, further, of which option the user selected or indicated.

[0019] More specifically, the gist of the invention of claim 1 is a voice dialogue system connectable to a dialogue scenario unit that stores a scenario defining a plurality of words to be uttered and the order in which they are uttered, a recognition dictionary that stores vocabulary information, a voice output unit that utters speech in accordance with the scenario of the dialogue scenario unit, and a voice input unit that receives speech uttered by a user, the system comprising: an output information management unit that detects which part of the scenario the voice output unit is uttering; a weighting calculation unit that weights the vocabulary information stored in the recognition dictionary based on the scenario part detected by the output information management unit at the time the user spoke; and a voice recognition unit that selects, from voice information obtained by signal-processing the speech input to the voice input unit and from the weighted vocabulary information, which item of the vocabulary information stored in the recognition dictionary the user's speech corresponds to.

[0020] Because the output information management unit can detect the system utterance corresponding to the time of the user utterance, the words likely to be contained in the user utterance can be identified from the words uttered by the system before that user utterance. The recognition rate is improved by raising the weights of those system-uttered words or sequences of consecutive words.
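The idea of raising the weights of words that the system has already uttered can be sketched as follows. This is only an illustrative sketch: the `ScenarioPart` type, the `weight_vocabulary` function, the start times, and the `base`/`boost` values are all hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass


@dataclass
class ScenarioPart:
    keyword: str   # entry as stored in the recognition dictionary
    start: float   # scheduled start of its system utterance in seconds (hypothetical)


def weight_vocabulary(parts, user_time, base=1.0, boost=2.0):
    """Give a higher weight to keywords whose system utterance began at or before user_time."""
    return {p.keyword: (boost if p.start <= user_time else base) for p in parts}


parts = [
    ScenarioPart("weather forecast", 2.0),
    ScenarioPart("sports news", 3.5),
    ScenarioPart("today's fortune", 5.0),
]
# The user barges in 4.0 s after the system started speaking.
weights = weight_vocabulary(parts, user_time=4.0)
```

With these assumed times, "weather forecast" and "sports news" receive the boosted weight, while "today's fortune", which has not yet been uttered, keeps the base weight.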

[0021] The "word" referred to above may be, for example, the expression "weather forecast" (天気予報) treated as a single word, or the same expression treated as the two words "weather" (天気) and "forecast" (予報). Likewise, in the case of "forecast of the weather" (天気の予報), the whole phrase may be treated as one word, or 「天気」, 「の」, 「予報」, or 「天気の」, 「の予報」, and so on may each be treated as words.

[0022] Further, in the invention of claim 1, the voice dialogue system is configured to be connectable to the system-utterance scenario, the recognition dictionary, the voice input unit, and the voice output unit. The system can therefore readily be used with a scenario suited to the purpose, a recognition dictionary suited to that scenario, and a voice input unit and voice output unit suited to the environment in which the system is used.

[0023] The gist of the invention of claim 2 is a voice dialogue system comprising: a dialogue scenario unit that stores a scenario defining a plurality of words to be uttered and the order in which they are uttered; a recognition dictionary that stores vocabulary information; a voice input unit that receives speech uttered by a user; a voice output unit that utters speech in accordance with the scenario of the dialogue scenario unit; an output information management unit that detects which part of the scenario the voice output unit is uttering; a weighting calculation unit that weights the vocabulary information stored in the recognition dictionary based on the scenario part detected by the output information management unit at the time the user spoke; and a voice recognition unit that selects, from voice information obtained by signal-processing the speech input to the voice input unit and from the weighted vocabulary information, which item of the vocabulary information stored in the recognition dictionary the user's speech corresponds to.

[0024] With the voice dialogue system configured in this way, as in the invention of claim 1, the output information management unit can detect the system utterance corresponding to the time of the user utterance. The words or sequences of consecutive words likely to be contained in the user utterance can therefore be identified from the words or consecutive words uttered by the system before that user utterance, and the recognition rate is improved by raising the weights of those system-uttered words or consecutive words.

[0025] Further, the weighting of claim 1 or claim 2 may also be performed based on the loudness and/or speed of the speech uttered by the user.

[0026] With this configuration, the vocabulary in the recognition dictionary can be weighted according to the circumstances in which the user speaks.

[0027] Further, the weighting of claim 1 or claim 2 may be performed based on the user's utterance time and the scheduled utterance times of the words contained in the scenario.

[0028] With this configuration, the words expected to be contained in the user utterance can be predicted more accurately, and the vocabulary in the recognition dictionary can be weighted appropriately.

[0029] Further, the gist may be a voice dialogue system (referred to as voice dialogue system A) in which the weighting of claim 1 or claim 2 is performed based on the user's utterance time and the scheduled utterance times of the words contained in the scenario, and in which the weighting calculation unit weights the vocabulary information stored in the recognition dictionary based on the weighting coefficients or weighting ratios stored in a weight coefficient management unit. The weight coefficient management unit stores the weighting coefficient or weighting ratio to be applied to each word in the recognition dictionary according to the difference between the user's utterance time and the scheduled utterance time of the word in the scenario.

[0030] With this configuration, the time interval between the user utterance time and each word stored in the recognition dictionary that has already been uttered by the system, or that is still scheduled to be uttered by the system, is obtained, and the vocabulary information stored in the recognition dictionary can be weighted based on this time interval.
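As a rough illustration of weighting by the interval between the user utterance time and a word's scheduled utterance time, the sketch below uses an exponential decay. The text does not specify the functional form, so the decay shape, the time constant `tau`, and the `floor` value for words not yet uttered are assumptions made purely for illustration.

```python
import math


def time_weight(user_time, word_time, tau=2.0, floor=0.1):
    """Weight for one word given the user utterance time and the word's scheduled time.

    Assumed behaviour: the weight is highest right when the word is uttered and
    decays as more time passes; words not yet uttered receive a small floor weight.
    """
    dt = user_time - word_time
    if dt < 0:               # the word is still scheduled in the future
        return floor
    return max(floor, math.exp(-dt / tau))


# Hypothetical scheduled times (seconds) for the three options.
schedule = {"weather forecast": 2.0, "sports news": 3.5, "today's fortune": 5.0}
weights = {word: time_weight(4.0, t) for word, t in schedule.items()}
```

Under these assumptions the most recently uttered option, "sports news", gets the largest weight when the user speaks at 4.0 s.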

[0031] Further, the gist may be a voice dialogue system (referred to as voice dialogue system B) in which the weighting of claim 1 or claim 2 is performed based on the user's utterance time and the scheduled utterance times of the words contained in the scenario, and in which the weighting calculation unit weights the vocabulary information stored in the recognition dictionary based on the weighting coefficients or weighting ratios stored in a weight table management unit. The weight table management unit stores a table giving the weighting coefficient or weighting ratio to be applied to a word in the recognition dictionary as a function of the difference between the user's utterance time and the scheduled utterance time of the word in the scenario.

[0032] Because the weighting coefficients or weighting ratios to be applied to the words in the recognition dictionary are stored as a table, the weighting process is easy to perform.
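A table-based lookup of this kind might look like the following sketch. The bin boundaries and weight values are invented for illustration; the patent's own table (its FIG. 6 is described only as an example of how weights change with barge-in time) is not reproduced here.

```python
from bisect import bisect_right

# Hypothetical table: time difference dt = user utterance time - scheduled word time.
#   dt < 0 s       -> 0.2  (word not yet uttered)
#   0 s <= dt < 1 s -> 1.0  (word being uttered or just uttered)
#   1 s <= dt < 3 s -> 0.6
#   dt >= 3 s      -> 0.3
BOUNDS = [0.0, 1.0, 3.0]
WEIGHTS = [0.2, 1.0, 0.6, 0.3]


def table_weight(dt):
    """Look up the weighting coefficient for a time difference dt (seconds)."""
    return WEIGHTS[bisect_right(BOUNDS, dt)]
```

A table keeps the run-time work to a single binary search, and the values can be edited without touching the lookup code.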

[0033] Further, voice dialogue system A or voice dialogue system B above may be configured to include a history management unit that stores history information indicating the correspondence between the weighting coefficients or weighting ratios stored in the weight coefficient management unit or the weight table management unit and the selection results chosen by the voice recognition unit, and to change the weighting coefficients or weighting ratios based on this history information.

[0034] With this configuration, the weighting coefficients or weighting ratios can be changed based on history information drawn from the responses of many users, so weighting that yields a better recognition rate can be performed.
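One possible way to adjust weights from accumulated history is sketched below. The update rule (smoothed relative frequency of the time-difference bin in which the recognized word fell) is an assumption; the text states only that the weights are changed based on history information, and the `HistoryManager` class is hypothetical.

```python
from collections import Counter


class HistoryManager:
    """Counts, per time-difference bin, how often the recognized word fell in that bin."""

    def __init__(self, n_bins):
        self.n_bins = n_bins
        self.counts = Counter()

    def record(self, bin_index):
        """Store one recognition result together with the bin it corresponded to."""
        self.counts[bin_index] += 1

    def updated_weights(self, smoothing=1.0):
        """Return one weight per bin, proportional to the smoothed observed frequency."""
        total = sum(self.counts.values()) + smoothing * self.n_bins
        return [(self.counts[i] + smoothing) / total for i in range(self.n_bins)]


history = HistoryManager(n_bins=3)
for observed_bin in [1, 1, 1, 0]:   # hypothetical results from several users
    history.record(observed_bin)
new_weights = history.updated_weights()
```

The smoothing term keeps bins that have not yet been observed from dropping to a weight of zero.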

[0035] Further, the gist of the invention of claim 3 is a voice dialogue system connectable to a dialogue scenario unit that stores a scenario defining a plurality of words to be uttered and the order in which they are uttered, a recognition dictionary that stores vocabulary information, a voice output unit that utters speech in accordance with the scenario of the dialogue scenario unit, and a voice input unit that receives speech uttered by a user, the system comprising: an output information management unit that detects which part of the scenario the voice output unit is uttering; and a voice recognition unit that selects, from voice information obtained by signal-processing the speech input to the voice input unit and from the vocabulary information stored in the recognition dictionary, which item of the stored vocabulary information the user's speech corresponds to, and that, when the selected vocabulary information is a demonstrative, associates the demonstrative with a word or sequence of consecutive words contained in the scenario part uttered from the voice output unit, in accordance with the time at which the demonstrative was uttered.

[0036] With this configuration, even when the user utters a demonstrative, the word in the scenario that was uttered by the system at, or before, the time of the user utterance, and to which the demonstrative most likely refers, can be identified, so barge-in by demonstrative becomes possible. Demonstratives here include words such as "this" (これ), "that" (あれ, それ), "these" (これら), and "those" (それら).
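Resolving a demonstrative to the option most recently uttered by the system can be sketched as below. The `resolve` function, the start times, and the rule of picking the latest option whose utterance had begun by the user utterance time are illustrative assumptions, not the patent's stated procedure.

```python
DEMONSTRATIVES = {"これ", "あれ", "それ", "これら", "それら"}


def resolve(recognized, user_time, parts):
    """Map a recognized demonstrative to a scenario keyword.

    parts is a list of (keyword, start_time) pairs sorted by start_time.
    Non-demonstrative recognition results are returned unchanged.
    """
    if recognized not in DEMONSTRATIVES:
        return recognized
    uttered = [keyword for keyword, start in parts if start <= user_time]
    return uttered[-1] if uttered else None   # None: no option has been uttered yet


parts = [("weather forecast", 2.0), ("sports news", 3.5), ("today's fortune", 5.0)]
choice = resolve("それ", 4.0, parts)
```

With the hypothetical times above, a "それ" uttered at 4.0 s resolves to "sports news".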

[0037] Further, the gist of the invention of claim 4 is a voice dialogue system connectable to a dialogue scenario unit that stores a scenario defining a plurality of words to be uttered and the order in which they are uttered, a recognition dictionary that stores vocabulary information, a voice output unit that utters speech in accordance with the scenario of the dialogue scenario unit, and a voice input unit that receives speech uttered by a user, the system comprising: a voice output data storage unit that stores waveform data corresponding to the words uttered by the voice output unit; and a voice recognition unit that selects, from voice information obtained by signal-processing the speech input to the voice input unit and from the vocabulary information stored in the recognition dictionary, which item of the stored vocabulary information the user's speech corresponds to, and that, when the selected vocabulary information corresponds to a demonstrative, associates the demonstrative with the waveform data that was uttered from the voice output unit at the time corresponding to the utterance of the demonstrative and is stored in the voice output data storage unit.

[0038] With this configuration, the waveform data corresponding to the words contained in the system utterance at the time of, or immediately before, the user utterance is stored in the voice output data storage unit. Even when the user responds to a system utterance with a demonstrative, the word or sequence of consecutive words in the system utterance to which the demonstrative refers can therefore be recognized.

[0039] Further, the gist of the invention of claim 5 is a voice dialogue method for a voice dialogue system, comprising the steps of: uttering speech based on a scenario; receiving a user utterance; detecting the part of the scenario corresponding to the user utterance; weighting the words or sequences of consecutive words contained in that part of the scenario in correspondence with the user utterance; matching the received user utterance against the weighted words or consecutive words and selecting one of them; and performing the next processing in accordance with the scenario based on the selected word or consecutive words.

[0040] With the voice dialogue method configured in this way, the words or sequences of consecutive words contained in the scenario can be weighted in correspondence with the time of the user utterance, so the recognition rate of the voice dialogue system can be improved.
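The selection step of the method can be illustrated with a small sketch that combines a per-keyword match score with its weight. Multiplying the two, as well as all the score and weight values, is an assumption made for illustration; the text does not state how the weight enters the comparison.

```python
def select_keyword(match_scores, weights):
    """Pick the keyword with the largest weighted match score.

    match_scores: {keyword: score from comparing the user utterance with that keyword}
    weights:      {keyword: weight derived from the barge-in timing}
    Keywords missing from weights are treated as having weight 1.0.
    """
    return max(match_scores, key=lambda kw: match_scores[kw] * weights.get(kw, 1.0))


# Hypothetical values: the raw scores slightly favour the wrong option.
match_scores = {"weather forecast": 0.40, "sports news": 0.45, "today's fortune": 0.50}
weights = {"weather forecast": 0.6, "sports news": 1.0, "today's fortune": 0.2}

unweighted = max(match_scores, key=match_scores.get)
selected = select_keyword(match_scores, weights)
```

In this example the unweighted comparison would pick "today's fortune", whereas the timing-based weights shift the result to "sports news".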

[0041] As described above, when a user utterance occurs during a system utterance (hereinafter such a user utterance is called a barge-in utterance), the change in the weights of the recognition vocabulary is determined from the timing information of the barge-in utterance and the system utterance, and the weights of the word information stored in the recognition dictionary are changed. The user utterance is then recognized using the recognition dictionary with the changed weights. Even if the time at which the barge-in utterance occurs differs from the timing of the scenario-based system utterance, the user utterance can be recognized with an appropriately weighted recognition dictionary, which improves the recognition rate of user utterances.

[0042] In addition, demonstrative information corresponding to demonstratives such as "that" is stored in the recognition dictionary as recognition vocabulary. When a barge-in utterance contains an utterance that indicates an option by means of a demonstrative, the vocabulary entry for the option uttered immediately before the demonstrative, as identified by the output information management unit, is taken as the recognition result. This makes it possible to select an option with a demonstrative.

[0043] Furthermore, the waveform data of the system utterance is saved in the voice output data storage unit, so that when a barge-in utterance containing a demonstrative occurs, the word uttered by the system immediately before the barge-in utterance can be detected and recognized as the option selected by the demonstrative.

[0044]

[Embodiments of the Invention] (First embodiment) A first embodiment will be described with reference to FIGS. 1 to 6.

[0045] First, an overview of the voice dialogue system 10 is given with reference to FIG. 1, which shows an example configuration of the first embodiment of the present invention. The dialogue control unit 14 comprises a microcomputer, a ROM (Read Only Memory) storing a control program and the like, a RAM (Random Access Memory) for storing processing data and the like, and other storage devices and processing circuits, and performs the main control of the voice dialogue system 10. Connected to the dialogue control unit 14 is the dialogue scenario unit 12, which stores the scenario describing the system utterances of the voice dialogue system 10. Based on this scenario, system utterances to the user are output from the voice output unit 18 via the output information management unit 16. The output information management unit 16 monitors which part of the scenario the speech output by the voice output unit 18 corresponds to.

[0046] The user of the voice dialogue system 10 speaks in response to the system utterance. The uttered speech is input to the voice input unit 20, undergoes signal processing such as A/D conversion, and is sent to the dialogue control unit 14. In this embodiment, it undergoes further signal processing such as feature extraction in the dialogue control unit 14 to become voice information, which is sent to the voice recognition unit 21. In the voice recognition unit 21, the voice information is compared, by a well-known technique such as pattern matching, with the vocabulary information stored in the recognition dictionary 22 and weighted by the weight calculation unit 24, and the vocabulary entry to which it corresponds is recognized.

[0047] In FIG. 1, the voice dialogue system basic unit 11 enclosed by the chain line is the main part of the present invention. By using a dialogue scenario unit 12, recognition dictionary 22, voice output unit 18, and voice input unit 20 suited to the intended use of the voice dialogue system 10, a voice dialogue system better matched to the purpose can be configured.

[0048] Next, recognition of the user utterance when the user makes a barge-in utterance, which is a feature of the present invention, is described in detail.

[0049] FIG. 2 shows an example of a dialogue scenario 30 stored in the dialogue scenario unit 12; a system utterance 40, which indicates the timing at which each scenario part is output when the dialogue scenario 30 is output as speech; and a user utterance 45, which indicates the timing at which the user spoke to choose an option in response to the system utterance 40.

【００５０】The example dialogue scenario 30 shows part of a scenario reading "Please select the service you would like to use. 1 weather forecast, 2 sports news, 3 today's fortune." A delimiter 36, "/", is placed before and after each keyword that serves as an option, for example before and after "1 weather forecast", to mark it as a keyword.
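The delimiter convention can be illustrated with a minimal Python sketch. The exact scenario string is an assumption: FIG. 2 is not reproduced here, so the example supposes that adjacent keywords share a single "/" delimiter. The function name `split_scenario` is hypothetical and not part of the patent.

```python
def split_scenario(scenario: str, delimiter: str = "/") -> tuple[str, list[str]]:
    """Split a delimiter-marked scenario into (prompt, keyword parts)."""
    parts = [p.strip() for p in scenario.split(delimiter)]
    # parts[0] is the text before the first delimiter; empty strings that
    # come from the trailing delimiter are dropped.
    return parts[0], [p for p in parts[1:] if p]


# Hypothetical rendering of the FIG. 2 scenario (assumed layout).
SCENARIO = ("Please select the service you would like to use."
            "/1 weather forecast/2 sports news/3 today's fortune/")

prompt, keywords = split_scenario(SCENARIO)
print(prompt)
print(keywords)
```

Counting the position of each keyword in the returned list corresponds to counting delimiters as the scenario is output.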

【００５１】This dialogue scenario 30 is read out by the dialogue control unit 14; the output information management unit 16 detects which part of the scenario has been uttered by the system; and the scenario is uttered from the voice output unit 18 as the system utterance 40.

【００５２】In the example of FIG. 2, the scenario parts that make up the dialogue scenario 30, namely "Please select ..." 31, "1 weather forecast" 32, "2 sports news" 33, and "3 today's fortune" 34, are uttered by the system in the time sections 41, 42, 43, and 44, respectively, as indicated by the arrows. The figure also shows that, while the system was uttering "2 sports news" in time section 43 of the system utterance 40, the user barged in with the user utterance 47, "sports news", starting at the user utterance time point 46.

【００５３】The processing flow by which the voice dialogue system 10 recognizes the option selected by the user, when the user responds to the system utterance 40 of FIG. 2 with a barge-in as shown by the user utterance 45, is described with reference to FIG. 3; FIG. 4, which shows an example of the correspondence between vocabulary items in the recognition dictionary and their weights; FIG. 5, which shows an example of the recognition dictionary with changed weights; and FIG. 6, which shows an example of how the weights change with the barge-in time.

【００５４】FIG. 3 shows an example of the processing flow of the voice dialogue system 10 of the present invention. As shown in FIG. 2, when the user responds with a barge-in utterance, the output information management unit 16 detects the scenario part corresponding to the barge-in timing, the words likely to be contained in the user's response are inferred, and the weights of those inferred words stored in the recognition dictionary 22 are changed to improve the recognition rate.

【００５５】Processing starts at step 100. Each keyword of the scenario stored in the dialogue scenario unit 12 is written delimited by the delimiter 36, "/" (see FIG. 2). As the keywords are sent in sequence to the output information management unit 16 via the dialogue control unit 14, that unit counts which delimiter 36 in the sequence has been reached. The system utterance from the voice output unit 18 is therefore carried out while the output information management unit 16 monitors which delimiter 36 the keyword being uttered corresponds to (step 102). In this initial state, every vocabulary item stored in the recognition dictionary 22 carries the same weight (1.0 in this embodiment), as shown in FIG. 4.

【００５６】When the user begins a barge-in utterance selecting one of the options during the system utterance, the user's voice is input to the voice input unit 20 and is sent, via the dialogue control unit 14, to the output information management unit 16 as a barge-in start signal (step 104).

【００５７】Based on this barge-in start signal, the output information management unit 16 stops the system utterance and detects which part of the scenario was being uttered. For example, as shown in FIG. 2, suppose the user barges in with the user utterance 47, "sports news", at the barge-in start time point 46 marked with ▼, during the system utterance part 43 in which "sports news" of scenario part 33 is being uttered. The output information management unit 16 then detects that this barge-in occurred between delimiters 36-2 and 36-3 of the dialogue scenario 30. The position in the dialogue scenario at which the barge-in occurred is called the barge-in attribute. When a barge-in occurs, the output information management unit 16 sends the barge-in attribute to the weight calculation unit 24 via the dialogue control unit 14 (step 106 in FIG. 3).
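One way to picture the detection of the barge-in attribute is as a lookup of the barge-in time in the list of start times of the scenario parts. The following Python sketch works under that assumption; the start times are invented for illustration, and `barge_in_index` is a hypothetical name, not something the patent defines.

```python
from bisect import bisect_right


def barge_in_index(part_starts: list[float], barge_in_time: float) -> int:
    """Return the index of the scenario part being uttered at barge_in_time."""
    if not part_starts or barge_in_time < part_starts[0]:
        raise ValueError("barge-in precedes the system utterance")
    # The part in progress is the last one whose start time is <= barge_in_time.
    return bisect_right(part_starts, barge_in_time) - 1


# Scenario parts 31-34 of FIG. 2 with hypothetical start times in seconds.
PARTS = ["Please select ...", "1 weather forecast",
         "2 sports news", "3 today's fortune"]
PART_STARTS = [0.0, 2.5, 4.0, 5.8]

index = barge_in_index(PART_STARTS, 4.6)
print(index, PARTS[index])
```

With these assumed times, a barge-in at 4.6 s falls inside the "2 sports news" part, which matches the situation drawn in FIG. 2.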

【００５８】From the received barge-in attribute, the weight calculation unit 24 uses the dialogue scenario 30 to detect the keywords contained in the barged-in scenario part and in the scenario parts before and after it. It then assigns weights to these keywords, for example as follows: the keyword in the barged-in scenario part receives the largest weight; among scenario parts uttered before the barge-in, keywords in parts closer to the barged-in part receive larger weights; and keywords in scenario parts that were scheduled to be uttered after the barge-in time receive weights smaller than those given to already-uttered keywords (step 108 in FIG. 3).

【００５９】FIG. 5 shows a concrete example: the weights associated with each keyword when the barge-in occurs at the user utterance time point 46 of FIG. 2. In FIG. 5, vocabulary 61 lists the keywords in the scenario, and a weight 62 is associated with each. As FIG. 5 shows, the keyword "sports news", contained in the system utterance 43 that corresponds to the user utterance time point 46, receives the largest weight, 1.5; the already-uttered keyword "weather forecast" receives 1.2; and the not-yet-uttered keyword "today's fortune" receives the smallest weight, 0.8.
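The weighting of FIG. 5 can be sketched in a few lines of Python. The three weight values come from the text; the function name `assign_weights` and the simplification that every already-uttered keyword gets the same weight are assumptions made for this illustration.

```python
def assign_weights(keywords: list[str], barge_in_idx: int,
                   current: float = 1.5, spoken: float = 1.2,
                   unspoken: float = 0.8) -> dict[str, float]:
    """Weight keywords by their position relative to the barge-in."""
    weights = {}
    for i, keyword in enumerate(keywords):
        if i == barge_in_idx:
            weights[keyword] = current    # being uttered at the barge-in
        elif i < barge_in_idx:
            weights[keyword] = spoken     # already uttered by the system
        else:
            weights[keyword] = unspoken   # not yet uttered
    return weights


KEYWORDS = ["weather forecast", "sports news", "today's fortune"]
fig5_weights = assign_weights(KEYWORDS, barge_in_idx=1)
print(fig5_weights)
```

Calling the function with `barge_in_idx=1` reproduces the three weights listed for FIG. 5.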

【００６０】The weights calculated in this way by the weight calculation unit 24 for the group of keywords are attached to the corresponding vocabulary items stored in the recognition dictionary 22, updating the vocabulary weights of the recognition dictionary 22 (step 110 in FIG. 3).

【００６１】Through step 110, according to the circumstances of the user's barge-in (here, the barge-in time relative to the dialogue scenario), the vocabulary likely to be contained in the user utterance is weighted, with larger weights for the items that are more likely to be contained. The user utterance is then recognized using the recognition dictionary 22 with the updated weights (step 112 in FIG. 3). For the speech recognition, known techniques can be used, such as dynamic programming matching, hidden Markov models, or methods based on the characteristics of the feature parameters of the speech in the user utterance.

【００６２】To explain this speech recognition simply: the feature parameter U extracted from the speech of the user utterance is compared with the feature parameters K1, K2, ..., Kn of the pre-stored keywords (vocabulary), and the keyword with the smallest |U - Kj| is taken as the keyword spoken by the user. With W denoting the weight described above, in this embodiment the keyword whose value of |U - Kj| / W is smallest is recognized as the keyword spoken by the user.
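The effect of dividing the distance by the weight can be shown with a small Python sketch. It assumes that feature parameters are plain numeric vectors and that |U - Kj| is the Euclidean distance; the patent does not fix either choice, and the template values below are invented for illustration.

```python
import math


def recognize(u: tuple[float, ...],
              templates: dict[str, tuple[float, ...]],
              weights: dict[str, float]) -> str:
    """Return the keyword j that minimizes |U - Kj| / Wj."""
    return min(templates,
               key=lambda kw: math.dist(u, templates[kw]) / weights[kw])


# Hypothetical two-dimensional feature templates K1, K2 and an input U.
TEMPLATES = {"weather forecast": (0.0, 0.0), "sports news": (4.0, 0.0)}
U = (1.8, 0.0)

unweighted = recognize(U, TEMPLATES,
                       {"weather forecast": 1.0, "sports news": 1.0})
weighted = recognize(U, TEMPLATES,
                     {"weather forecast": 1.2, "sports news": 1.5})
print(unweighted, "->", weighted)
```

In this toy case the input lies slightly closer to "weather forecast" (distance 1.8 against 2.2), but with the barge-in weights 1.2 and 1.5 the scores become 1.5 and about 1.47, so "sports news" is selected.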

【００６３】Next, the dialogue control unit 14 determines whether the dialogue scenarios recorded in the dialogue scenario unit 12 include a further dialogue scenario to be presented to the user. If there is one (YES at step 114 in FIG. 3), processing moves on to uttering the next dialogue scenario based on the keyword selected by the user, that is, on the recognition result (step 116 in FIG. 3).

【００６４】If there is no next dialogue scenario (NO at step 114 in FIG. 3), control passes to processing that corresponds to the selected option. For example, if "sports news" was selected, content such as sports news is presented to the user, for instance from the voice output unit 18, via a content presentation device (not shown). This processing flow then ends (step 118 in FIG. 3).

【００６５】In the processing flow of FIG. 3, confirmation processing may be inserted for the user's response: for example, after the user's response, a system utterance such as "You would like sports news, is that right?" corresponding to the recognized word. When such confirmation is performed, the confirmation step is preferably inserted after step 112 in the processing flow of FIG. 3.

【００６６】If this confirmation step also makes a determination based on the user's response, then preferably processing proceeds to step 114 when the confirmation result is YES; when it is NO, the system utters a comment such as "I will repeat the options, so please select the item you want" and then returns to step 102 to continue processing.
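The confirmation loop can be sketched as follows in Python. The two callables stand in for steps 102 to 112 (system utterance and recognition) and for the confirmation step; both are placeholders, and the `max_rounds` limit is an added assumption, since the text itself simply returns to step 102.

```python
from typing import Callable, Optional


def select_with_confirmation(recognize: Callable[[], str],
                             confirm: Callable[[str], bool],
                             max_rounds: int = 3) -> Optional[str]:
    """Repeat recognition until the user confirms the recognized keyword."""
    for _ in range(max_rounds):
        keyword = recognize()      # steps 102-112: utter scenario, recognize reply
        if confirm(keyword):       # confirmation step: YES leads on to step 114
            return keyword
        # NO: the system would announce the repeat and return to step 102.
    return None                    # gave up after max_rounds (assumption)


# Simulated session: the first recognition is rejected, the second accepted.
candidates = iter(["weather forecast", "sports news"])
result = select_with_confirmation(lambda: next(candidates),
                                  lambda kw: kw == "sports news")
print(result)
```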

【００６７】In the recognition dictionaries of FIG. 5 and FIG. 6, "weather forecast", "sports news", "today's fortune", and so on are each registered as vocabulary items, but they may also be registered in more finely divided form, for example as "weather", "forecast", "sports", "news", "today", and "fortune". Even when divided this way, the parts of one keyword, for example "weather" and "forecast", are preferably given the same weight.

【００６８】Likewise, when the system utterance pairs a number with an option, as in "1 weather forecast", it is preferable, as above, to register "1" as well, so that "1", "weather", and "forecast" carry the same weight.

【００６９】As the above description of the processing flow shows, the weights given to the keywords of FIG. 5 change with the time at which the user barges in. FIG. 6 shows how the weight of each keyword changes. In FIG. 6, 72, 73, and 74 denote barge-in time information; for example, time information 72 indicates a barge-in during system utterance part 42 of the system utterance 40 in FIG. 2. A barge-in at this timing occurs while the system is uttering "weather forecast", so that keyword is given the largest weight, 1.5, and the keywords that were scheduled to be uttered after it, such as "sports news" and "today's fortune", are given a small weight, 0.8. In the case of time information 74, the barge-in occurred while the system was uttering "today's fortune", so that keyword is given the largest weight, 1.4, and the already-uttered keywords "weather forecast" and "sports news" are given a smaller weight, 1.0. Alternatively, the weights for time information 74 may be set so that keywords uttered longer before the barge-in receive smaller weights, for example "weather forecast" 0.9, "sports news" 1.0, and "today's fortune" 1.5.

【００７０】In the speech recognition processing described above, the keywords contained in the user utterance were estimated from the time information of the barge-in, and weights were given to the estimated keywords based on the relationship between that time information and the corresponding scenario parts. Besides weighting by barge-in time, the following kinds of weighting, which take the circumstances of the barge-in into account, are also effective for recognizing the keywords in the user utterance more accurately. This recognition method is described with reference to FIG. 7, which shows weighting based on the speed of the user utterance, and FIG. 8, which shows weighting based on the strength of the user utterance.

【００７１】In FIG. 7, the barge-in utterance duration T is obtained from the time waveform 80 of the barge-in utterance. For the keyword contained in the dialogue scenario part that corresponds to the barge-in time, the ratio between the pre-stored utterance time To and T is calculated. According to this speed ratio R 81, a coefficient is looked up in the correspondence 82 between speed ratio and weight coefficient, and the initial weight 83 is multiplied by this coefficient to obtain the changed weight 84. This series of operations may be performed by the dialogue control unit 14 or by the weight calculation unit 24, or shared between the two.

【００７２】The correspondence 82 between speed ratio and weight coefficient is stored as a table in the dialogue control unit 14, the weight calculation unit 24, or the recognition dictionary 22.

【００７３】FIG. 8 shows the case in which the acoustic power of the user utterance is used instead of the utterance speed. The spectrum 91 of the barge-in utterance is obtained by frequency analysis of its time waveform 80, and the power P of the barge-in speech is obtained from this spectrum 91. For the keyword contained in the dialogue scenario part corresponding to the barge-in time, the ratio between the pre-stored power Po and P is calculated. According to this power ratio 92, a coefficient is looked up in the correspondence 93 between power ratio and weight coefficient, and the initial weight 94 is multiplied by this coefficient to obtain the changed weight 95. This series of operations may be performed by the dialogue control unit 14 or by the weight calculation unit 24, or shared between the two. The correspondence 93 between power ratio and weight coefficient is stored as a table in the dialogue control unit 14, the weight calculation unit 24, or the recognition dictionary 22.
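The speed-based and power-based adjustments share one pattern: compute a ratio, look up a coefficient in a table, and multiply the initial weight by it. The Python sketch below illustrates that pattern only. The table contents, the direction of the ratio (measured value divided by stored value), and the function names are all assumptions; the patent gives no concrete table values.

```python
import math


def lookup_coefficient(ratio: float,
                       table: list[tuple[float, float]]) -> float:
    """Return the coefficient of the first row whose upper bound exceeds ratio."""
    for upper_bound, coefficient in table:
        if ratio < upper_bound:
            return coefficient
    raise ValueError("ratio not covered by the table")


def adjusted_weight(initial_weight: float, measured: float, stored: float,
                    table: list[tuple[float, float]]) -> float:
    """Multiply the initial weight by the coefficient for measured / stored."""
    return initial_weight * lookup_coefficient(measured / stored, table)


# Hypothetical tables of (upper bound of ratio, coefficient).
SPEED_TABLE = [(0.5, 0.8), (1.5, 1.2), (math.inf, 0.8)]   # ratio T / To
POWER_TABLE = [(0.5, 0.9), (math.inf, 1.1)]               # ratio P / Po

speed_weight = adjusted_weight(1.0, measured=1.1, stored=1.0, table=SPEED_TABLE)
power_weight = adjusted_weight(1.0, measured=0.3, stored=1.0, table=POWER_TABLE)
print(speed_weight, power_weight)
```

Keeping the correspondence in a table, as the text describes, means the same lookup code serves both FIG. 7 and FIG. 8; only the table and the measured quantity differ.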

【００７４】(Second Embodiment) A voice dialogue system 200 according to a second embodiment of the present invention is described with reference to FIG. 9. In FIG. 9, components with the same functions as those in FIG. 1, which shows the configuration of the first embodiment, carry the same reference numerals. The voice dialogue system 200 differs from the voice dialogue system 10 of the first embodiment in that a weight coefficient management unit 201 is connected to the weight calculation unit 24-2, and the weight coefficients that the weight calculation unit 24-2 attaches to the vocabulary stored in the recognition dictionary 22 are stored in this weight coefficient management unit 201.

【００７５】More specifically, when the user utters a response to the system utterance, the output information management unit 16 detects, as in the first embodiment, which part of the dialogue scenario was being uttered at the time of the user utterance. The keyword contained in the system utterance part at the barge-in time receives the largest weight; keywords in already-uttered parts receive smaller weights the longer the time between their part and the barge-in; and keywords in not-yet-uttered parts receive still smaller weights. The ratios between these weights, or the weights themselves, are stored as a table in the weight coefficient management unit 201.

【００７６】FIG. 10 shows an example of the weight coefficient table stored in the weight coefficient management unit 201. In this example, the keyword in the scenario part at the barge-in time is given the largest weight, 1.8; the keyword in the preceding scenario part is given 1.5; and the keyword in the part before that is given 1.4. That is, among keywords in already-uttered scenario parts, the longer ago the part was uttered relative to the barge-in, the smaller the weight. Keywords in scenario parts that were scheduled to be uttered after the barge-in are given a small weight of 0.8.
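A table of this kind can be applied with a simple offset lookup, as in the Python sketch below. The four weight values are those given for FIG. 10; the names are hypothetical, and the handling of parts uttered more than two positions before the barge-in (reusing the last table entry) is an assumption, since the example table does not cover that case.

```python
# Weights indexed by how many parts before the barge-in part a keyword lies.
SPOKEN_WEIGHTS = [1.8, 1.5, 1.4]   # offset 0 is the barged-in part itself
UNSPOKEN_WEIGHT = 0.8              # parts scheduled after the barge-in


def weights_from_table(keywords: list[str],
                       barge_in_idx: int) -> dict[str, float]:
    """Look up each keyword's weight from its offset to the barge-in part."""
    weights = {}
    for i, keyword in enumerate(keywords):
        if i > barge_in_idx:
            weights[keyword] = UNSPOKEN_WEIGHT
        else:
            offset = barge_in_idx - i
            # Assumption: offsets beyond the table reuse its last entry.
            weights[keyword] = SPOKEN_WEIGHTS[min(offset, len(SPOKEN_WEIGHTS) - 1)]
    return weights


table_weights = weights_from_table(["a", "b", "c", "d", "e"], barge_in_idx=2)
print(table_weights)
```

Because the weights depend only on the offset, the same table serves any scenario and any barge-in position without recomputation.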

【００７７】In this way, the weights to be given to the keywords the user is presumed to utter, determined relative to the barge-in time from the elapsed time and from whether each part has already been uttered by the system, are stored in the weight coefficient management unit 201. The weight calculation unit 24-2 detects the barge-in time, associates it with the scenario utterance parts, and attaches the corresponding weights to the keywords.

【００７８】Because the weights are organized as a table in this way, associating weights with keywords becomes simple and the processing can be performed quickly.

【００７９】(Third Embodiment) Next, a third embodiment of the present invention is described with reference to FIG. 11, which shows an example of its configuration. FIG. 11 outlines the voice dialogue system 220 of the third embodiment. In this figure too, components with the same functions as in the first and second embodiments carry the same reference numerals, and their description is omitted.

【００８０】The third embodiment is characterized by a dialogue history management unit 224. This unit stores, as a history, the correct recognition rate of the results obtained when the voice dialogue system 220 recognizes user utterances, and the weights given to keywords are updated based on this rate.

【００８１】In response to the system utterance, the user barges in, and the weight calculation unit 24-3 assigns the weights stored in the weight table management unit 222 to the keywords contained in the scenario part corresponding to the barge-in time.

【００８２】The weight coefficient table stored in the weight table management unit 222 may be the same as the table stored in the weight coefficient management unit 201 of the second embodiment.

【００８３】Next, the processing that updates the weights stored in the weight table management unit 222 based on the dialogue history is described. When the user makes a barge-in utterance, the output information management unit 16 detects the scenario part corresponding to the barge-in time, and the keywords contained in that part and in the parts before and after it are detected. As in the second embodiment, the keyword in the scenario part being uttered at the barge-in time receives the largest weight, and among already-uttered scenario parts, keywords in parts uttered longer before the barge-in receive smaller weights. FIG. 12 shows an example of the management table, stored in the dialogue history management unit 224, in which keywords are associated with weights in this way.

【００８４】The table in FIG. 12 covers a dialogue scenario containing five keywords, a, b, c, d, and e, in the case where the scenario part corresponding to the barge-in time contains keyword c.

【００８５】Keyword c, in the scenario part at the barge-in time, is given a weight of 1.8; keyword b, in the preceding scenario part, is given 1.5; and keyword a is given 1.4. Thus the longer before the barge-in a keyword was uttered, the smaller its weight. In this example, keywords d and e, which were scheduled to be uttered after the barge-in, are given a weight of 0.8. With these weights attached to the keywords, the voice recognition unit 21-3 performs speech recognition by comparing the keywords with the user's speech input from the voice input unit 20. When this speech recognition is performed for many users, the correct recognition rate 228, the proportion of cases in which each weighted keyword was recognized correctly, is updated as dialogue history after each recognition and stored in the dialogue history management unit 224.

【００８６】As a result of this comparison, the keyword with the largest weight is not always the one recognized as the keyword in the user utterance. For example, in the system utterance 40 of FIG. 2, "weather forecast", "sports news", and "today's fortune" are uttered in sequence. Even if the barge-in occurs while the system is uttering "sports news", as the figure shows, the user's barge-in may be offset from the system utterance if the user hesitated over the choice or is elderly. The 7% correct recognition rate for keyword b in FIG. 12 indicates that, over many user responses to the voice dialogue system 220, the timing of the user's barge-in is sometimes slightly offset from the system utterance.

【００８７】FIG. 13 shows another example of the management table of the dialogue history management unit. As in FIG. 12, the dialogue scenario has five keywords (five options), and the initial weights 234 are assigned in the same way. However, the correct recognition rates resulting from many user responses differ from those in FIG. 12: even though the keyword in the scenario part at the barge-in time was given the largest weight, the preceding keyword b has the higher correct recognition rate 236.

【００８８】This can happen, for example, when keyword b is a relatively short word, such as a system utterance of just "famous spots" intended as a sightseeing guide; when it is an easily confused word, such as "scenic area" used for guidance to picturesque places; when ambient noise makes the system utterance hard to hear and shifts the barge-in timing; or when the user hesitates over the choice, as mentioned above.

【００８９】When the initial weights and the correct recognition rates do not correspond as they do in FIG. 12, the weights are changed and updated as shown by the changed weights 238.
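The text does not state the rule used to derive the changed weights, so the following Python sketch shows just one possible rule, labeled as an assumption: the existing weight values are redistributed so that the keyword with the highest observed correct recognition rate gets the largest weight. The rates below are invented for illustration and are not the values of FIG. 13.

```python
def rerank_weights(weights: dict[str, float],
                   rates: dict[str, float]) -> dict[str, float]:
    """Reassign the existing weight values in order of correct recognition rate."""
    # Keywords ordered from highest to lowest observed rate.
    by_rate = sorted(weights, key=lambda kw: rates[kw], reverse=True)
    # The same weight values, ordered from largest to smallest.
    by_value = sorted(weights.values(), reverse=True)
    return dict(zip(by_rate, by_value))


INITIAL = {"a": 1.4, "b": 1.5, "c": 1.8, "d": 0.8, "e": 0.8}
# Hypothetical correct recognition rates accumulated in the dialogue history.
RATES = {"a": 0.05, "b": 0.50, "c": 0.40, "d": 0.03, "e": 0.02}

updated = rerank_weights(INITIAL, RATES)
print(updated)
```

With these assumed rates, keyword b overtakes keyword c and receives the largest weight, which is the kind of correction the changed weights 238 represent.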

【００９０】In this way, the voice dialogue system 220 of this embodiment can change the weights given to keywords to better values based on the dialogue history, which makes it possible to improve the speech recognition rate.

【００９１】(Fourth Embodiment) A voice dialogue system 240, which handles a barge-in whose user utterance is a response containing a demonstrative (indication word) such as "that", is described with reference to FIG. 14, which shows an example of the configuration of a fourth embodiment of the present invention, and FIG. 15, which shows an example of the processing flow for a response containing a demonstrative.

【００９２】In this embodiment too, components with the same functions as in the first, second, and third embodiments carry the same reference numerals, and their description is omitted.

【００９３】The main difference between the fourth embodiment and the preceding ones is that, even when the user utterance input from the voice input unit 20 contains a demonstrative such as "that", the system can recognize which option the demonstrative selects. When the user utters a demonstrative and it is input to the dialogue control unit 14-4 via the voice input unit 20, features of the demonstrative are extracted and compared with standard patterns in the voice recognition unit 21-4, which detects that a demonstrative was input. In this embodiment, the standard pattern information for the various demonstratives is stored in the recognition dictionary 22.

[0094] Next, the detailed processing flow of the voice dialogue system 240 when an option is selected by a user utterance containing a demonstrative word is described.

[0095] When the voice dialogue system 240 becomes operable, processing starts at step 300 in FIG. 15. In this embodiment as well, the system utterance is described as using the dialogue scenario shown in FIG. 2.

[0096] As in the first embodiment, as the keywords of the dialogue scenario stored in the dialogue scenario section 12 are sent one after another to the output information management unit 16-4 via the dialogue control unit 14-4, the delimiters 36 shown in FIG. 2 are counted to determine the ordinal position of each delimiter 36. The system utterance from the voice output unit 18 is therefore carried out while the output information management unit 16-4 monitors which delimiter 36 the keyword currently being uttered corresponds to (step 302 in FIG. 15).

[0097] Suppose that, during this system utterance, the user barges in at the user utterance time point 46 in FIG. 2 with the demonstrative word "that" to choose one of the options. The barge-in speech is input to the voice input unit 20, the dialogue control unit 14-4 extracts its features, and the extracted feature information is input to the voice recognition unit 21-4, where it is compared with the standard patterns of the demonstrative words stored in advance in the recognition dictionary 22. When the voice recognition unit 21-4 recognizes that the user utterance is a demonstrative word, a barge-in signal based on this recognition is sent to the selected demonstrative word management unit 242. The selected demonstrative word management unit 242 then sends a demonstrative barge-in signal, indicating that a barge-in by a demonstrative word has occurred, to the output information management unit 16-4 via the dialogue control unit 14-4 (step 304 in FIG. 15).

[0098] Based on this demonstrative barge-in signal, the output information management unit 16-4 stops the system utterance and detects which part of the scenario was being uttered. For example, as shown in FIG. 2, the user barged in with the demonstrative word "that" at the barge-in utterance start point 46 (marked ▼), during the system utterance portion 43 in which "sports news" of scenario portion 33 was being uttered. The output information management unit 16-4 therefore detects that this barge-in occurred between delimiters 36-2 and 36-3 of the dialogue scenario 30 (the barge-in attribute) (step 306 in FIG. 15).

[0099] Because the detected barge-in attribute contains the keyword "sports news", the output information management unit 16-4 determines that the keyword the user selected with the demonstrative word is "sports news" (308 in FIG. 15), and sends keyword information corresponding to "sports news" to the selected demonstrative word management unit 242 via the dialogue control unit 14-4.

[0100] Based on the keyword information it receives for "sports news", the selected demonstrative word management unit 242 determines that the demonstrative word "that" refers to "sports news" (step 310 in FIG. 15) and sends the recognition result, namely that the keyword selected by the user is "sports news", to the dialogue control unit 14-4.

[0101] Next, the dialogue control unit 14-4 determines whether the dialogue scenarios recorded in the dialogue scenario section 12 include a further scenario to be provided to the user. If there is one (YES at step 312 in FIG. 15), processing moves on to uttering the next dialogue scenario based on the keyword selected by the user, that is, the recognition result (step 314 in FIG. 15).

[0102] If, on the other hand, there is no next dialogue scenario (NO at step 315 in FIG. 15), control is handed over to processing that corresponds to the selected option. For example, if "sports news" was selected, content such as sports news is presented to the user via a content presentation device (not shown), for example from the voice output unit 18. This processing flow then ends (step 316).

[0103] As described above, even when the user barges in and selects an option with a demonstrative word, the system recognizes whether the user utterance is a demonstrative word, detects the barge-in time if it is, and detects the keyword contained in the scenario portion corresponding to that barge-in time. The user's choice can therefore be recognized even when the user utterance is a demonstrative word.
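The delimiter-counting approach of the fourth embodiment can be illustrated with a minimal sketch. Everything named here is hypothetical: the class `OutputMonitor` merely stands in for the output information management unit, the scenario string with a leading and trailing "/" is an assumed encoding of FIG. 2, and feature extraction and recognition of the demonstrative word itself are taken as already done.

```python
from dataclasses import dataclass

DELIMITER = "/"  # scenario-portion delimiter used in the first to fourth embodiments


@dataclass
class OutputMonitor:
    """Hypothetical stand-in for the output information management unit."""
    scenario: str
    delimiters_passed: int = 0  # ordinal of the last delimiter sent to the output

    def segments(self):
        # Segment i (1-based) is the text between delimiter i and delimiter i + 1.
        return self.scenario.strip(DELIMITER).split(DELIMITER)

    def advance(self):
        """Pass the next delimiter and return the keyword that is now being uttered."""
        if self.delimiters_passed >= len(self.segments()):
            raise IndexError("scenario has no further portion to utter")
        self.delimiters_passed += 1
        return self.segments()[self.delimiters_passed - 1]

    def resolve_demonstrative(self):
        """Map a barged-in demonstrative word to the keyword being uttered."""
        if self.delimiters_passed == 0:
            raise ValueError("no scenario portion has been uttered yet")
        return self.segments()[self.delimiters_passed - 1]


monitor = OutputMonitor("/weather forecast/sports news/today's fortune/")
monitor.advance()  # system starts uttering "weather forecast"
monitor.advance()  # system starts uttering "sports news"
# The user says "that" here, between the second and third delimiters.
selected = monitor.resolve_demonstrative()
```

In this sketch the barge-in attribute is simply the count of delimiters passed, so no timing information is needed to resolve the demonstrative word.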

[0104] (Fifth Embodiment) A voice dialogue system 250, which handles the case where the user barges in with a response containing a demonstrative word such as "that", is described with reference to FIG. 16, which shows a configuration example of the fifth embodiment of the present invention, and FIG. 17, which shows another example of the processing flow for a response containing a demonstrative word.

[0105] In this embodiment as well, components having the same functions as in the first, second, third, and fourth embodiments are given the same reference numerals, and their description is omitted.

[0106] The main feature of this embodiment concerns the structure of the dialogue scenario: the dialogue system can use scenarios constructed without delimiters or similar marks for distinguishing options, so recorded speech and the like can be used as a dialogue scenario. In the first to fourth embodiments, the scenario portions were identified by the delimiter "/". This embodiment instead makes it possible to recognize, from a barge-in user utterance, the keyword with which the user indicates an option, without including any information identifying scenario portions in the scenario.

[0107] The main differences between this embodiment and the embodiments above are outlined here. In the voice dialogue system 250 of FIG. 16, the dialogue scenario section 12-5 of this embodiment holds a recorded dialogue scenario, for example the one shown in FIG. 2. The recorded dialogue scenario is uttered by the system from the voice output unit 18-5 via the dialogue control unit 14-5, and the voice waveform data being uttered is stored sequentially in the voice output data storage unit 252.

[0108] Next, the processing flow of the voice dialogue system 250 is described for the case where the system utters several options and the user responds by selecting one of them with a demonstrative word.

[0109] When the voice dialogue system 250 becomes operable, processing starts at step 400 in FIG. 17. In this embodiment as well, the system utterance is described as using the dialogue scenario shown in FIG. 2, in which the user selects one of three options: "weather forecast", "sports news", and "today's fortune".

[0110] In this embodiment, the dialogue scenario stored in the dialogue scenario section 12-5 is uttered by the system from the voice output unit 18-5 via the dialogue control unit 14-5, while the voice waveform data of the system utterance is stored in the voice output data storage unit 252 as it is uttered (step 402 in FIG. 17).

[0111] Suppose that, during this system utterance, the user barges in at the user utterance time point 46 in FIG. 2 with the demonstrative word "that" to choose one of the options. The barge-in speech is input to the voice input unit 20, the dialogue control unit 14-5 extracts its features, and the extracted feature information is input to the voice recognition unit 21-5, where it is compared with the standard patterns of the demonstrative words stored in advance in the recognition dictionary 22. When the voice recognition unit 21-5 recognizes that the user utterance is a demonstrative word, a barge-in signal based on this recognition is sent to the selected demonstrative word management unit 242-5. The selected demonstrative word management unit 242-5 then sends a demonstrative barge-in signal, indicating that a barge-in by a demonstrative word has occurred, to the dialogue control unit 14-5, and the system utterance from the voice output unit 18-5 is stopped (step 406 in FIG. 17).

[0112] Based on this demonstrative barge-in signal, the dialogue control unit 14-5 reads, from the voice output data storage unit 252, the voice waveform data recorded within a predetermined time interval before the barge-in time and sends it to the voice recognition unit 21-5. The voice recognition unit 21-5 performs voice recognition on this waveform data against the voice waveform data stored in the recognition dictionary 22, using a technique such as pattern comparison (step 408 in FIG. 17).

[0113] As a specific example with reference to FIG. 2, suppose the user barges in with the demonstrative word "that" at the barge-in utterance start point 46 (marked ▼), during the system utterance portion 43 in which "sports news" of scenario portion 33 is being uttered. The voice recognition unit 21-5 then recognizes, by comparing voice waveform data, that the user selected "sports news" (step 408 in FIG. 17). This recognition result, "sports news", is sent to the selected demonstrative word management unit 242-5 (step 410 in FIG. 17).
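The buffering step of the fifth embodiment, in which the system keeps its own recent output so that the portion just before the barge-in can be re-recognized, might be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the class `OutputAudioStore`, its methods, and the fixed-capacity ring buffer are all hypothetical, and the actual recognition of the returned window against the recognition dictionary is outside the sketch.

```python
from collections import deque


class OutputAudioStore:
    """Hypothetical stand-in for the voice output data storage unit 252."""

    def __init__(self, sample_rate, max_seconds):
        self.sample_rate = sample_rate
        # Ring buffer: once full, the oldest samples are discarded automatically.
        self._samples = deque(maxlen=int(sample_rate * max_seconds))

    def write(self, chunk):
        """Store waveform samples as the system utters them."""
        self._samples.extend(chunk)

    def recent(self, window_seconds):
        """Return the samples uttered in the last window_seconds before the barge-in."""
        n = int(self.sample_rate * window_seconds)
        if n <= 0:
            return []
        return list(self._samples)[-n:]


# Toy numbers: 4 samples per second, 2 seconds of history.
store = OutputAudioStore(sample_rate=4, max_seconds=2)
store.write(range(10))       # system utterance so far (sample values 0..9)
window = store.recent(1.0)   # on barge-in, fetch the last second of output
```

The window returned by `recent` is what would be passed to the recognizer to find out which keyword the system had just been uttering. The length of the window corresponds to the "predetermined time interval" in the text, whose value the source does not specify.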

[0114] The selected demonstrative word management unit 242-5 then sends the recognition result "sports news" to the dialogue control unit 14-5, and the dialogue control unit 14-5 determines whether there is a next dialogue scenario. If there is one (YES at step 412 in FIG. 17), processing proceeds to step 414 and moves on to the next dialogue scenario based on the recognition result.

[0115] If, on the other hand, there is no next dialogue scenario (NO at step 412 in FIG. 17), control is handed over to processing that corresponds to the selected option. For example, if "sports news" was selected, content such as sports news is presented to the user via a content presentation device (not shown), for example from the voice output unit 18-5. This processing flow then ends (step 416 in FIG. 17).

[0116] (Sixth Embodiment) A sixth embodiment is described with reference to FIG. 18, which shows an outline of a mobile phone to which the present invention is applied. The mobile phone 500 of the sixth embodiment communicates with a service provider, a service operator (for example, a public facility such as a government office), a merchandise sales company, or the like, and lets the user receive a desired service through voice responses. Functional units corresponding to the voice dialogue system basic unit 11 shown in FIG. 1 are housed in the main body 510 of the mobile phone 500. The speaker 520, which serves as the voice output unit, and the microphone 530, which serves as the voice input unit, are arranged on the front face of the mobile phone 500, where the keyboard 530 and the display panel 540 are also located. The functions corresponding to the dialogue scenario section 12 and the recognition dictionary 22 of FIG. 1 are installed on the information provider side, such as the service provider, and are reached via the antenna 550.

[0117] Because the present invention is applied in this way to the mobile phone 500, which is a communication terminal, the user can enjoy a desired service at any time through voice dialogue.

[0118] As with the mobile phone 500, this voice dialogue system may also be installed in a PDA (Personal Digital Assistant).

[0119] (Supplementary Note 1) A voice dialogue system connectable to a dialogue scenario section that stores a scenario defining a plurality of words to be uttered and the utterance order of the plurality of words, a recognition dictionary that stores vocabulary information, a voice output unit that utters speech according to the scenario of the dialogue scenario section, and a voice input unit that inputs speech uttered by a user, the voice dialogue system comprising: an output information management unit that detects which portion of the scenario the voice output unit is uttering; a weighting calculation unit that weights the vocabulary information stored in the recognition dictionary based on the scenario portion detected by the output information management unit when the user spoke; and a voice recognition unit that selects, from voice information obtained by signal processing of the speech input to the voice input unit and from the weighted vocabulary information, which item of the vocabulary information stored in the recognition dictionary the speech uttered by the user corresponds to.

[0120] (Supplementary Note 2) A voice dialogue system comprising: a dialogue scenario section that stores a scenario defining a plurality of words to be uttered and the utterance order of the plurality of words; a recognition dictionary that stores vocabulary information; a voice input unit that inputs speech uttered by a user; a voice output unit that utters speech according to the scenario of the dialogue scenario section; an output information management unit that detects which portion of the scenario the voice output unit is uttering; a weighting calculation unit that weights the vocabulary information stored in the recognition dictionary based on the scenario portion detected by the output information management unit when the user spoke; and a voice recognition unit that selects, from voice information obtained by signal processing of the speech input to the voice input unit and from the weighted vocabulary information, which item of the vocabulary information stored in the recognition dictionary the speech uttered by the user corresponds to.

[0121] (Supplementary Note 3) The voice dialogue system according to Supplementary Note 1 or Supplementary Note 2, wherein the weighting is also based on the strength and/or speed of the speech uttered by the user.

[0122] (Supplementary Note 4) The voice dialogue system according to Supplementary Note 1 or Supplementary Note 2, wherein the weighting is based on the user's utterance time and the scheduled utterance times of the words contained in the scenario.

[0123] (Supplementary Note 5) The voice dialogue system according to Supplementary Note 4, wherein the weighting calculation unit weights the vocabulary information stored in the recognition dictionary based on a weighting coefficient or weighting ratio stored in a weighting coefficient management unit, the weighting coefficient management unit storing the weighting coefficient or weighting ratio to be applied to words contained in the recognition dictionary based on the difference between the user's utterance time and the scheduled utterance time of a word contained in the scenario.

[0124] (Supplementary Note 6) The voice dialogue system according to Supplementary Note 4, wherein the weighting calculation unit weights the vocabulary information stored in the recognition dictionary based on a weighting coefficient or weighting ratio stored in a weight table management unit, the weight table management unit storing a table that gives the weighting coefficient or weighting ratio to be applied to words contained in the recognition dictionary according to the difference between the user's utterance time and the scheduled utterance time of a word contained in the scenario.

[0125] (Supplementary Note 7) The voice dialogue system according to Supplementary Note 5 or Supplementary Note 6, further comprising a history management unit that stores history information indicating the correspondence between the weighting coefficient or weighting ratio stored in the weighting coefficient management unit or the weight table management unit and the selection result chosen by the voice recognition unit, wherein the weighting coefficient or weighting ratio is changed based on the history information.

[0126] (Supplementary Note 8) A voice dialogue system connectable to a dialogue scenario section that stores a scenario defining a plurality of words to be uttered and the utterance order of the plurality of words, a recognition dictionary that stores vocabulary information, a voice output unit that utters speech according to the scenario of the dialogue scenario section, and a voice input unit that inputs speech uttered by a user, the voice dialogue system comprising: an output information management unit that detects which portion of the scenario the voice output unit is uttering; and a voice recognition unit that selects, from voice information obtained by signal processing of the speech input to the voice input unit and from the vocabulary information stored in the recognition dictionary, which item of the vocabulary information stored in the recognition dictionary the speech uttered by the user corresponds to, and that, when the selected vocabulary information is a demonstrative word, associates the demonstrative word with the word or consecutive words contained in the scenario portion uttered from the voice output unit at the time corresponding to the utterance of the demonstrative word.

[0127] (Supplementary Note 9) A voice dialogue system connectable to a dialogue scenario section that stores a scenario defining a plurality of words to be uttered and the utterance order of the plurality of words, a recognition dictionary that stores vocabulary information, a voice output unit that utters speech according to the scenario of the dialogue scenario section, and a voice input unit that inputs speech uttered by a user, the voice dialogue system comprising: a voice output data storage unit that stores waveform data corresponding to the words uttered by the voice output unit; and a voice recognition unit that selects, from voice information obtained by signal processing of the speech input to the voice input unit and from the vocabulary information stored in the recognition dictionary, which item of the vocabulary information stored in the recognition dictionary the speech uttered by the user corresponds to, and that, when the selected vocabulary information corresponds to a demonstrative word, associates the demonstrative word with the waveform data that was uttered from the voice output unit in correspondence with the utterance of the demonstrative word and is stored in the voice output data storage unit.

[0128] (Supplementary Note 10) A voice dialogue method for a voice dialogue system, comprising: a step of uttering speech based on a scenario; a step of inputting a user's utterance; a step of detecting the portion of the scenario corresponding to the user's utterance; a step of weighting, in correspondence with the user's utterance, the word or consecutive words contained in that portion of the scenario; a step of matching the input user utterance against the weighted word or consecutive words and selecting one of them; and a step of performing the next processing according to the scenario based on the selected word or consecutive words.

[0129] (Supplementary Note 11) A voice dialogue system that is a communication terminal device connectable, via a base station, to a dialogue scenario section that stores a scenario defining a plurality of words to be uttered and the utterance order of the plurality of words and to a recognition dictionary that stores vocabulary information, comprising: a voice input unit that inputs speech uttered by a user; a voice output unit that utters speech according to the scenario of the dialogue scenario section; an output information management unit that detects which portion of the scenario the voice output unit is uttering; a weighting calculation unit that weights the vocabulary information stored in the recognition dictionary based on the scenario portion detected by the output information management unit when the user spoke; and a voice recognition unit that selects, from voice information obtained by signal processing of the speech input to the voice input unit and from the weighted vocabulary information, which item of the vocabulary information stored in the recognition dictionary the speech uttered by the user corresponds to.

【０１３０】[0130]

[Effect of the Invention] The system monitors the scenario of the system utterance and detects which part of the system utterance was being spoken at the moment the user spoke. From this it predicts the keywords contained in a user utterance made in response during the system utterance, weights the predicted keywords according to the time relationship between the user utterance time and the corresponding system utterance, and recognizes the keyword in the user utterance. Keywords in the user utterance can therefore be recognized more accurately. In addition, for user responses made with a demonstrative word, the keyword corresponding to the demonstrative word can be recognized accurately from its association with the scenario.
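The time-based keyword weighting summarized above can be illustrated with a small sketch. The function names, the table layout, the use of the absolute time difference, and all numeric values are assumptions made for illustration; the source only states that weights depend on the relationship between the user utterance time and the scheduled utterance time of each word.

```python
def weight_for(time_diff, table, default=1.0):
    """Look up a weight for a time difference in seconds.

    `table` is a hypothetical weight table: a list of (max_diff, weight)
    pairs sorted by ascending max_diff. The first row whose max_diff is not
    exceeded supplies the weight; otherwise `default` applies.
    """
    for max_diff, weight in table:
        if time_diff <= max_diff:
            return weight
    return default


def recognize(acoustic_scores, scheduled_times, user_time, table):
    """Pick the keyword with the highest weighted score.

    acoustic_scores: keyword -> unweighted similarity from the recognizer.
    scheduled_times: keyword -> scheduled system utterance time in seconds.
    user_time: time at which the user barged in, in seconds.
    """
    best_word, best_score = None, float("-inf")
    for word, score in acoustic_scores.items():
        # Assumption: closeness in time, in either direction, raises the weight.
        diff = abs(user_time - scheduled_times[word])
        weighted = score * weight_for(diff, table)
        if weighted > best_score:
            best_word, best_score = word, weighted
    return best_word


table = [(1.0, 2.0), (3.0, 1.5)]  # illustrative values only
scheduled = {"weather forecast": 0.0, "sports news": 2.0, "today's fortune": 4.0}
scores = {"weather forecast": 0.5, "sports news": 0.45, "today's fortune": 0.5}
choice = recognize(scores, scheduled, user_time=2.5, table=table)
```

With these toy numbers, "sports news" has the lowest unweighted score but is selected because the user spoke within one second of its scheduled utterance time, which is the effect the weighting is meant to achieve.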

[Brief description of drawings]

FIG. 1 is a diagram showing a configuration example of the first embodiment of the present invention.

FIG. 2 is a diagram showing the relationship between a dialogue scenario, a system utterance, and a user utterance.

FIG. 3 is a diagram showing an example of the processing flow of the voice dialogue system.

FIG. 4 is a diagram showing an example of the correspondence between the vocabulary of the recognition dictionary and the weights of that vocabulary.

FIG. 5 is a diagram showing an example of a recognition dictionary in which the weights have been changed.

FIG. 6 is a diagram showing an example of a weight table in which the weights change with the passage of time.

FIG. 7 is a diagram showing weighting based on the speed of the user utterance.

FIG. 8 is a diagram showing weighting based on the strength of the user utterance.

FIG. 9 is a diagram showing a configuration example of the second embodiment of the present invention.

FIG. 10 is a diagram showing an example of a weighting coefficient table.

FIG. 11 is a diagram showing a configuration example of the third embodiment of the present invention.

FIG. 12 is a diagram showing an example of the management table of the dialogue history management unit.

FIG. 13 is a diagram showing an example of the management table of the dialogue history management unit.

FIG. 14 is a diagram showing a configuration example of the fourth embodiment of the present invention.

FIG. 15 is a diagram showing an example of the processing flow for a response containing a demonstrative word.

FIG. 16 is a diagram showing a configuration example of the fifth embodiment of the present invention.

FIG. 17 is a diagram showing another example of the processing flow for a response containing a demonstrative word.

FIG. 18 is a diagram showing an outline of a mobile phone to which the present invention is applied.

[Explanation of symbols]

10 Voice dialogue system
11 Voice dialogue system basic unit
12 Dialogue scenario section
14 Dialogue control unit
16 Output information management unit
18 Voice output unit
20 Voice input unit
21 Voice recognition unit
22 Recognition dictionary
24 Weight calculation unit
30 Dialogue scenario
40 System utterance
45 User utterance
200 Voice dialogue system
220 Voice dialogue system
222 Weight table management unit
224 Dialogue history management unit
240 Voice dialogue system
242 Selected demonstrative word management unit
250 Voice dialogue system
252 Voice output data storage unit
500 Mobile phone

Continuation of the front page
(51) Int. Cl.⁷, identification code, FI, theme code (reference): G10L 3/00 R
(72) Inventor: Tatsuro Matsumoto, c/o Fujitsu Limited, 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Shigeru Yamada, c/o Fujitsu Limited, 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa
F-term (reference): 5D015 HH05 HH13 LL06 LL10; 5D045 AB04 AB30

Claims

[Claims]

1. A voice dialogue system that has a dialogue scenario section storing a plurality of words to be uttered and a scenario defining the utterance order of the plurality of words, and a recognition dictionary storing vocabulary information, and that is connectable to a voice output unit that utters speech according to the scenario of the dialogue scenario section and to a voice input unit that receives speech uttered by a user, the voice dialogue system comprising: an output information management unit that detects which part of the scenario the voice output unit is uttering; a weighting calculation unit that, when the user speaks, weights the vocabulary information stored in the recognition dictionary based on the scenario location detected by the output information management unit; and a voice recognition unit that selects, from voice information obtained by signal-processing the speech input to the voice input unit and from the weighted vocabulary information, which item of the vocabulary information stored in the recognition dictionary the speech uttered by the user corresponds to.

2. A voice dialogue system comprising: a dialogue scenario section that stores a plurality of words to be uttered and a scenario defining the utterance order of the plurality of words; a recognition dictionary that stores vocabulary information; a voice input unit that receives speech uttered by a user; a voice output unit that utters speech according to the scenario of the dialogue scenario section; an output information management unit that detects which part of the scenario the voice output unit is uttering; a weighting calculation unit that, when the user speaks, weights the vocabulary information stored in the recognition dictionary based on the scenario location detected by the output information management unit; and a voice recognition unit that selects, from voice information obtained by signal-processing the speech input to the voice input unit and from the weighted vocabulary information, which item of the vocabulary information stored in the recognition dictionary the speech uttered by the user corresponds to.

3. A voice dialogue system that has a dialogue scenario section storing a plurality of words to be uttered and a scenario defining the utterance order of the plurality of words, and a recognition dictionary storing vocabulary information, and that is connectable to a voice output unit that utters speech according to the scenario of the dialogue scenario section and to a voice input unit that receives speech uttered by a user, the voice dialogue system comprising: an output information management unit that detects which part of the scenario the voice output unit is uttering; and a voice recognition unit that selects, from voice information obtained by signal-processing the speech input to the voice input unit and from the vocabulary information stored in the recognition dictionary, which item of the vocabulary information stored in the recognition dictionary the speech uttered by the user corresponds to, and that, when the selected vocabulary information is an indication word (demonstrative), associates the indication word with the word or continuous words contained in the scenario location that the voice output unit was uttering at the time the indication word was uttered.

4. A voice dialogue system that has a dialogue scenario section storing a plurality of words to be uttered and a scenario defining the utterance order of the plurality of words, and a recognition dictionary storing vocabulary information, and that is connectable to a voice output unit that utters speech according to the scenario of the dialogue scenario section and to a voice input unit that receives speech uttered by a user, the voice dialogue system comprising: a voice output data storage unit that stores waveform data corresponding to the words uttered by the voice output unit; and a voice recognition unit that selects, from voice information obtained by signal-processing the speech input to the voice input unit and from the vocabulary information stored in the recognition dictionary, which item of the vocabulary information stored in the recognition dictionary the speech uttered by the user corresponds to, and that, when the selected vocabulary information is an indication word (demonstrative), associates the indication word with the waveform data, stored in the voice output data storage unit, that the voice output unit was uttering at the time corresponding to the utterance of the indication word.

5. A voice dialogue method for a voice dialogue system, comprising: a step of uttering speech based on a scenario; a step of receiving an utterance of a user; a step of detecting the portion of the scenario that corresponds to the utterance of the user; a step of weighting a word or continuous words included in the portion of the scenario associated with the utterance of the user; a step of selecting one of the words or continuous words based on the correspondence between the input utterance of the user and the weighted words or continuous words; and a step of performing the subsequent processing of the scenario based on the selected word or continuous words.
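
The following is an illustrative sketch only and does not form part of the claims. It shows, under stated assumptions, how the elements recited above could interact: detecting which part of the scenario is being uttered when the user speaks, weighting the keywords accordingly, selecting a keyword, and associating an indication word with the part being uttered. The names `Segment`, `current_segment`, `keyword_weights`, `recognize`, and `resolve_indication_word`, the geometric `decay` weighting, the `floor` weight for parts not yet uttered, and the example scores are all hypothetical; the claims do not prescribe any particular weighting formula.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    keyword: str  # keyword uttered in this part of the scenario
    start: float  # start time of the part, in seconds from the start of the system utterance
    end: float    # end time of the part, in seconds


def current_segment(scenario, t):
    """Index of the last scenario part that had started by time t (0 if none had)."""
    idx = 0
    for i, seg in enumerate(scenario):
        if seg.start <= t:
            idx = i
    return idx


def keyword_weights(scenario, t, decay=0.5, floor=0.1):
    """Assumed weighting: 1.0 for the part being uttered at time t, geometrically
    smaller for earlier parts, and a small floor for parts not yet uttered."""
    cur = current_segment(scenario, t)
    return {
        seg.keyword: (decay ** (cur - i) if i <= cur else floor)
        for i, seg in enumerate(scenario)
    }


def recognize(acoustic_scores, weights):
    """Select the keyword whose weighted acoustic score is highest."""
    return max(acoustic_scores, key=lambda k: acoustic_scores[k] * weights.get(k, 0.0))


def resolve_indication_word(scenario, t):
    """Associate an indication word (e.g. "that one") uttered at time t with the
    keyword of the scenario part being uttered at that time."""
    return scenario[current_segment(scenario, t)].keyword


# Hypothetical menu: the system utters three options, one per second.
scenario = [
    Segment("news", 0.0, 1.0),
    Segment("weather", 1.0, 2.0),
    Segment("sports", 2.0, 3.0),
]
user_time = 1.4  # the user barges in while "weather" is being uttered
weights = keyword_weights(scenario, user_time)
# Made-up acoustic scores in which "news" would narrowly win without weighting.
scores = {"news": 0.6, "weather": 0.5, "sports": 0.55}
selected = recognize(scores, weights)
referent = resolve_indication_word(scenario, user_time)
```

In this sketch the unweighted scores favour "news", but weighting by the detected scenario location selects "weather", and an indication word uttered at the same moment is likewise associated with "weather". A real system would obtain the scores from the voice recognition unit and the timing from the output information management unit.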