JPH10124087A - Voice interactive device and interactive method

Info

Publication number: JPH10124087A
Application number: JP8295896A
Authority: JP
Inventors: Kazuya Nomura; 和也野村
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1996-10-18
Filing date: 1996-10-18
Publication date: 1998-05-15
Anticipated expiration: 2016-10-18
Also published as: JP3755941B2

Abstract

PROBLEM TO BE SOLVED: To allow a target item to be retrieved with the utterance order changed, by providing means for storing voice signals, recognizing the input voice signals in an order different from the order of input, and narrowing the recognition vocabulary for a word uttered earlier. SOLUTION: A dialogue control section 105 instructs an input voice control section 101 and an input voice storage section 102 to store the input voice signal. On this instruction, the section 101 outputs the input voice signal to the section 102, and the section 102 starts storing it. The recognition vocabulary for the word uttered earlier is then reduced using the voice recognition result for the word uttered later, and voice recognition is performed. Thus, using a voice recognition section of the same performance, a dialogue flow in which the main information is input first and the auxiliary information afterwards is achieved, and the target item is retrieved with the utterance order changed.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a voice dialogue device and a dialogue method that use speech recognition and speech synthesis techniques.

[0002]

2. Description of the Related Art

In a device capable of voice dialogue with a person, when the number of items in the set containing the target item exceeds the processing capacity of the speech recognition unit, the user must, before speaking the name of the target item, first input a word denoting a subset that contains the target item. This restricts the search to that subset and reduces the number of words subject to speech recognition.

[0003] For example, consider searching for a golf course using the item search function for destination setting that a car navigation device provides through voice dialogue. If there are 2,000 golf courses in Japan as a whole and the speech recognition unit can handle at most 100 words, the names of all golf courses in Japan cannot be searched at once as speech recognition targets.

[0004] Suppose, then, that the courses are categorized by prefecture and that each prefecture has no more than 100 facilities. By having the user input the prefecture name before the golf course name, and thereby narrowing the recognition targets to that prefecture before the facility name is uttered, the target facility name can be found among all the items even when the total number of items exceeds the maximum processing capacity of the speech recognition unit.
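The capacity argument in this example can be checked with a few lines of Python. This is a minimal sketch: the 100-word limit and the 2,000-course total come from the text, while the course names, the even spread over 47 prefectures, and the function names are hypothetical and serve only as illustration.

```python
from collections import defaultdict

MAX_VOCAB = 100  # maximum recognizer vocabulary assumed in the text


def partition_by_prefecture(facilities):
    """Group (name, prefecture) pairs into one word list per prefecture."""
    groups = defaultdict(list)
    for name, prefecture in facilities:
        groups[prefecture].append(name)
    return dict(groups)


def fits_recognizer(vocabulary, limit=MAX_VOCAB):
    """True when the vocabulary can be recognized in a single pass."""
    return len(vocabulary) <= limit


# Hypothetical data: 2,000 courses spread evenly over 47 prefectures.
facilities = [(f"Course {i}", f"Prefecture {i % 47}") for i in range(2000)]
all_names = [name for name, _ in facilities]
groups = partition_by_prefecture(facilities)

nationwide_ok = fits_recognizer(all_names)
per_prefecture_ok = all(fits_recognizer(words) for words in groups.values())
```

The sketch relies on the same assumption as the text: every prefecture's word list stays within the recognizer's limit.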

[0005] A conventional voice dialogue device of this kind is shown, for example, in FIGS. 7 and 8. FIG. 7 is a block diagram showing the configuration of the conventional voice dialogue device, and FIG. 8 is a flowchart showing the flow of the voice dialogue performed by the device of FIG. 7.

[0006] First, the configuration of the conventional voice dialogue device is described with reference to FIG. 7. In FIG. 7, 303 is an acoustic analysis unit that receives a voice signal and analyzes it to obtain feature parameters; 304 is a speech recognition unit that, on command from the dialogue control unit 305, performs speech recognition by matching the feature parameters obtained from the input voice signal against a speech recognition dictionary; 305 is the dialogue control unit, which controls the voice dialogue; and 306 is a dialogue control information storage unit that stores information on the flow of the voice dialogue, which depends on the user's operations and the speech recognition results.

[0007] Further, 307 is a speech recognition dictionary storage unit that stores the dictionaries used for speech recognition; 308 is a dictionary selection unit that, on command from the dialogue control unit 305, selects the dictionary to be used for speech recognition from those stored in the speech recognition dictionary storage unit 307; 309 is a message selection unit that, on command from the dialogue control unit 305, selects the message to be presented to the user by voice from the messages stored in the message dictionary storage unit 310; and 310 is the message dictionary storage unit, which stores the messages presented to the user.

[0008] Next, the operation of the conventional voice dialogue device is described with reference to FIGS. 7 and 8. The dialogue flow described below follows FIG. 8, and the contents of the dictionaries used for speech recognition in the dialogue are shown in FIGS. 4 to 6. FIG. 4 shows the contents of the speech recognition dictionary used to recognize the genre of the search item, FIG. 5 shows the contents of the speech recognition dictionary used to recognize the name of a prefecture that has golf courses, and FIG. 6 shows the contents of the speech recognition dictionary used to recognize the golf courses of Shizuoka Prefecture.

[0009] First, when a voice dialogue is started at the user's instruction, the dialogue control unit 305 commands the dictionary selection unit 308 to create a dictionary made up of words denoting search genres. On this command, the dictionary selection unit 308 creates, from the speech recognition dictionary storage unit 307, a speech recognition dictionary made up of words denoting search genres, as shown in FIG. 4.

[0010] Next, the dialogue control unit 305 commands the message selection unit 309 to output a message prompting the user to utter a word denoting the type of facility. On this command, the message selection unit 309 selects the message "Please say the type of facility" from the message dictionary storage unit 310 and presents it to the user by voice.

[0011] Next, the dialogue control unit 305 commands the speech recognition unit 304 to perform speech recognition using the dictionary created by the dictionary selection unit 308. Having heard the message "Please say the type of facility", the user utters a word denoting the genre to be searched, in this case "golf course", thereby inputting a voice signal to the voice dialogue device. The acoustic analysis unit 303 obtains feature parameters from the input voice signal, and the speech recognition unit 304 recognizes it.

[0012] As the recognition result, "golf course" is selected as the search genre, and the dialogue control unit 305 stores this result. Next, the dialogue control unit 305 commands the dictionary selection unit 308 to create a dictionary made up of words denoting the names of the prefectures to be searched. On this command, the dictionary selection unit 308 creates, from the speech recognition dictionary storage unit 307, a speech recognition dictionary made up of prefecture names, as shown in FIG. 5.

[0013] Next, the dialogue control unit 305 commands the message selection unit 309 to output a message prompting the user to utter the name of the prefecture to be searched. On this command, the message selection unit 309 selects the message "Please say the prefecture where the golf course is located" from the message dictionary storage unit 310 and presents it to the user by voice.

[0014] Next, the dialogue control unit 305 commands the speech recognition unit 304 to perform speech recognition using the dictionary created by the dictionary selection unit 308. Having heard the message "Please say the prefecture where the golf course is located", the user utters a word denoting the prefecture to be searched, in this case "Shizuoka Prefecture", and inputs it to the voice dialogue device. The acoustic analysis unit 303 obtains feature parameters from the input voice signal, the speech recognition unit 304 recognizes it, and as the recognition result "Shizuoka Prefecture" is selected as the prefecture to be searched.

[0015] The dialogue control unit 305 stores this result. It then combines this latest recognition result, "Shizuoka Prefecture", with the earlier recognition result, "golf course", and commands the dictionary selection unit 308 to create a dictionary made up of the names of golf courses in Shizuoka Prefecture. On this command, the dictionary selection unit 308 creates, from the speech recognition dictionary storage unit 307, a speech recognition dictionary made up of the names of golf courses in Shizuoka Prefecture, as shown in FIG. 6.

[0016] Next, the dialogue control unit 305 commands the message selection unit 309 to output a message prompting the user to utter the name of the golf course in Shizuoka Prefecture to be searched. On this command, the message selection unit 309 selects the message "Please say the name of the golf course" from the message dictionary storage unit 310 and presents it to the user by voice.

[0017] Next, the dialogue control unit 305 commands the speech recognition unit 304 to perform speech recognition using the dictionary created by the dictionary selection unit 308. Having heard the message "Please say the name of the golf course", the user utters the name of the golf course to be searched, in this case "XX Country Club", and inputs it to the voice dialogue device. The acoustic analysis unit 303 obtains feature parameters from the input voice signal, the speech recognition unit 304 recognizes it, and as the recognition result "XX Country Club" is selected, which fixes the search target.

[0018] Next, the dialogue control unit 305 commands the message selection unit 309 to present the confirmed search target, "XX Country Club", to the user. On this command, the message selection unit 309 combines the content stored in the message dictionary storage unit 310 with "XX Country Club" to compose the message "Displaying a map of the area around XX Country Club" and presents it to the user by voice. The map is then displayed. This completes the dialogue flow shown in FIG. 8.
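The conventional flow described in paragraphs [0009] to [0018] can be summarized as a fixed three-step sequence in which each step's dictionary depends on the previous recognition result. The following is a minimal sketch under stated assumptions: the recognizer is a toy exact-match function, the dictionary contents are hypothetical stand-ins for FIGS. 4 to 6, and `listen` is an invented callback that returns the user's utterance for a given prompt.

```python
MAX_VOCAB = 100  # recognizer capacity from the example in the text

# Hypothetical stand-ins for the dictionaries of FIGS. 4 to 6.
GENRES = ["golf course", "hotel", "station"]
PREFECTURES = {"golf course": ["Shizuoka Prefecture", "Chiba Prefecture"]}
FACILITIES = {
    ("golf course", "Shizuoka Prefecture"): ["XX Country Club", "YY Golf Club"],
}


def recognize(utterance, vocabulary):
    """Toy recognizer: exact match against a capacity-limited vocabulary."""
    if len(vocabulary) > MAX_VOCAB:
        raise ValueError("vocabulary exceeds recognizer capacity")
    return utterance if utterance in vocabulary else None


def conventional_dialogue(listen):
    """Fixed order of FIG. 8: genre, then prefecture, then facility name.

    `listen` takes a prompt and returns the user's utterance.
    """
    genre = recognize(listen("Please say the type of facility"), GENRES)
    prefecture = recognize(
        listen("Please say the prefecture where the golf course is located"),
        PREFECTURES[genre],
    )
    name = recognize(
        listen("Please say the name of the golf course"),
        FACILITIES[(genre, prefecture)],
    )
    return genre, prefecture, name


answers = iter(["golf course", "Shizuoka Prefecture", "XX Country Club"])
result = conventional_dialogue(lambda prompt: next(answers))
```

Because every utterance is recognized as soon as it arrives, the facility name cannot be accepted until the prefecture is known; in this sketch, a facility name offered at the prefecture step simply fails to match.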

[0019]

PROBLEMS TO BE SOLVED BY THE INVENTION

However, the conventional, widely used speech recognition devices described above have no means of storing multiple inputs, so the target item is found by using the word input earlier to narrow the targets of the speech recognition performed next. In the golf course search example above, the dialogue flow is therefore fixed to that shown in FIG. 8.

[0020] In general, a voice dialogue device is required to provide natural voice dialogue that causes its user no discomfort or stress. In the example above, the golf course name is the main information the user inputs, and the prefecture name is supplementary information. When, as shown in FIG. 8, the supplementary information must be input first and the main information afterwards, the user cannot begin with the main information, and this order tends to feel more unnatural to the user than the reverse order.

[0021] The present invention has been made to solve the above conventional problem. Because the performance limit of the speech recognition unit restricts the recognition vocabulary, a device may be forced into a dialogue flow in which supplementary information is input first to narrow the recognition vocabulary and the main information is input afterwards. The object of the invention is to provide a voice dialogue device and dialogue method that, even in such a case and using a speech recognition unit of the same performance, realize a dialogue flow in which the main information is input first and the supplementary information afterwards, so that the target item can be searched for with the utterance order changed.

[0022]

MEANS FOR SOLVING THE PROBLEMS

The voice dialogue device and dialogue method according to the present invention provide storage means that stores an input voice signal either as the input voice signal itself or as the feature parameters obtained by analyzing it, and perform speech recognition in an order different from the order in which the voice signals were input, so that the recognition result for a word uttered later is used to narrow the recognition targets for a word uttered earlier.

[0023] According to the present invention, even when the main information is input (uttered) first and the supplementary information afterwards, a speech recognition unit of the same performance can narrow the recognition targets for the word uttered earlier from the recognition result for the word uttered later. This yields a voice dialogue device and dialogue method that do not feel unnatural to the user.
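The core idea, holding the first utterance unrecognized until the second has been recognized, can be sketched as follows. This is an illustrative sketch, not the patent's implementation: utterances are modeled as plain strings, the recognizer is a toy exact-match function, and the dictionaries and function names are hypothetical.

```python
MAX_VOCAB = 100  # recognizer capacity from the example in the text


def recognize(signal, vocabulary):
    """Toy recognizer: exact match against a capacity-limited vocabulary."""
    if len(vocabulary) > MAX_VOCAB:
        raise ValueError("vocabulary exceeds recognizer capacity")
    return signal if signal in vocabulary else None


def reordered_search(first_utterance, second_utterance,
                     prefectures, courses_by_prefecture):
    """Main information is uttered first but recognized last."""
    stored = first_utterance                      # held unrecognized in storage
    prefecture = recognize(second_utterance, prefectures)
    narrowed = courses_by_prefecture[prefecture]  # narrowed by the later word
    return prefecture, recognize(stored, narrowed)


# Hypothetical dictionaries.
prefectures = ["Shizuoka Prefecture", "Chiba Prefecture"]
courses_by_prefecture = {
    "Shizuoka Prefecture": ["XX Country Club", "YY Golf Club"],
    "Chiba Prefecture": ["ZZ Golf Links"],
}

result = reordered_search(
    "XX Country Club", "Shizuoka Prefecture", prefectures, courses_by_prefecture
)
```

The recognizer itself is unchanged and never sees more than one prefecture's worth of course names; only the order in which the stored inputs reach it differs from the order of utterance.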

[0024]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The invention according to claim 1 comprises: input voice control means that, on command from a dialogue control unit, switches among storing the input voice signal, analyzing the input voice signal, and analyzing a stored input voice signal; input voice storage means that stores the input voice signal on command from the dialogue control unit; acoustic analysis means that analyzes an input voice signal to obtain feature parameters; speech recognition means that, on command from the dialogue control unit, performs speech recognition by matching the feature parameters obtained from the input voice signal against a speech recognition dictionary; the dialogue control unit, which controls the voice dialogue; and message selection means that, on command from the dialogue control unit, selects and outputs the message to be presented to the user from among stored messages. The input voice signals are stored in the input voice storage means and recognized in a changed order. Recognizing the input voice signals in a changed order has the effect of providing a voice dialogue device with which the target item can be searched for with the utterance order changed.
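The three-way switching performed by the input voice control means can be pictured as a small router. The sketch below is one possible reading, with assumptions labeled: the class, the `Route` names, and the `command`/`process` methods are invented for illustration, signals are plain strings, and `analyze` is a placeholder for the acoustic analysis means.

```python
from enum import Enum, auto


class Route(Enum):
    STORE = auto()           # keep the incoming signal for later
    ANALYZE_INPUT = auto()   # pass the incoming signal straight to analysis
    ANALYZE_STORED = auto()  # pass a previously stored signal to analysis


class InputVoiceControl:
    """Routes signals as commanded by a (hypothetical) dialogue controller."""

    def __init__(self, analyze):
        self.analyze = analyze  # stand-in for the acoustic analysis means
        self.storage = []       # stand-in for the input voice storage means
        self.route = Route.ANALYZE_INPUT

    def command(self, route):
        self.route = route

    def process(self, signal=None):
        if self.route is Route.STORE:
            self.storage.append(signal)
            return None
        if self.route is Route.ANALYZE_INPUT:
            return self.analyze(signal)
        # Route.ANALYZE_STORED: replay the oldest stored signal.
        return self.analyze(self.storage.pop(0))


control = InputVoiceControl(analyze=lambda signal: ("features", signal))

control.command(Route.STORE)
control.process("XX Country Club")              # uttered first, stored

control.command(Route.ANALYZE_INPUT)
later = control.process("Shizuoka Prefecture")  # uttered later, analyzed at once

control.command(Route.ANALYZE_STORED)
earlier = control.process()                     # stored signal analyzed afterwards
```

Keeping the routing decision in one place means the analysis and recognition stages need no knowledge of whether a signal is live or replayed.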

[0025] The invention according to claim 2 comprises: acoustic analysis means that analyzes an input voice signal to obtain its feature parameters; parameter control means that, on command from a dialogue control unit, switches among storing the feature parameters obtained by analyzing the input voice signal, performing speech recognition on them, and performing speech recognition on stored feature parameters; parameter storage means that, on command from the dialogue control unit, stores the feature parameters obtained by analyzing the input voice signal; speech recognition means that, on command from the dialogue control unit, performs speech recognition by matching the feature parameters obtained from the input voice signal against a speech recognition dictionary; the dialogue control unit, which controls the voice dialogue; and message selection means that, on command from the dialogue control unit, selects and outputs the message to be presented to the user from the messages stored in message dictionary storage means. The feature parameters obtained by analyzing the input voice signals are stored in the parameter storage means and recognized in a changed order. Recognizing the feature parameters of the input voice signals in a changed order has the effect of providing a voice dialogue device with which the target item can be searched for with the utterance order changed.
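Claim 2 differs from claim 1 in what is kept: the feature parameters rather than the raw signal. The text does not say which feature parameters are used, so the sketch below substitutes a deliberately simple one, per-frame log energy, purely for illustration; the frame length, the synthetic signal, and all names are assumptions.

```python
import math


def frame_log_energy(samples, frame_len=160):
    """Toy acoustic analysis: one log-energy value per non-overlapping frame."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        features.append(math.log(energy + 1e-10))  # offset avoids log(0)
    return features


parameter_storage = []  # stand-in for the parameter storage means

# Synthetic "utterance": 8,000 samples of a slow sine wave.
samples = [math.sin(0.01 * n) for n in range(8000)]

# Analysis runs as the utterance arrives; only the features are stored.
parameter_storage.append(frame_log_energy(samples))
```

In this variant the acoustic analysis runs once, when the utterance arrives, and only the recognition step is deferred until the narrowed dictionary is available.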

[0026] In the invention according to claim 3, a voice signal produced in a dialogue is input; the input voice signal is analyzed to obtain feature parameters; either the input voice signal or the feature parameters obtained from it are stored; a message to be presented is selected from stored messages and presented in accordance with the dialogue flow controlled by control means; and speech recognition is performed by matching the stored voice signals or feature parameters against a speech recognition dictionary in an order changed from that of the dialogue flow. Recognizing the input voice signals, or their feature parameters, in a changed order has the effect of providing a voice dialogue method with which the target item can be searched for with the utterance order changed.

[0027] Embodiments of the present invention are described in detail below with reference to the accompanying drawings, FIGS. 1 to 6. FIG. 1 is a block diagram showing the configuration of the voice dialogue device according to the first embodiment of the present invention. FIG. 2 is a block diagram showing the configuration of the voice dialogue device according to the second embodiment. FIG. 3 is a flowchart showing the flow of the voice dialogue performed by the voice dialogue devices of FIGS. 1 and 2. FIG. 4 shows the contents of the speech recognition dictionary used to recognize the genre of the search item, FIG. 5 shows the contents of the speech recognition dictionary used to recognize the name of a prefecture that has golf courses, and FIG. 6 shows the contents of the speech recognition dictionary used to recognize the golf courses of Shizuoka Prefecture.

[0028] (Embodiment 1) First, the configuration of the voice dialogue device according to the first embodiment of the present invention is described in detail with reference to FIG. 1. In FIG. 1, 101 is an input voice control unit that, on command from the dialogue control unit 105, switches among storing the input voice signal, analyzing the input voice signal, and analyzing a stored input voice signal; and 102 is an input voice storage unit that stores the input voice signal on command from the dialogue control unit 105.

[0029] Further, 103 is an acoustic analysis unit that analyzes an input voice signal to obtain feature parameters; 104 is a speech recognition unit that, on command from the dialogue control unit 105, performs speech recognition by matching the feature parameters obtained from the input voice signal against a speech recognition dictionary; 105 is the dialogue control unit, which controls the voice dialogue; 106 is a dialogue control information storage unit that stores information on the flow of the voice dialogue, which is determined by the user's operations and the speech recognition results; and 107 is a speech recognition dictionary storage unit that stores the dictionaries used for speech recognition.

[0030] Further, 108 is a dictionary selection unit that, on command from the dialogue control unit 105, selects the dictionary to be used for speech recognition from those stored in the speech recognition dictionary storage unit 107; 109 is a message selection unit that, on command from the dialogue control unit 105, selects the message to be presented to the user from the messages stored in the message dictionary storage unit 110; and 110 is the message dictionary storage unit, which stores the messages presented to the user by voice.

[0031] The input voice control unit 101, input voice storage unit 102, acoustic analysis unit 103, speech recognition unit 104, dialogue control unit 105, and message selection unit 109 correspond respectively to the input voice control means, input voice storage means, acoustic analysis means, speech recognition means, dialogue control means, and message selection means.

[0032] Next, the operation of the voice dialogue device according to the first embodiment is described in detail with reference to FIG. 1 and FIGS. 3 to 6, taking the dialogue flow of FIG. 3 as an example. First, when a voice dialogue is started at the user's instruction, the dialogue control unit 105 commands the dictionary selection unit 108 to create a dictionary made up of words denoting search genres. On this command, the dictionary selection unit 108 creates, from the speech recognition dictionary storage unit 107, a speech recognition dictionary made up of words denoting search genres, as shown in FIG. 4.

[0033] Next, the dialogue control unit 105 commands the message selection unit 109 to output a message prompting the user to utter a word denoting the type of facility. On this command, the message selection unit 109 selects the message "Which genre would you like to search?" from the message dictionary storage unit 110 and presents it to the user by voice. (This presentation to the user may be accompanied by a display on a display device in addition to voice; the same applies below.)

[0034] Next, the dialogue control unit 105 commands the speech recognition unit 104 to perform speech recognition using the dictionary created by the dictionary selection unit 108, and commands the input voice control unit 101 to output the input voice signal to the acoustic analysis unit. Having heard the message "Which genre would you like to search?", the user utters a word denoting the desired genre, in this case "golf course", and inputs it to the voice dialogue device. The input voice signal passes through the input voice control unit 101 to the acoustic analysis unit 103, which obtains its feature parameters, and is recognized by the speech recognition unit 104.

[0035] As the recognition result, "golf course" is selected as the search genre, and the dialogue control unit 105 stores this result. Next, the dialogue control unit 105 commands the message selection unit 109 to output a message prompting the user to utter the name of the "golf course" obtained as the preceding recognition result. On this command, the message selection unit 109 selects the message "What is the name of the golf course?" from the message dictionary storage unit 110 and presents it to the user by voice.

[0036] Next, the dialogue control unit 105 commands the input voice control unit 101 and the input voice storage unit 102 to store the incoming voice signal. On this command, the input voice control unit 101 outputs the input voice signal to the input voice storage unit 102, and the input voice storage unit 102 begins storing it.

[0037] Having heard the message "What is the name of the golf course?" presented earlier, the user utters the name of the desired golf course, in this case "XX Country Club", and inputs it to the voice dialogue device. The input voice "XX Country Club" passes through the input voice control unit 101 and is stored in the input voice storage unit 102.

[0038] When this storage is complete, the dialogue control unit 105 instructs the dictionary selection unit 108 to create a dictionary composed of words representing the names of the prefectures to be searched. In response to this command, the dictionary selection unit 108 creates, from the speech recognition dictionary storage unit 107, a speech recognition dictionary composed of words representing the prefecture names to be searched, as shown in FIG. 5.

[0039] Next, the dialogue control unit 105 instructs the message selection unit 109 to output a message prompting the user to utter a word representing the name of the prefecture to be searched. In response to this command, the message selection unit 109 selects the message "Which prefecture is it in?" from the message dictionary storage unit 110 and presents it to the user by voice.

[0040] Next, the dialogue control unit 105 instructs the speech recognition unit 104 to execute speech recognition using the dictionary created by the dictionary selection unit 108, and instructs the input voice control unit 101 to output the input voice signal to the acoustic analysis unit 103. The user, on hearing the message "Which prefecture is it in?", utters a word representing the prefecture to be searched, in this case "Shizuoka Prefecture", and inputs it to the voice interaction device. The input voice signal "Shizuoka Prefecture" passes through the input voice control unit 101, its feature parameters are obtained by the acoustic analysis unit 103, and it is recognized by the speech recognition unit 104. As a result of the recognition, "Shizuoka Prefecture" is selected as the name of the prefecture to be searched.

[0041] The dialogue control unit 105 stores this result, combines the recognition result "Shizuoka Prefecture" with the earlier recognition result "golf course", and instructs the dictionary selection unit 108 to create a dictionary composed of the names of golf courses in Shizuoka Prefecture. In response to this command, the dictionary selection unit 108 creates, from the speech recognition dictionary storage unit 107, a speech recognition dictionary composed of the names of golf courses in Shizuoka Prefecture, as shown in FIG. 6.
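The narrowing step just described can be pictured as a filter over a facility table keyed by genre and prefecture. The following is a minimal sketch under stated assumptions: the `Facility` record, the `FACILITIES` table, the facility names, and `select_dictionary` are all hypothetical stand-ins for the speech recognition dictionary storage unit 107 and the dictionary selection unit 108. The specification does not describe a data layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Facility:
    name: str
    genre: str
    prefecture: str


# Hypothetical contents standing in for the dictionary storage unit 107.
FACILITIES = [
    Facility("Alpha Country Club", "golf course", "Shizuoka"),
    Facility("Beta Golf Club", "golf course", "Shizuoka"),
    Facility("Gamma Country Club", "golf course", "Chiba"),
    Facility("Delta Hotel", "hotel", "Shizuoka"),
]


def select_dictionary(genre, prefecture, facilities=FACILITIES):
    """Return the names that match both earlier recognition results."""
    return [
        f.name
        for f in facilities
        if f.genre == genre and f.prefecture == prefecture
    ]


# Combine the two recognition results to build the narrowed vocabulary.
narrowed = select_dictionary("golf course", "Shizuoka")
```

Only the entries in `narrowed` need to be considered when the stored facility name is finally recognized, which is what keeps the vocabulary within the recognizer's capacity.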

[0042] Next, the dialogue control unit 105 instructs the input voice control unit 101 and the input voice storage unit 102 to output the previously stored voice signal of the user's utterance "XX Country Club" to the acoustic analysis unit 103. In response to this command, the input voice storage unit 102 outputs the stored voice signal to the input voice control unit 101, and the input voice control unit 101 starts outputting that voice signal to the acoustic analysis unit 103. The voice signal is analyzed by the acoustic analysis unit 103 to obtain its feature parameters, which are then recognized by the speech recognition unit 104. From the recognition result, "XX Country Club", as listed in FIG. 6, is selected and the search target is determined.
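The store-now, recognize-later flow of the first embodiment can be sketched as follows. This is an illustration only, under several assumptions: plain text strings stand in for audio, Python's `difflib.get_close_matches` stands in for the acoustic analysis unit 103 and the speech recognition unit 104, and the class name `DeferredRecognizer`, the prefecture list, and the course names are hypothetical.

```python
import difflib


class DeferredRecognizer:
    """Toy model: hold one utterance and recognize it later."""

    def __init__(self):
        # Plays the role of the input voice storage unit 102.
        self.stored_signal = None

    def store(self, signal):
        """Keep the utterance instead of recognizing it immediately."""
        self.stored_signal = signal

    @staticmethod
    def recognize(signal, dictionary):
        """Return the dictionary entry closest to the signal."""
        matches = difflib.get_close_matches(signal, dictionary, n=1, cutoff=0.0)
        return matches[0] if matches else None

    def recognize_stored(self, dictionary):
        """Recognize the held utterance against a dictionary built afterwards."""
        if self.stored_signal is None:
            raise ValueError("no utterance has been stored")
        return self.recognize(self.stored_signal, dictionary)


dialogue = DeferredRecognizer()
# Main information is spoken first and only stored (slightly noisy input).
dialogue.store("alfa country club")
# Supplementary information is recognized immediately.
prefecture = dialogue.recognize("shizuoka", ["chiba", "shizuoka", "tokyo"])
# Hypothetical narrowed dictionary for golf courses in that prefecture.
courses = {"shizuoka": ["alpha country club", "beta golf club"]}
result = dialogue.recognize_stored(courses[prefecture])
```

The point of the design is that the first utterance is never matched against the full list of facility names; it waits until the later answer has reduced the vocabulary.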

[0043] Next, the dialogue control unit 105 instructs the message selection unit 109 to present the determined search target "XX Country Club" to the user by voice. In response to this command, the message selection unit 109 combines content stored in the message dictionary storage unit 110 with "XX Country Club" to create the message "Displaying a map of the area around XX Country Club." and presents it to the user by voice. The map is then displayed. This completes the dialogue flow shown in FIG. 3.

[0044] (Embodiment 2) Next, the configuration of the voice interaction device according to the second embodiment of the present invention will be described in detail with reference to FIG. 2. In FIG. 2, reference numeral 201 denotes an acoustic analysis unit that analyzes an input voice signal to obtain its feature parameters. Reference numeral 202 denotes a parameter control unit that, in accordance with a command from the dialogue control unit 205, switches among three operations: storing the feature parameters obtained by analyzing the input voice signal, performing speech recognition on the feature parameters obtained by analyzing the input voice signal, and performing speech recognition on the stored feature parameters.
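The three-way switching performed by the parameter control unit 202 can be sketched as a small state machine. This is a hedged illustration, not the patented implementation: the `Mode` enum, the `ParameterControl` class, the two-element feature vectors, and the nearest-template recognizer are all assumptions introduced for the example.

```python
from enum import Enum, auto


class Mode(Enum):
    STORE = auto()             # store the analyzed feature parameters
    RECOGNIZE_INPUT = auto()   # recognize the parameters just analyzed
    RECOGNIZE_STORED = auto()  # recognize the parameters stored earlier


class ParameterControl:
    """Toy model of the parameter control unit 202."""

    def __init__(self, recognize):
        self.mode = Mode.RECOGNIZE_INPUT
        self.stored = None          # role of the parameter storage unit 203
        self.recognize = recognize  # role of the speech recognition unit 204

    def handle(self, features=None):
        """Route feature parameters according to the current mode."""
        if self.mode is Mode.STORE:
            self.stored = features
            return None
        if self.mode is Mode.RECOGNIZE_INPUT:
            return self.recognize(features)
        return self.recognize(self.stored)


def nearest_template(features, templates):
    """Hypothetical recognizer: pick the template with the smallest squared distance."""
    def distance(name):
        return sum((a - b) ** 2 for a, b in zip(features, templates[name]))
    return min(templates, key=distance)


templates = {"alpha": [1.0, 0.0], "beta": [0.0, 1.0]}
control = ParameterControl(lambda f: nearest_template(f, templates))

control.mode = Mode.STORE
held = control.handle([0.9, 0.2])   # stored, nothing recognized yet
control.mode = Mode.RECOGNIZE_INPUT
live = control.handle([0.1, 0.8])   # recognized immediately
control.mode = Mode.RECOGNIZE_STORED
deferred = control.handle()         # recognizes the stored parameters
```

In this sketch the dialogue control unit 205 corresponds to whatever code sets `control.mode` between utterances.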

[0045] Reference numeral 203 denotes a parameter storage unit that, in accordance with a command from the dialogue control unit 205, stores the feature parameters obtained by analyzing the input voice signal; 204 denotes a speech recognition unit that, in accordance with a command from the dialogue control unit 205, performs speech recognition by matching the feature parameters obtained by analyzing the input voice signal against a speech recognition dictionary; 205 denotes a dialogue control unit that controls the voice dialogue; 206 denotes a dialogue control information storage unit that stores information on the flow of the voice dialogue, which proceeds according to user operations and speech recognition results; and 207 denotes a speech recognition dictionary storage unit that stores the dictionaries used for speech recognition.

[0046] Reference numeral 208 denotes a dictionary selection unit that, in accordance with a command from the dialogue control unit, selects the dictionary to be used for speech recognition from the dictionaries stored in the speech recognition dictionary storage unit; 209 denotes a message selection unit that, in accordance with a command from the dialogue control unit, selects the message to be presented to the user from the messages stored in the message dictionary storage unit; and 210 denotes a message dictionary storage unit that stores the messages to be presented to the user.

[0047] The acoustic analysis unit 201, the parameter control unit 202, the parameter storage unit 203, the speech recognition unit 204, the dialogue control unit 205, and the message selection unit 209 correspond to the acoustic analysis means, the parameter control means, the parameter storage means, the speech recognition means, the dialogue control means, and the message selection means, respectively.

[0048] Next, the operation of the voice interaction device according to the second embodiment of the present invention will be described in detail with reference to FIG. 2 and FIGS. 3 to 6, taking the dialogue flow shown in FIG. 3 as an example. First, when a voice dialogue is started at the user's instruction, the dialogue control unit 205 instructs the dictionary selection unit 208 to create a dictionary composed of words representing search genres. In response to this command, the dictionary selection unit 208 creates, from the speech recognition dictionary storage unit 207, a speech recognition dictionary composed of words representing search genres, as shown in FIG. 4.

[0049] Next, the dialogue control unit 205 instructs the message selection unit 209 to output a message prompting the user to utter a word representing the type of facility. In response to this command, the message selection unit 209 selects the message "Which genre do you want to search for?" from the message dictionary storage unit 210 and presents it to the user by voice.

[0050] The dialogue control unit 205 then instructs the speech recognition unit 204 to execute speech recognition using the dictionary created by the dictionary selection unit 208, and instructs the parameter control unit 202 to output to the speech recognition unit 204 the feature parameters obtained by analyzing the input voice signal in the acoustic analysis unit 201. The user, on hearing the message "Which genre do you want to search for?", utters a word indicating the genre to be searched, in this case "golf course", and inputs it to the voice interaction device.

[0051] The input voice signal "golf course" is analyzed by the acoustic analysis unit 201 and converted into feature parameters, which pass through the parameter control unit 202 and are recognized by the speech recognition unit 204. As a result of the recognition, "golf course" is selected as the search genre. This result is stored in the dialogue control unit 205. Next, the dialogue control unit 205 instructs the message selection unit 209 to output a message prompting the user to utter the name of a golf course, "golf course" being the result of the preceding speech recognition. In response to this command, the message selection unit 209 selects the message "What is the name of the golf course?" from the message dictionary storage unit 210 and presents it to the user by voice.

[0052] Next, the dialogue control unit 205 instructs the parameter control unit 202 and the parameter storage unit 203 to store the feature parameters obtained by analyzing the input voice signal in the acoustic analysis unit 201. In response to this command, the parameter control unit 202 outputs the feature parameters of the input voice signal to the parameter storage unit 203, and the parameter storage unit 203 starts storing the feature parameters obtained by analyzing the input voice signal.

[0053] The user, on hearing the previously presented message "What is the name of the golf course?", utters a word indicating the golf course to be searched for, in this case "XX Country Club", and inputs it to the voice interaction device. The input voice signal "XX Country Club" is analyzed by the acoustic analysis unit 201 and converted into feature parameters, which are stored in the parameter storage unit 203 via the parameter control unit 202.

[0054] When this storage is complete, the dialogue control unit 205 instructs the dictionary selection unit 208 to create a dictionary composed of words representing the names of the prefectures to be searched. In response to this command, the dictionary selection unit 208 creates, from the speech recognition dictionary storage unit 207, a speech recognition dictionary composed of words representing the prefecture names to be searched, as shown in FIG. 5.

[0055] Next, the dialogue control unit 205 instructs the message selection unit 209 to output a message prompting the user to utter a word representing the name of the prefecture to be searched. In response to this command, the message selection unit 209 selects the message "Which prefecture is it in?" from the message dictionary storage unit 210 and presents it to the user by voice.

[0056] Next, the dialogue control unit 205 instructs the speech recognition unit 204 to execute speech recognition using the dictionary created by the dictionary selection unit 208, and instructs the parameter control unit 202 to output to the speech recognition unit 204 the feature parameters obtained by analyzing the input voice signal in the acoustic analysis unit 201.

[0057] The user, on hearing the message "Which prefecture is it in?", utters a word representing the prefecture to be searched, in this case "Shizuoka Prefecture", and inputs it to the voice interaction device. The input voice signal "Shizuoka Prefecture" is analyzed by the acoustic analysis unit 201 and converted into feature parameters, which pass through the parameter control unit 202 and are recognized by the speech recognition unit 204.

[0058] As a result of the recognition, "Shizuoka Prefecture" is selected as the name of the prefecture to be searched. This result is stored in the dialogue control unit 205. The dialogue control unit 205 combines the recognition result "Shizuoka Prefecture" with the earlier recognition result "golf course" and instructs the dictionary selection unit 208 to create a dictionary composed of the names of golf courses in Shizuoka Prefecture. In response to this command, the dictionary selection unit 208 creates, from the speech recognition dictionary storage unit 207, a speech recognition dictionary composed of the names of golf courses in Shizuoka Prefecture, as shown in FIG. 6.

[0059] Next, the dialogue control unit 205 instructs the parameter control unit 202 and the parameter storage unit 203 to output the previously stored feature parameters of the user's utterance "XX Country Club" to the speech recognition unit 204. In response to this command, the parameter storage unit 203 outputs the stored feature parameters to the parameter control unit 202, and the parameter control unit 202 starts outputting them to the speech recognition unit 204. These feature parameters are recognized by the speech recognition unit 204, and as a result of the recognition "XX Country Club" is selected and the search target is determined.
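The difference between the two embodiments lies in what is held while the user answers the follow-up question: the first embodiment holds the raw voice signal and analyzes it at recognition time, whereas the second analyzes the signal on input and holds the resulting feature parameters. The sketch below records the order of operations for each case. Everything in it is an assumption made for illustration: the toy `analyze` and `recognize` functions, the character-code "features", and the facility names.

```python
def analyze(signal, trace):
    """Toy acoustic analysis: one number per character (hypothetical features)."""
    trace.append("analyze")
    return [ord(ch) for ch in signal]


def recognize(features, dictionary, trace):
    """Toy recognizer: exact match against analyzed dictionary words."""
    trace.append("recognize")
    for word in dictionary:
        if [ord(ch) for ch in word] == features:
            return word
    return None


def embodiment_1(signal, build_dictionary):
    """Store the signal, narrow the dictionary, then analyze and recognize."""
    trace = ["store signal"]
    stored_signal = signal
    dictionary = build_dictionary()  # created after the later utterance
    features = analyze(stored_signal, trace)
    return recognize(features, dictionary, trace), trace


def embodiment_2(signal, build_dictionary):
    """Analyze on input, store the features, narrow, then recognize."""
    trace = []
    stored_features = analyze(signal, trace)
    trace.append("store features")
    dictionary = build_dictionary()  # created after the later utterance
    return recognize(stored_features, dictionary, trace), trace


def narrowed_names():
    # Hypothetical narrowed dictionary (golf courses in one prefecture).
    return ["xx country club", "yy golf club"]


result_1, trace_1 = embodiment_1("xx country club", narrowed_names)
result_2, trace_2 = embodiment_2("xx country club", narrowed_names)
```

Both paths reach the same result; only the position of the analysis step relative to the storage step changes.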

[0060] Next, the dialogue control unit 205 instructs the message selection unit 209 to present the determined search target "XX Country Club" to the user. In response to this command, the message selection unit 209 combines content stored in the message dictionary storage unit 210 with "XX Country Club" to create the message "Displaying a map of the area around XX Country Club." and presents it to the user by voice. The map is then displayed. This completes the dialogue flow shown in FIG. 3.

[0061]

[Effects of the Invention] The present invention is configured as described above and, in particular, includes either an input voice storage unit that temporarily stores the input voice signal or a parameter storage unit that temporarily stores the feature parameters obtained by analyzing the input voice signal. Using the speech recognition result of a word uttered later, the vocabulary to be recognized for a word uttered earlier is narrowed before that earlier word is recognized. As a result, with a speech recognition unit of the same performance, a dialogue flow can be realized in which the main information is input first and supplementary information is input afterwards. The invention can thus provide a voice interaction device and interaction method that are easier for the user to use and that can search for a target item with the order of utterances changed.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a voice interaction device according to a first embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of a voice interaction device according to a second embodiment of the present invention.

FIG. 3 is a flowchart showing the flow of a voice dialogue performed by the voice interaction device shown in FIGS. 1 and 2.

FIG. 4 is a diagram showing the contents of a speech recognition dictionary used by the voice interaction device to recognize the genre of a search item.

FIG. 5 is a diagram showing the contents of a speech recognition dictionary used by the voice interaction device to recognize the name of the prefecture in which a golf course is located.

FIG. 6 is a diagram showing the contents of a speech recognition dictionary used by the voice interaction device to recognize golf courses in Shizuoka Prefecture.

FIG. 7 is a block diagram showing the configuration of a conventional voice interaction device.

FIG. 8 is a flowchart showing the flow of a voice dialogue performed by the voice interaction device shown in FIG. 7.

[Explanation of symbols]

101 input voice control unit
102 input voice storage unit
103 acoustic analysis unit
104 speech recognition unit
105 dialogue control unit
106 dialogue control information storage unit
107 speech recognition dictionary storage unit
108 dictionary selection unit
109 message selection unit
110 message dictionary storage unit
201 acoustic analysis unit
202 parameter control unit
203 parameter storage unit
204 speech recognition unit
205 dialogue control unit
206 dialogue control information storage unit
207 speech recognition dictionary storage unit
208 dictionary selection unit
209 message selection unit
210 message dictionary storage unit
303 acoustic analysis unit
304 speech recognition unit
305 dialogue control unit
306 dialogue control information storage unit
307 speech recognition dictionary storage unit
308 dictionary selection unit
309 message selection unit
310 message dictionary storage unit

Claims

[Claims]

1. A voice interaction device comprising: input voice control means for switching, in accordance with a command from a dialogue control unit, among storing an input voice signal, analyzing the input voice signal, and analyzing a stored input voice signal; input voice storage means for storing the input voice signal in accordance with a command from the dialogue control unit; acoustic analysis means for analyzing the input voice signal to obtain feature parameters; speech recognition means for performing speech recognition, in accordance with a command from the dialogue control unit, by matching the feature parameters obtained by analyzing the input voice signal against a speech recognition dictionary; the dialogue control unit, which controls a voice dialogue; and message selection means for selecting, in accordance with a command from the dialogue control unit, a message to be presented to a user from stored messages and outputting the selected message, wherein the input voice signal is stored in the input voice storage means and speech recognition is performed with the order of the input voice signals changed.

2. A voice interaction device comprising: acoustic analysis means for analyzing an input voice signal to obtain feature parameters; parameter control means for switching, in accordance with a command from a dialogue control unit, among storing the feature parameters obtained by analyzing the input voice signal, performing speech recognition on those feature parameters, and performing speech recognition on stored feature parameters; parameter storage means for storing, in accordance with a command from the dialogue control unit, the feature parameters obtained by analyzing the input voice signal; speech recognition means for performing speech recognition, in accordance with a command from the dialogue control unit, by matching the feature parameters obtained by analyzing the input voice signal against a speech recognition dictionary; the dialogue control unit, which controls a voice dialogue; and message selection means for selecting, in accordance with a command from the dialogue control unit, a message to be presented to a user from messages stored in message dictionary storage means and outputting the selected message, wherein the feature parameters obtained by analyzing the input voice signal are stored in the parameter storage means and speech recognition is performed with the order of the feature parameters changed.

3. A voice interaction method comprising the steps of: inputting a voice signal produced in a dialogue; analyzing the input voice signal to obtain feature parameters; storing the input voice signal or the feature parameters obtained from the voice signal; selecting, under the control of control means, a message to be presented from stored messages in accordance with the flow of the dialogue, and presenting the selected message; and performing speech recognition by matching the stored voice signal or feature parameters against a speech recognition dictionary in an order changed so as to differ from the flow of the dialogue.