JPH1031497A

JPH1031497A - Voice conversation control method and voice conversation system

Info

Publication number: JPH1031497A
Application number: JP8189060A
Authority: JP
Inventors: Toshiyuki Odaka; 俊之小高; Zuhaeru Toraberushi; ズハエルトラベルシ; Akio Amano; 明雄天野; Nobuo Hataoka; 信夫畑岡
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1996-07-18
Filing date: 1996-07-18
Publication date: 1998-02-03
Anticipated expiration: 2016-07-18
Also published as: JP3700266B2

Abstract

PROBLEM TO BE SOLVED: To efficiently conduct voice conversation between a user and a system by eliminating the system output which is for confirmation only and providing the recognition results including a neat guidance sentence. SOLUTION: A conversation control means 5 requests the guidance contents which urge the next action to a task control means 10. The response to the request is a question for a name. Then, the means 5 asks for a latest keyword to a keyword holding means and obtains the 'Material Section' which has been just POPed. Then, the means 5 transmits the next guidance contents and the latest keyword to an response generating means 7 and instructs it to generate a response sentence. Then, the means 7 generates a question sentence, such as 'What is your name in the Material Section?' and transmits the sentence to a voice output means 5. Then, the means 5 sends the recognition vocabulary to a recognition vocabulary supplementing means 3 in order to confirm the name.

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

【発明の属する技術分野】本発明は、情報検索などを行
なうために利用する計算機システムに係り、特に、マイ
クとスピーカあるいは電話などの音声入出力インタフェ
ースを備え、誰でも容易に利用することができる音声対
話システムに関するものである。BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a computer system used for performing information retrieval and the like, and more particularly, to a computer system provided with a microphone and a voice input / output interface such as a speaker or a telephone so that anyone can use it easily. The present invention relates to a voice interaction system.

【０００２】[0002]

【従来の技術】音声対話システムでは一般に、利用者の
システムに対するデータ入力において音声認識の技術を
用いている。音声によるデータ入力は、電話を通して使
う場合、あるいは車の運転中などのいわゆるハンズアイ
ビジーの状況下使う場合、非常に有効である。2. Description of the Related Art Generally, a speech dialogue system uses a speech recognition technique for data input to a user's system. Voice data input is very effective when used over a telephone or when used in a so-called hands-eye busy situation such as driving a car.

【０００３】普通は複数の項目（例えば、電話番号検索
のアプリケーションでは相手の所属と名前、チケットの
予約のアプリケーションでは月日、時間、大人または子
供、枚数等）に対してデータを入力する。しかし一方、
音声認識において１００％の認識率を実現することは不
可能であり、利用者の入力に対して、入力項目毎にシス
テムの認識結果を確認することが必須である。例えば、
文または文節単位の音声認識が可能な音声対話システム
での対話例は、システム：「相手の所属と名前をどうぞ」利用者：「資材課の佐藤さん」システム：「資材課の佐藤ですね」利用者：「はい」となる。この例のように、文または文節単位の音声認識
が可能なシステムで認識が正しく行われた場合は、入力
時間も短く利用効率が良くなる。しかし、文または文節
単位の認識の場合、複数のキーワード（ここでの例では
相手の所属と名前）の組合せ分だけ認識対象の種類が多
くなるばかりでなく、助詞も含めた表現のバラエティが
多様になるという点で、単語認識に比較して認識性能が
低くなる。さらに、認識を間違えた場合は、次のような
問題も発生する。Normally, data is input for a plurality of items (for example, the affiliation and name of a partner in a telephone number search application, the date and time, adults or children, the number of sheets, etc. in a ticket reservation application). But on the other hand,
It is impossible to achieve a recognition rate of 100% in voice recognition, and it is essential to check the recognition result of the system for each input item for a user input. For example,
An example of dialogue in a voice dialogue system that can recognize sentences or phrases is as follows: System: "Please select the affiliation and name of the other party" User: "Mr. Sato of the Materials Division" System: "Sato of the Materials Division" User: Yes. As in this example, when recognition is correctly performed in a system capable of performing speech recognition in units of sentences or phrases, the input time is short and the use efficiency is improved. However, in the case of recognition by sentence or phrase, not only the number of types of recognition targets increases by the combination of a plurality of keywords (in this example, the affiliation and name of the partner), but also the variety of expressions, including particles, varies. , The recognition performance is lower than that of word recognition. Further, if the recognition is wrong, the following problem also occurs.

【０００４】システム：「相手の所属と名前をどうぞ」利用者：「資材課の佐藤さん」システム：「資材課の加藤ですね」利用者：「いいえ」システム：「相手の所属と名前をどうぞ」最初のシステムの質問に、所属と名前という２つのキー
ワードが含まれている。この場合、システム側でこのや
りとりだけからでは、利用者が否定したことが所属と名
前のどちらか片方だけを誤認識したことを意味するの
か、あるいは両方誤認識したことを意味するのか、が特
定できない。したがって、所属と名前の両方のキーワー
ドが正しく認識されるまで、同じ質問を繰り返すことに
なる。この場合は、時間がかかるという問題がある。ま
た別な対処方法として、利用者に誤認識された方だけ再
入力してもらう方法も考えられる。この場合は、所属か
名前かわからない発声を認識しなければならないので、
音声認識の性能に対する要求が現状で対応可能なレベル
より高くなるという問題がある上、そのように利用者の
発声を誘導するにはどうすれば良いかという、解決が非
常に困難な課題が生じる。[0004] System: "Please choose the affiliation and name of the other party" User: "Mr. Sato of the Materials Division" System: "It's Kato of the Materials Division" User: "No" System: "Please choose the affiliation and name of the other party The first system question included two keywords: affiliation and name. In this case, the system alone can determine whether the user's rejection means that either the affiliation or the name has been misrecognized, or that both have been misrecognized. Can not. Thus, the same question is repeated until both the affiliation and name keywords are correctly recognized. In this case, there is a problem that it takes time. As another countermeasure, a method in which only the user who has been erroneously recognized may input again is also conceivable. In this case, you need to recognize utterances that do not know your affiliation or name,
In addition to the problem that the demand for the performance of speech recognition is higher than the level that can be currently dealt with, there is a very difficult problem of how to guide the utterance of the user.

【０００５】これに対して、単語認識の場合は認識対象
の表現のバラエティが抑えられ、現状レベルでもほぼ満
足のいく認識性能が得られる。また、項目を１つずつ質
問し、確認するために、確実に１項目ずつのデータ入力
が行える。例えば、次のようになる。On the other hand, in the case of word recognition, the variety of expressions to be recognized is suppressed, and almost satisfactory recognition performance can be obtained even at the current level. In addition, in order to ask and confirm items one by one, data can be input one item at a time. For example,

【０００６】システム：「相手の所属をどうぞ」利用者：「資材課」システム：「資材課ですか」利用者：「はい」システム：「相手の名前をどうぞ」利用者：「佐藤」システム：「佐藤ですか」利用者：「はい」ただし、ここに示したようにやりとりが長くなり全体に
時間がかかる傾向になるという大きな問題が残る。[0006] System: "Please add the other party's affiliation" User: "Materials Section" System: "Is the Materials Section?" User: "Yes" System: "Please enter the name of the other party" User: "Sato" System: "Is Sato?" User: "Yes" However, as shown here, there remains a major problem that the exchange becomes longer and the whole tends to take longer.

【０００７】[0007]

【発明が解決しようとする課題】上記のような従来の音
声対話システムにおいて、現状の音声認識性能と時間的
な利用効率は相反するパラメータであった。In the above-mentioned conventional speech dialogue system, the current speech recognition performance and the temporal utilization efficiency are incompatible parameters.

【０００８】本発明の目的は、現状で可能な音声認識性
能の範囲で、最も効率良くデータ入力が行え、利用者と
システムとの間で円滑な対話を実現できる音声対話シス
テムを提供することにある。[0008] It is an object of the present invention to provide a speech dialogue system capable of inputting data most efficiently within the range of currently available speech recognition performance and realizing a smooth dialogue between a user and the system. is there.

【０００９】[0009]

【課題を解決するための手段】本発明によれば、以下の
ような手段による対話制御方法および以下のような手段
を設けた音声対話システムが提供される。According to the present invention, there is provided a dialogue control method using the following means and a voice dialogue system provided with the following means.

【００１０】タスク管理手段に要求を出しその返答結果
に応じて、応答生成手段、認識語彙補足手段、キーワー
ド判定手段、およびキーワード保持手段を制御し、また
認識語彙補足手段を介して音声認識手段を、さらに応答
生成手段を介して音声合成手段を制御し、システムと利
用者の間の対話を進行させる対話制御手段の元で、前記
キーワード保持手段は、前記対話制御手段の要求に基づ
き、キーワードの保持、削除、最新のキーワードの通知
を行い、前記対話制御手段は、タスク管理手段に対話の
進行上における次のアクションを促すガイダンス内容を
要求して該ガイダンス内容を受け取り、また前記キーワ
ード保持手段に最新のキーワードを要求して該最新のキ
ーワードを受け取り、前記ガイダンス内容と前記最新の
キーワード、およびそれらを用いて応答文を生成させる
指示を応答生成手段へ通知し、前記応答生成手段は、前
記対話制御手段から受け取った指示に従い、対話の進
行上における次のアクションを促すガイダンス文の中
に、同時に受け取った前段階の認識結果でもある最新の
キーワードを含めた応答文を生成し、音声合成手段へ出
力し、次に前記対話制御手段は、タスク管理手段に対話
の進行上における次の認識語彙を要求して該認識語彙を
受け取り、タスクに依存したキーワードからなる該認識
語彙を認識語彙補足手段へ送り、前記認識語彙補足手段
は、前記対話制御手段より受け取った認識語彙に、「取
消」「ヘルプ」「ストップ」「もう一度」「わからな
い」「任意」などのタスクから独立なコマンドを表す単
語を補足して、音声認識手段およびキーワード判定手段
に渡し、前記キーワード判定手段は、前記認識語彙補足
手段より得た補足後の認識語彙と前記音声認識手段より
得た認識結果とを比較し、該認識結果が前記タスクから
独立なコマンドか、タスクに依存したキーワードかを判
定し、その判定結果を対話制御手段へ送り、さらに次に
前記対話制御手段は、前記キーワード判定手段の判定結
果を元に、該判定結果がキーワードである場合は、それ
をキーワード保持手段に送ると共に該キーワードをタス
ク管理手段に送り、前記判定結果がコマンドである場
合は、各コマンドに対する処理を行う。[0010] A request is issued to the task management means, and the response generation means, the recognized vocabulary supplementing means, the keyword determination means, and the keyword holding means are controlled in accordance with the response result, and the speech recognition means is controlled via the recognized vocabulary supplementing means. Further, under the control of the speech synthesizing means via the response generating means, and the dialog control means for proceeding the dialog between the system and the user, the keyword holding means, based on the request of the dialog control means, Holding, deleting, notifying of the latest keyword, the dialog control means requests the task management means for guidance content for prompting the next action in the progress of the dialogue, receives the guidance content, and sends the guidance content to the keyword holding means. Requesting the latest keyword and receiving the latest keyword, the guidance content and the latest keyword, and An instruction to generate a response sentence using them is sent to the response generation means, and the response generation means, in accordance with the instruction received from the interaction control means, includes a guidance sentence for prompting a next action in the progress of the interaction, At the same time, a response sentence including the latest keyword, which is also the recognition result of the previous stage received, is generated and output to the speech synthesis means. Then, the dialog control means sends the task management means the next recognized vocabulary in the progress of the dialog. , Receiving the recognition vocabulary, and sending the recognition vocabulary comprising task-dependent keywords to the recognition vocabulary supplementing means. The recognition vocabulary supplementing means adds “cancel”, “cancel” to the recognition vocabulary received from the dialog control means. Supplement words that represent commands that are independent of tasks, such as help, stop, repeat, don't know, or any Handover, and the keyword determination unit compares the supplemented recognized vocabulary obtained by the recognized vocabulary supplementing unit with the recognition result obtained by the voice recognition unit, and the recognition result is a command independent of the task. Or a keyword depending on the task, and sends the result of the determination to the dialogue control means. Next, the dialogue control means determines whether the keyword is a keyword based on the determination result of the keyword determination means. Sends the keyword to the keyword holding unit and sends the keyword to the task management unit, and if the determination result is a command, performs a process for each command.

【００１１】[0011]

【発明の実施の形態】以下図を用いて本発明の実施例を
説明する。DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS An embodiment of the present invention will be described below with reference to the drawings.

【００１２】図１は本発明による音声対話システムの一
実施例を示すブロック図である。音声対話システムは、
あるタスクを遂行することを目的として、利用者とシス
テムが音声を使ってやりとりをするようなシステムであ
る。本発明による音声対話システムは、対話制御手段
（５）の制御の元に動作する。対話制御手段（５）の動
作については、後で詳述する。FIG. 1 is a block diagram showing one embodiment of a voice dialogue system according to the present invention. The spoken dialogue system
It is a system in which the user and the system communicate using voice for the purpose of performing a certain task. The voice dialogue system according to the present invention operates under the control of the dialogue control means (5). The operation of the dialog control means (5) will be described later in detail.

【００１３】タスク管理手段（１０）は、タスクに依存
した処理を全て担当する。タスクに依存した情報（タス
ク遂行の手順、場面毎の入力待ち語彙、等）を管理した
り、対話制御手段からのタスクの進行等についての問い
合わせなどに返答したりする。The task management means (10) is in charge of all tasks dependent processing. It manages task-dependent information (task execution procedure, input waiting vocabulary for each scene, etc.) and responds to inquiries about the progress of tasks from the dialog control means.

【００１４】音声認識手段（２）は、与えられた認識語
彙の範囲内で、入力される音声（１）を認識し、認識結
果を１つの単語あるいは複数の単語の列として出力す
る。音声認識手段（２）の実現方法としては様々な手法
が考えられ、本発明はその方法を限定しない。例えば、
ＨＭＭ（ＨｉｄｄｅｎＭａｒｋｏｖＭｏｄｅｌ）という
確率モデルを用いる手法が扱いやすい。この方法を用い
れば、任意の認識語彙をテキストとして与えるだけで、
音声認識できる構成にすることができる。詳細は、“中
川聖一：”確率モデルによる音声認識、電子情報通信学
会、１９８８”他の文献に詳しいので、本明細書では省
略する。The speech recognition means (2) recognizes the input speech (1) within the range of the given recognition vocabulary, and outputs the recognition result as one word or a sequence of a plurality of words. Various methods are conceivable as a method of realizing the voice recognition means (2), and the present invention does not limit the method. For example,
A method using a probabilistic model called HMM (Hidden Markov Model) is easy to handle. With this method, just giving any recognized vocabulary as text,
A configuration capable of voice recognition can be provided. The details are described in "Seiichi Nakagawa: Speech Recognition by Probabilistic Model, IEICE, 1988" and other documents, and are omitted in this specification.

【００１５】認識語彙補足手段（３）は、対話制御手段
（５）より受け取った認識語彙に、特にタスクから独立
なコマンドを表す単語等を補足して、音声認識手段に渡
す。ここで、コマンドとしては、「違います」「違
う」「いいえ」「取消」「キャンセル」などの否定ある
いは取消を意味する単語、「ヘルプ」などの助言要求
を意味する単語、「ストップ」「停止」「中止」など
のシステムの処理の停止要求を意味する単語、「もう一
度」「リピート」などの直前のシステム応答の再出力要
求を意味する単語、などがある。さらに、「わかりませ
ん」「わからない」などの不明を意味する単語、「問わ
ない」「何でも良い」「任意」などの任意を意味する単
語なども認識語彙補足手段（３）で補足され得る単語と
する。図３に認識語彙の例を示しており、人名の認識語
彙を例に取り、補足前（ａ）と補足後（ｂ）の認識語彙
を示している。The recognition vocabulary supplementing means (3) supplements the recognition vocabulary received from the dialog control means (5), especially words representing commands independent of tasks, and passes them to the speech recognition means. Here, the command is a word that means negation or cancellation, such as "no,""no,""no,""cancel," or "cancel," a word that means an advisory request, such as "help,""stop,""stop." There are words that mean a request to stop the processing of the system, such as "", "stop", and words, such as "again" and "repeat", that mean a request to re-output the immediately previous system response. Furthermore, words that can be supplemented by the recognized vocabulary supplementing means (3) include words that mean unknown, such as “I don't understand” or “I don't know”, and words that mean arbitrary, such as “I don't care”, “Anything is fine”, or “Any”. And FIG. 3 shows an example of the recognition vocabulary. The recognition vocabulary of the personal name is taken as an example, and the recognition vocabulary before (a) and after (b) the supplement is shown.

【００１６】キーワード判定手段（６）は、音声認識手
段（２）より得られた認識結果がタスクから独立なコマ
ンドか、タスクに依存したキーワードかを判定し、その
結果を対話制御手段へ送る。ここで、判定結果は、例え
ば図４に示すように表現される。図４に示した例では、
１つの判定結果が、２つの値の組み合わせで表現され
る。左側が判定結果の種類を表しており、「ＣＭＤ」は
コマンド、「ＫＷ」はキーワードを表す。また、「Ｃ
ＭＤ」と組み合わされている「ＣＡＮＣＥＬ」、「ＨＥ
ＬＰ」はコマンドの種類を表している。さらに「ＫＷ」
と組み合わされている“資材課”や“佐藤”は実データ
値を示している。The keyword determining means (6) determines whether the recognition result obtained from the voice recognition means (2) is a command independent of the task or a keyword dependent on the task, and sends the result to the dialog control means. Here, the determination result is expressed, for example, as shown in FIG. In the example shown in FIG.
One determination result is represented by a combination of two values. The left side shows the type of the determination result, “CMD” represents a command, and “KW” represents a keyword. Also, "C
"CANCEL", "HE" combined with "MD"
"LP" indicates the type of command. Furthermore, "KW"
“Materials section” and “Sato” combined with are showing actual data values.

【００１７】キーワード保持手段（６）は、対話制御手
段の指示に従って、渡されたキーワードをスタック形式
で保持したり、スタックに積まれているキーワードを対
話制御手段に通知したりする。The keyword holding means (6) holds the passed keywords in a stack format or notifies the dialog control means of the keywords stored in the stack, according to the instruction of the dialog control means.

【００１８】応答生成手段（７）は、対話制御手段
（５）からの指示に従い、タスクを遂行するために必要
な項目の内容（名前などの実データ値）を質問するよう
な応答文を生成する。The response generation means (7) generates a response sentence which inquires of the contents (actual data values such as names) of the items necessary for performing the task in accordance with the instruction from the interaction control means (5). I do.

【００１９】音声合成手段（８）は、応答生成手段
（７）から得られる応答文を音声波形に変換してスピー
カなどの電気信号から音波へ変換するデバイスにより空
間中を伝播する音声（９）として出力する。The voice synthesizing means (8) converts the response sentence obtained from the response generating means (7) into a voice waveform and converts the electric signal into a sound wave by a device such as a speaker. Output as

【００２０】図２は対話制御手段の処理フローを示す図
である。簡単のために、本発明による音声対話システム
でデータ入力のみが行われる場合のフローを示してい
る。実際のタスクにおいては、利用者に対する結果のみ
の提示なども含まれることになる。FIG. 2 is a diagram showing a processing flow of the dialog control means. For simplicity, the flow when only data input is performed in the voice interaction system according to the present invention is shown. In the actual task, presentation of only the result to the user is also included.

【００２１】次に、図２のフローに従って、電話接続サ
ービスなどをタスクとして、所属名に‘資材課’、人名
に‘佐藤’を入力する場合を仮定して処理手順を説明す
る。Next, the processing procedure will be described in accordance with the flow of FIG. 2 on the assumption that a telephone connection service or the like is a task and "materials section" is input as the affiliation name and "Sato" is input as the personal name.

【００２２】同タスク管理手段に対話の進行上における
次のアクションを促すガイダンス内容を要求し、まずサ
ービスが開始された時点では、対話制御手段（５）は、
タスク管理手段（１０）へ対話の進行上における次の
アクションを促すガイダンス内容を要求する。この要求
に対する返答は、次のガイダンス内容が「所属の質
問」である、として得られる。次に、対話制御手段
（５）は、キーワード保持手段から最新のキーワードを
受理しようとする。システム利用開始直後は、キーワー
ド保持手段はからであり、その旨が対話制御手段へ通知
される。次に、対話制御手段（５）は、次のガイダンス
内容（「所属の質問」）と、あれば最新のキーワード
（この段階では「なし」）を応答生成手段（７）に送
り、応答文を生成するように指示する。応答生成手段
（７）では、ガイダンス内容が「所属の質問」である
ので、例えば“所属をどうぞ”というような質問文を生
成して、音声出力手段（８）に送る。続いて、対話制御
手段（５）は、「所属」を音声認識するための認識語彙
を、認識語彙補足手段（３）へ送る。ここでの認識語彙
は、先の、次のアクションを促すガイダンス内容を要求
した際に、タスク管理手段（１０）より一緒に受け取る
こととする。あるいは、この段階であらためて、タスク
管理手段に問い合わせて、受理しても良い。この後、認
識語彙補足手段（３）で補足された認識語彙は、音声認
識手段（２）に送られる。そして、利用者の発声に対す
る音声認識手段（２）よる認識結果は、キーワード判定
手段（４）を介してコマンドかキーワードかの判定が付
加され、図４に示したような形式で対話制御手段（５）
に戻される。今の仮定では、ここでの受理結果は「Ｋ
Ｗ］＋「資材課」である。すなわち、対話制御手段
（５）はキーワードをキーワード保持手段（６）のスタ
ックにキーワードをＰＯＰし、さらに、タスク管理手段
（１０）に対してキーワードを通知する。この時点で、
タスク管理手段（１０）は、内部でタスクの進行状態が
更新される。The task management means is requested for guidance contents for prompting the next action in the progress of the dialog. First, when the service is started, the dialog control means (5)
It requests the task management means (10) for guidance contents for prompting the next action in the progress of the dialogue. The response to this request is obtained as the following guidance content is "affiliation question". Next, the dialog control means (5) tries to receive the latest keyword from the keyword holding means. Immediately after the start of use of the system, the keyword holding unit is empty, and the fact is notified to the dialog control unit. Next, the dialogue control means (5) sends the next guidance content ("Affiliation question") and the latest keyword ("None" at this stage) to the response generation means (7), and sends the response sentence. Instruct to generate. In the response generation means (7), since the guidance content is "question of affiliation", a question sentence such as "please belong" is generated and sent to the voice output means (8). Subsequently, the dialog control means (5) sends a recognized vocabulary for voice recognition of "belonging" to the recognized vocabulary supplementing means (3). The recognition vocabulary here is received together from the task management means (10) when the guidance content for prompting the next action is requested. Alternatively, an inquiry may be made to the task management means again at this stage to accept it. Thereafter, the recognized vocabulary supplemented by the recognized vocabulary supplementing means (3) is sent to the speech recognition means (2). Then, the recognition result of the user's utterance by the voice recognition means (2) is added with a determination as to whether it is a command or a keyword via the keyword determination means (4), and is performed in the form shown in FIG. 5)
Is returned to. According to the current assumption, the result of acceptance here is "K
W] + “Materials Section”. That is, the dialog control means (5) POPs the keyword on the stack of the keyword holding means (6) and further notifies the task management means (10) of the keyword. at this point,
The task management means (10) internally updates the task progress status.

【００２３】ここで、図２のフローの最初に戻り、対話
制御手段（５）は、再びタスク管理手段（１０）へ、対
話の進行上における次のアクションを促すガイダンス内
容を要求する。この要求に対する返答は、次のガイダン
ス内容が「名前の質問」であるである、として得られ
る。次に、対話制御手段（５）は、キーワード保持手段
から最新のキーワードを問い合わせ、先ほどＰＯＰされ
たばかりの「資材課」が得られる。そして、対話管理手
段（５）は、次のガイダンス内容（「名前の質問」）と
最新のキーワード（「資材課」）を応答生成手段（７）
に送り、応答文を生成するように指示する。応答生成手
段（７）では、「資材課」を含めて、かつ、「名前」
を問い合わせるような質問文、例えば“資材課で名前は
何ですか”とか“資材課の誰ですか”というような質問
文を生成して、音声出力手段（８）に送る。続いて、対
話制御手段（５）は、「名前」を音声認識するための認
識語彙を、認識語彙補足手段（３）へ送る。そしてま
た、音声認識手段（２）による認識結果は、キーワード
判定手段（４）を介してコマンドかキーワードかの判定
が付加され、図４に示したような形式で対話制御手段
（５）に戻される。今の仮定では、ここでの受理結果は
「ＫＷ］＋「佐藤」となる。以上の繰り返しで、基本的
な対話が進行していく。Here, returning to the beginning of the flow of FIG. 2, the dialogue control means (5) requests the task management means (10) again for guidance contents for prompting the next action in the progress of the dialogue. The answer to this request is obtained as the next guidance content is "name question". Next, the dialogue control means (5) inquires the latest keyword from the keyword holding means, and obtains the "materials section" that has just been POP. Then, the dialogue management means (5) sends the next guidance content ("question of name") and the latest keyword ("materials section") to the response generation means (7).
And instruct it to generate a response sentence. In response generation means (7), include "material section" and "name"
, Such as "What is the name in the material section?" Or "Who is the material section?", And sends it to the voice output means (8). Subsequently, the interaction control means (5) sends a recognition vocabulary for voice recognition of the "name" to the recognition vocabulary supplementing means (3). Further, the result of recognition by the voice recognition means (2) is added with a judgment as to whether it is a command or a keyword through the keyword judgment means (4), and is returned to the dialogue control means (5) in a format as shown in FIG. It is. Under the current assumption, the result of acceptance here is “KW” + “Sato”. By repeating the above, the basic dialogue proceeds.

【００２４】次に、利用者の「資材課」の発声が誤認識
されて、認識結果が「施設課」になったと仮定した場合
の例を説明する。先の“資材課の誰ですか”の代わりに
“施設課の誰ですか”というシステム出力になっている
はずであり、それに対して、利用者は“違います”と答
えたとする。キーワード判定手段（４）より受け取った
結果が「ＫＷ」＋「佐藤」でなく、「ＣＭＤ」＋「ＣＡ
ＮＣＥＬ」となる。この場合、対話制御手段（５）はキ
ーワード保持手段（６）に対して、キーワードを１つ
（今の場合「資材課」）ＰＯＰするように指示する。さ
らに、タスク管理手段（１０）に対して、ＰＯＰされた
「資材課」を取り消すように通知する。ここで、対話制
御手段（５）の処理は図２のフローの最初に戻り、タス
ク管理手段（１０）、キーワード保持手段（６）への問
い合わせをし、次のアクションを促すガイダンス内容が
「所属の質問」であり、最新のキーワードは「なし」で
あることを応答生成手段（７）に送り、応答生成手段
（７）で再び、“所属をどうぞ”というような質問文が
生成される。Next, an example will be described in which it is assumed that the user's utterance of the "material section" is erroneously recognized and the recognition result is "the facility section". It is assumed that the system output is "Who is in the facility section" instead of "Who is in the material section", and the user answers "No" to the system output. The result received from the keyword determination means (4) is not "KW" + "Sato" but "CMD" + "CA
NCEL ". In this case, the dialogue control means (5) instructs the keyword holding means (6) to POP one keyword (in this case, “materials section”). Furthermore, the task management means (10) is notified to cancel the POP "material section". Here, the processing of the dialogue control means (5) returns to the beginning of the flow in FIG. 2, and inquires of the task management means (10) and the keyword holding means (6), and the guidance content for prompting the next action is "belonging Is sent to the response generating means (7), and the response generating means (7) again generates a question sentence such as "Please belong."

【００２５】最後に、本発明による音声対話システムを
利用した場合の、対話例を示す。タスクを電話の接続サ
ービスとした場合の例である。Finally, an example of a dialogue when the voice dialogue system according to the present invention is used will be described. This is an example where the task is a telephone connection service.

【００２６】システム：「相手の所属をどうぞ」利用者：「資材課」システム：「資材課の誰ですか」利用者：「佐藤」システム：「佐藤さんに電話をつなぎます」次に、１番目の項目で誤認識した場合の対話例を示す。System: “Please add the other party's affiliation” User: “Materials Section” System: “Who is in the Materials Section?” User: “Sato” System: “Call Sato-san” Here is an example of a dialogue in the case of misrecognition in the second item.

【００２７】システム：「相手の所属をどうぞ」利用者：「資材課」システム：「施設課の誰ですか」利用者：「違います」システム：「もう一度、相手の所属をどうぞ」利用者：「資材課」システム：「資材課の誰ですか」：（以下省略）次に、２番目以降の項目で誤認識した場合の対話例を示
す。[0027] System: "Please go to the other party's affiliation" User: "Materials Section" System: "Who is the facility section?" User: "No," System: "Please try again to belong to the other party" User: "Materials Section" System: "Who is the Materials Section?": (Omitted hereafter) Next, an example of dialogue when misrecognition is made in the second and subsequent items is shown.

【００２８】：（途中まで省略）システム：「資材課の誰ですか」利用者：「佐藤」システム：「加藤さんに電話をつなぎます」利用者：「違います」システム：「もう一度、資材課の誰ですか」利用者：「佐藤」システム：「佐藤さんに電話をつなぎます」図１において、利用者とシステムとの間のメディアとし
て、音声のみしか描いていないが、文字、画像、など他
のメディアも含めた対話システムでも良い。また、ボタ
ン入力を用意し、取消などをボタン入力するようにして
も良い。遠隔地の電話機から利用している際には、取消
などにタッチトーン信号を用いても良い。: (Some part omitted) System: "Who is in the materials section?" User: "Sato" System: "Connects the phone to Mr. Kato" User: "Is different" System: "Again, Materials section User: "Sato" System: "Connect Sato-san" In Figure 1, only audio is drawn as media between the user and the system, but characters, images, etc. A dialog system including other media may be used. Alternatively, a button input may be prepared, and a button such as a cancel button may be input. When using from a remote telephone, a touch tone signal may be used for cancellation or the like.

【００２９】図１におけるタスク管理手段はタスクに特
有の処理も含んでいる。例えば、遠隔地の電話機からシ
ステムを利用する形態の場合、電話回線の制御などが考
えられる。The task management means in FIG. 1 also includes processing specific to a task. For example, in the case of using the system from a remote telephone, control of a telephone line may be considered.

【００３０】[0030]

【発明の効果】本発明によれば、複数項目のデータ入力
を音声で行う音声対話システムにおいて、音声認識を利
用する際に必須の確認に関して、確認のみのシステム出
力を省略し、認識結果を次のガイダンス文に含めて提示
することでやりとりの数を減少させることにより、単
語音声認識程度の音声認識能力しか持たない音声対話シ
ステムでも、利用者とシステムとの間で効率良く音声対
話が進行できる効果が得られる。According to the present invention, in a voice dialogue system for inputting data of a plurality of items by voice, a system output of only verification is omitted for a required confirmation when using voice recognition, and the recognition result is set as follows. By reducing the number of conversations by presenting it in a guidance sentence, the user and the system can proceed efficiently with a dialogue even with a spoken dialogue system that has only a speech recognition capability equivalent to word speech recognition. The effect is obtained.

[Brief description of the drawings]

【図１】本発明による音声対話システムの構成の一実施
例を示すブロック図である。FIG. 1 is a block diagram showing one embodiment of the configuration of a voice dialogue system according to the present invention.

【図２】対話制御手段の処理手順を示すフローチャート
である。FIG. 2 is a flowchart illustrating a processing procedure of a dialog control unit.

【図３】認識語彙の一例を示す図である。FIG. 3 is a diagram showing an example of a recognized vocabulary.

【図４】キーワード判定手段から対話制御手段に渡され
るデータ構造の一例を示す図である。FIG. 4 is a diagram showing an example of a data structure passed from a keyword determination unit to a dialogue control unit.

【図５】キーワード保持手段に保持されるデータの一例
を示す図である。FIG. 5 is a diagram illustrating an example of data stored in a keyword storage unit.

[Explanation of symbols]

１…利用者の音声、２…音声認識手段、３…認識語彙補
足手段、４…キーワード判定手段、５…対話制御手段、
６…キーワード保持手段、７…応答生成手段、８…音声
出力手段、９…システムの出力音声、１０…タスク管理
手段。DESCRIPTION OF SYMBOLS 1 ... User's voice, 2 ... Voice recognition means, 3 ... Recognition vocabulary supplement means, 4 ... Keyword judgment means, 5 ... Dialogue control means,
6 ... Keyword holding means, 7 ... Response generation means, 8 ... Speech output means, 9 ... System output speech, 10 ... Task management means.

フロントページの続き (72)発明者畑岡信夫東京都国分寺市東恋ケ窪一丁目280番地株式会社日立製作所中央研究所内Continued on the front page (72) Inventor Nobuo Hataoka 1-280 Higashi Koigakubo, Kokubunji-shi, Tokyo Inside Central Research Laboratory, Hitachi, Ltd.

Claims

[Claims]

A dialogue is provided by using guidance contents for prompting a next action in the progress of a dialogue managed by a task management means and the latest keyword which is a recognition result of a previous stage held by a keyword holding means. A response sentence including the latest keyword is generated and output by the response generating unit in the guidance sentence for prompting the next action in the progress of the process, and the task is managed by the task managing unit in the recognition vocabulary supplementing unit. A word representing a command independent from the task is supplemented to the recognition vocabulary in the next scene in the progress of the dialogue, and within the range of the recognition vocabulary after the supplementation, the speech recognition means recognizes the speech uttered by the user. The keyword determination means determines whether the recognition result is a command independent of the task or a keyword dependent on other tasks. If the judgment result is a command,
The process for each command is performed, and when the result of the determination is the keyword, the keyword that is the result of the determination is newly held in the keyword holding unit as the latest keyword. A voice dialogue control method, characterized in that the voice is advanced by notifying the management means and repeating the above.

2. The method according to claim 1, wherein the command includes at least a word meaning negation or cancellation, such as "no,""no,""no,""cancel," or "cancel." If the result of the determination is a word meaning the negation or cancellation, the keyword holding means is instructed to delete one of the latest keywords, and at the same time, the content of the immediately preceding request item is instructed to the task management means. A voice dialogue control method, wherein a notification of cancellation is provided.

3. The voice dialogue control method according to claim 1, wherein the command includes at least a word meaning an advice request such as "help", and the determination result is a word meaning the advice request. And a response generation means for outputting a message about the usage.

4. The voice interaction control method according to claim 1, wherein the command includes at least a word indicating a request to stop processing of the system such as “stop”, “stop”, “stop”, “stop”, and the like. A speech dialogue control method, wherein the dialogue progress is stopped when the determination result is a word meaning the stop request.

5. The voice dialogue control method according to claim 1, wherein said response generation means has means for storing one or more response sentences, which can be referred to later, and wherein said command is at least “again” or “repeat”. If the determination result is a word meaning the re-output request including a word meaning a re-output request of the immediately preceding system response, the response generation means may re-output the response sent immediately before. A voice interaction control method characterized by the following.

6. The method according to claim 1, wherein the command is at least "I do not know."
Including a word meaning unknown such as "I don't know", and when the determination result is a word meaning unknown, notifies the task management means that the content of the immediately preceding request item is unknown, At the same time, request the task management means for guidance that prompts the next action in the progress of the dialogue,
A spoken dialogue control method characterized in that the dialogue proceeds according to the contents of the response.

7. The voice interaction control method according to claim 1, wherein said command includes at least a word meaning an arbitrary word such as "don't care", "anything", "any", and said judgment result indicates that said arbitrary word is arbitrary. If it is a meaningful word, notify the task management means that the content of the request item is optional, and at the same time, request the task management means for guidance content that prompts the next action in the progress of the dialogue, and the response content A dialogue control method characterized in that the dialogue proceeds according to the following.

8. A request for providing information related to the progress of a task-dependent dialogue, such as guidance content for managing the progress of a task-dependent dialogue and prompting the next action in the progress of the dialogue. A task management means for responding information to the request, a recognition vocabulary supplementing means, a keyword determination means, a keyword holding means, and a response generation means in response to a request issued to the task management means and the response result. Dialogue control means for promoting a dialogue between a user and a user; a recognition vocabulary supplementary means for supplementing a new word to the recognition vocabulary received from the dialogue control means and passing the supplemented result to a speech recognition means; A speech recognition method for recognizing a speech uttered by a user and outputting one or more word sequences within the range of the recognition vocabulary specified by the output of the vocabulary supplementing means. And a keyword determining unit for determining the one or more word sequences; holding a keyword passed from the dialog control unit; deleting a latest keyword from the held keywords; or a dialog control unit A response generating means for generating a response sentence using data from the dialog control means in accordance with an instruction from the dialog control means, and in some cases, obtained from the response generating means. And speech synthesis means for converting the response sentence into a speech waveform and outputting the speech. In particular, the dialogue control means requests the task management means for guidance content for prompting the next action in the progress of the dialogue. Requesting the latest keyword from the keyword holding means and receiving the latest keyword, Wherein the volume latest keywords, and notifies the response generation means an instruction to generate a response sentence by using them, the response generation means,
According to the instruction received from the interaction control means, a response sentence including the latest keyword, which is also a recognition result of the previous stage received at the same time, is generated in the guidance sentence for prompting the next action in the progress of the dialogue, and speech synthesis is performed. The dialogue control means requests the task management means for the next recognition vocabulary in the course of the dialogue, receives the recognition vocabulary, and supplements the recognition vocabulary consisting of task-dependent keywords with the recognition vocabulary supplement. Means, the recognition vocabulary supplementing means supplements the recognition vocabulary received from the dialog control means with a word representing a command independent of a task, passes the word to the speech recognition means and the keyword determination means, and passes the word to the keyword recognition means. Compares the supplemented recognition vocabulary obtained by the recognition vocabulary supplementing means with the recognition result obtained by the speech recognition means, and the recognition result is A command independent of the task,
It is determined whether the keyword depends on the task, the determination result is sent to the dialog control means, and further, the dialog control means, based on the determination result of the keyword determination means, if the determination result is a command, A voice dialogue system which performs a process for each command and, when the determination result is a keyword, sends the keyword as the determination result to the keyword holding means and simultaneously sends the keyword to the task management means.

9. The speech dialogue system according to claim 2, wherein the command includes at least a word meaning negation or cancellation such as "no,""no,""no,""cancel," or "cancel." If the determination result is a word meaning the negation or cancellation, the dialogue control means instructs the keyword holding means to delete one newest keyword, and at the same time, issues a request to the task management means. A voice dialogue system for notifying that the content of an item has been canceled.

10. The voice interaction system according to claim 2, wherein said command includes at least a word meaning an advice request such as "help", and the result of judgment by said keyword judgment means is a word meaning the advice request. Wherein the dialogue control means instructs the response generation means to provide a help response, and the response generation means outputs a message such as usage in accordance with the instruction.

11. The voice interaction system according to claim 2, wherein the command includes at least a word indicating a request to stop processing of the system, such as "stop", "stop", "stop", "stop", and the keyword. If the result of the determination by the determination means is a word meaning the stop request, the dialogue control means stops the dialogue progress.

12. The speech dialogue system according to claim 2, wherein said response generating means has means for storing one or more response sentences, which can be referred to later, and wherein said command is at least "again" or "repeat". When the result of the keyword determination means is a word meaning the re-output request, the dialog control means outputs the word immediately before to the response generation means. A spoken dialogue system that issues an instruction to re-output a response sentence.

13. The voice interaction system according to claim 2, wherein said command is at least "I do not understand".
If the keyword includes a word meaning unknown, such as "I don't know," and the result of the determination by the keyword determination means is a word meaning the unknown, the dialog control means sends the content of the immediately preceding request item to the task management means. Notify that is unknown,
At the same time, a voice dialogue system which requests the task management means for guidance content for prompting the next action in the progress of the dialogue, and proceeds with the dialogue according to the response content.

14. The voice dialogue system according to claim 2, wherein the command includes at least a word meaning an arbitrary word such as “don't care”, “anything”, “any”,
When the result of the determination by the keyword determination means is a word meaning the arbitraryness, the dialogue control means notifies the task management means that the content of the request item is optional, and at the same time, notifies the task management means of the dialogue. A spoken dialogue system that requests guidance content for prompting the next action in the process, and performs dialogue according to the response content.

15. The method of claim 8, 9, 10, 11, 12, 1.
15. A voice interaction system in which the contents of the tasks described in 3 and 14 are a telephone number search service or a telephone line connection service.

16. The method of claim 8, 9, 10, 11, 12, or 1.
15. A spoken dialogue system in which the contents of the tasks described in 3 and 14 are a transportation service, an entertainment or facility search service, or a reservation service.

17. The method of claim 8, 9, 10, 11, 12, 1.
15. The voice interaction system according to items 3 and 14, wherein the content of the task is a product search service, a sales service, or a support service.

18. The method of claim 8, 9, 10, 11, 12, 1.
15. A voice interaction system in which the contents of the tasks described in 3 and 14 are a route information providing service or a route guidance service.