JPH10326176A

JPH10326176A - Voice conversation control method

Info

Publication number: JPH10326176A
Application number: JP15043797A
Authority: JP
Inventors: Minoru Nagasaki; 実長崎; Kenichiro Fukushima; 健一郎福島; Nobuhiro Asatani; 伸宏浅谷; Nobuhiro Kimura; 信宏木村
Original assignee: OKI HOKURIKU SYST KAIHATSU KK
Current assignee: OKI HOKURIKU SYST KAIHATSU KK
Priority date: 1997-05-23
Filing date: 1997-05-23
Publication date: 1998-12-08

Abstract

PROBLEM TO BE SOLVED: To allow an ordinary ask-back sentence to be input without the user being aware of any special operation, by identifying the interlocutor's ask-back sentence and selecting a processing method for the most recently output voice data according to the kind of ask-back sentence. SOLUTION: If a learner 1 misses or cannot make out a conversational sentence output by a speaker 4, the learner utters an ask-back sentence, which is input to a device main body 2 through a microphone 5, and a voice recognition part 13 of the device main body 2 recognizes the ask-back sentence. A processor 16 then judges that the input is an ask-back sentence and judges its kind. Once an output method corresponding to the judged kind is determined, a voice output part 12 drives the speaker 4 to output the same conversational sentence again. At the same time, an appropriate image is displayed.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a voice dialogue control method for asking back about a conversational sentence that could not be heard, in a language training device or similar apparatus that uses a computer having a speech recognition function.

[0002]

[Description of the Related Art] Some language training devices give a computer a speech recognition function and carry out a predetermined dialogue with an interlocutor. Such a device receives the interlocutor's voice through a microphone and recognizes its content. It then selects the required conversational sentence, utters it through a speaker or the like, and so advances the conversation. Conversational devices of this kind are likely to be built into not only language training devices but also devices that accept requests from people and carry out various operations. When such a device cannot recognize what the interlocutor said, it prompts the interlocutor to speak again, for example by sounding a buzzer or showing an indication on the display. Conversely, the interlocutor may be unable to make out an utterance from the device. In that case the interlocutor asks the device to utter the words again, for example by pressing a predetermined key on the device.

[0003]

[Problems to Be Solved by the Invention] The conventional technology described above has the following problems. When, for example, the interlocutor cannot understand the words uttered by the device, the operation for asking back takes various forms depending on the device. A language training device is built mainly around a personal computer, so the interlocutor follows a predetermined procedure with the keyboard or mouse to perform the ask-back processing. However, the conversation cannot proceed smoothly unless the interlocutor has learned this operating procedure. Performing such operations during conversation training also interrupts the natural flow of conversation. Moreover, in any system in which a computer converses with a person, not only in language training devices, it is preferable that the conversation proceed in a more natural manner.

[0004]

[Means for Solving the Problems]

<Structure 1> A voice dialogue control method characterized in that, when an interlocutor speaks, the content of the utterance is recognized by speech recognition to determine whether it is an ordinary conversational sentence or an ask-back sentence; in the case of an ask-back sentence, the recognized sentence obtained by speech recognition is analyzed to determine the type of the interlocutor's ask-back sentence; and a processing method for the voice data output immediately before is selected according to that type.

[0005] <Structure 2> The voice dialogue control method of Structure 1, characterized in that, when the type of the ask-back sentence requests a change in output volume, the output volume of the voice data is changed as requested.

[0006] <Structure 3> The voice dialogue control method of Structure 1, characterized in that, when the type of the ask-back sentence requests a change in output speed, the output speed of the voice data is changed as requested.

[0007] <Structure 4> The voice dialogue control method of Structure 1, characterized in that a processing method for the display image shown together with the voice data output immediately before is selected according to the type of the ask-back sentence.

[0008] <Structure 5> A recording medium on which is recorded a program that performs control such that, when an interlocutor speaks, the content of the utterance is recognized by speech recognition to determine whether it is an ordinary conversational sentence or an ask-back sentence; in the case of an ask-back sentence, the recognized sentence obtained by speech recognition is analyzed to determine the type of the interlocutor's ask-back sentence; and a processing method for the voice data output immediately before is selected according to that type.

[0009] <Structure 6> A voice dialogue control method characterized in that, when an interlocutor speaks and the speech cannot be recognized, an applicable ask-back sentence prepared in advance is selected and output as a spoken response.

[0010] <Structure 7> A recording medium on which is recorded a program that performs control such that, when an interlocutor speaks and the speech cannot be recognized, an applicable ask-back sentence prepared in advance is selected and output as a spoken response.

[0011]

[Description of the Preferred Embodiments] Embodiments of the present invention are described below using a specific example. <Specific Example> FIG. 1 is an explanatory diagram of the voice dialogue control method according to the present invention. Before describing it, the configuration of a language training device that operates using the invention is explained first. FIG. 2 is a block diagram of the language training device. For the language training of a learner 1, the device has a device main body 2 equipped with a display 3, a speaker 4, a microphone 5, and so on. The device main body 2 is a personal computer or the like. Its internal functional blocks are shown on the right side of the figure: the device main body 2 contains a voice input unit 11, a voice output unit 12, a voice recognition unit 13, an image processing unit 14, a program memory 15, a processor 16, a storage unit 17, and so on.

[0012] The voice input unit 11 controls the microphone 5 and captures the voice of the learner 1. The voice output unit 12 drives the speaker 4 to output the voice generated by the device. The voice recognition unit 13 recognizes the voice of the learner 1 captured by the voice input unit 11 and generates a recognized sentence. The image processing unit 14 controls the display 3.

[0013] The processor 16 controls the operation of the entire device. The control program of the language training device is stored in the program memory 15. The storage unit 17 is a storage device such as a hard disk and holds a conversational sentence data file 18 and a display screen data file 19.

[0014] The conversational sentence data file 18 stores the sentences the learner 1 is expected to utter and the sentences to be uttered by the device main body 2; interactive language training is carried out using this conversational sentence data. The display screen data file 19 stores screen data for showing the learner 1 guidance on what to say and the like. The display 3 shows an on-screen person (the partner) who acts as the conversation training partner of the learner 1. The dialogue between the learner 1 and the partner is carried out in this way.

[0015] Returning to FIG. 1, the voice dialogue control method of the invention is now described. Suppose that some conversational sentence has been output by the speaker 4 described with reference to FIG. 2 (step S1). If the learner 1 misses it or cannot make it out, the learner utters an ask-back sentence such as "Please say that again." (step S2). This is input to the device through the microphone 5. The voice recognition unit 13 described with reference to FIG. 2 performs speech recognition on the ask-back sentence (step S3).

[0016] In step S4, it is determined whether the input is an ask-back sentence. If it is, the type of ask-back sentence is determined next (step S5). As shown in step S6 of the figure, the types include, for example, the case where the sentence was simply not caught, the case where the voice was too quiet to hear, and the case where the speech was too fast to follow.

[0017] If the sentence was simply not caught, the same voice output is repeated. If the voice was too quiet, it is output more loudly. If the speech was too fast, it is output at a slower speed. The type of ask-back sentence is determined in order to select among these output methods. Once the output method corresponding to the type is determined, the voice output unit 12 drives the speaker 4 to output the same conversational sentence (step S7). An appropriate image is displayed at the same time. This is the outline of the invention; the data structures and related details are described concretely next.
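The outline of FIG. 1 (judge whether the input is an ask-back sentence, judge its type, then choose how to replay the previous sentence) might be sketched as follows. This is only an illustrative sketch: the English phrases, the names `AskBackKind`, `classify`, and `choose_output`, and the scaling factors are assumptions and do not appear in the specification.

```python
from enum import Enum


class AskBackKind(Enum):
    NOT_CAUGHT = "not_caught"   # the sentence was simply not caught
    TOO_QUIET = "too_quiet"     # the voice was too quiet
    TOO_FAST = "too_fast"       # the speech was too fast


# Hypothetical phrase table; the specification does not list these phrases.
ASK_BACK_PHRASES = {
    "please say that again": AskBackKind.NOT_CAUGHT,
    "please speak louder": AskBackKind.TOO_QUIET,
    "please speak more slowly": AskBackKind.TOO_FAST,
}


def classify(recognized):
    """Steps S4/S5: return the ask-back kind, or None for an ordinary sentence."""
    key = recognized.strip().lower().rstrip(".!?")
    return ASK_BACK_PHRASES.get(key)


def choose_output(kind, volume=1.0, rate=1.0):
    """Pick output settings for replaying the previous sentence (step S7)."""
    if kind is AskBackKind.TOO_QUIET:
        volume *= 1.5   # assumed factor: replay louder
    elif kind is AskBackKind.TOO_FAST:
        rate *= 0.75    # assumed factor: replay slower
    return {"volume": volume, "rate": rate}
```

With these assumptions, an ordinary sentence yields `None` from `classify`, and a plain request to repeat leaves both volume and rate unchanged.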

[0018] FIG. 3 is an explanatory diagram of a conversational sentence record. In the ask-back processing described above, the interlocutor's utterance is recognized by speech recognition and its content is compared with conversational sentences stored in advance in memory or the like. This determines what type of ask-back sentence it is, and the corresponding voice output is selected according to the result. The figure explains how the interlocutor's conversational sentences and their corresponding voice outputs are held as conversational sentence records.

[0019] Conversation data is prepared, for example, for each scene set up for conversation training. Conversation data D1, D2, D3, ..., Dn are data classified by scene. Conversation data D1, for example, contains the conversational sentence records R1 to Rm that are output in that scene. The structure of one conversational sentence record is shown in the lower part of the figure: each record consists of conversational sentence text data 21, a flag 22, an image file name 23, and a voice file name 24.

[0020] The conversational sentence text data 21 represents the content of the conversational sentence as text data. It is compared with the recognized sentence obtained by speech recognition, and from the result of this comparison the device recognizes what the interlocutor said. The flag 22 is a control parameter that the voice output unit refers to when performing voice output processing. The flag 22 is "0" when the sentence is an ordinary conversational sentence and, for ask-back sentences, takes the values "1", "2", "3", "4", and "5" as shown in the figure. That is, the flag is "1" for the ask-back sentence "once more", "2" for "louder", "3" for "quieter", "4" for "more slowly", and "5" for "faster".

[0021] The image file name 23 specifies the image file to be displayed together with the conversational sentence. The voice file name 24 specifies the file that stores the synthesized voice data used when the conversational sentence is uttered.
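The record layout of FIG. 3 (text data 21, flag 22, image file name 23, voice file name 24) could be modelled roughly as below. The flag values 0 to 5 follow the description; the class names, field names, sample sentences, and file names are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Flag(IntEnum):
    NORMAL = 0     # ordinary conversational sentence
    AGAIN = 1      # "once more"
    LOUDER = 2     # "louder"
    QUIETER = 3    # "quieter"
    SLOWER = 4     # "more slowly"
    FASTER = 5     # "faster"


@dataclass(frozen=True)
class ConversationRecord:
    text: str         # conversational sentence text data 21
    flag: Flag        # flag 22
    image_file: str   # image file name 23
    audio_file: str   # voice file name 24


# Conversation data D1 for one scene: records R1..Rm (sample values are made up).
scene_d1 = [
    ConversationRecord("How are you?", Flag.NORMAL, "partner_smile.bmp", "how_are_you.wav"),
    ConversationRecord("Once more, please.", Flag.AGAIN, "", ""),
    ConversationRecord("More slowly, please.", Flag.SLOWER, "", ""),
]
```

Using an `IntEnum` keeps the numeric flag values of the description while giving each value a readable name.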

[0022] Speech recognition yields a recognized sentence, which is compared with the conversational sentence text data 21. If matching text data exists, the content of the conversational sentence is recognized. For voice output of an ordinary conversational sentence, the image file name 23 and the voice file name 24 are referenced, the corresponding image is displayed, and the corresponding synthesized voice is used for output. For an ask-back sentence, the voice output unit converts the synthesized voice data according to the content of the flag 22. When the conversational sentences input by the interlocutor are in English, the contents marked *1 to *5 in the figure may be used.
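The comparison between the recognized sentence and the stored text data 21 might look like the following sketch. The specification only says that the two are compared; the exact-match rule after normalization, the `normalize` and `match_flag` names, and the sample English sentences are assumptions made for illustration.

```python
def normalize(sentence):
    """Lower-case, collapse whitespace, and drop trailing punctuation."""
    return " ".join(sentence.lower().split()).rstrip(".!?")


# (text data 21, flag 22) pairs; the English sentences are hypothetical.
RECORDS = [
    ("How are you?", 0),
    ("Once more, please.", 1),
    ("Louder, please.", 2),
    ("More slowly, please.", 4),
]


def match_flag(recognized):
    """Return the flag of the matching record, or None if nothing matches."""
    key = normalize(recognized)
    for text, flag in RECORDS:
        if normalize(text) == key:
            return flag
    return None
```

A result of `None` corresponds to the case where no matching conversational sentence exists, which the device-side ask-back described later has to handle.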

[0023] FIG. 4 is a flowchart of a specific operation according to the invention. The specific operation of the device in response to the interlocutor's ask-back is described with reference to this figure. First, in step S1, it is determined whether there has been voice input. If so, the voice is recognized and converted into text data in step S2. Next, in step S3, it is determined whether the data is an ordinary conversational sentence or an ask-back sentence. For an ordinary conversational sentence, the response data is read and output (steps S4 and S5).

[0024] If the input is judged to be an ask-back sentence, the flag described above is set and processing moves from step S3 toward step S7. When the request is to output the sentence once more, processing goes from step S3 to step S7, where the immediately preceding data is read, and that data is then output (step S8). When the request is for louder or quieter output, the volume setting is changed in step S6, and processing then goes to step S7 to read and output the immediately preceding data. When the ask-back sentence asks for slower or faster output, processing goes to step S9, where data prepared in advance at a slower or faster speed is read, and that data is output in step S10. Output that matches the content of the ask-back is thus possible. Processing then goes to step S11, and if there is further conversation data it returns to step S1 and the same operation is repeated.
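The branching of FIG. 4 for ask-back flags 1 to 5 can be illustrated with the sketch below. The 0 to 10 volume scale, the one-step volume change, and the `_slow`/`_fast` file naming for the pre-prepared speed variants are assumptions; the specification states only that the volume setting is changed and that data prepared at a slower or faster speed is read.

```python
import os
from dataclasses import dataclass


@dataclass
class OutputState:
    last_audio: str   # voice file output immediately before
    volume: int = 5   # hypothetical volume scale 0..10


def speed_variant(audio_file, suffix):
    """Name of a pre-prepared slow/fast variant (naming scheme is assumed)."""
    stem, ext = os.path.splitext(audio_file)
    return f"{stem}_{suffix}{ext}"


def handle_ask_back(flag, state):
    """Return (audio_file, volume) to output for ask-back flags 1..5."""
    if flag == 1:                      # once more: replay as is (S7, S8)
        return state.last_audio, state.volume
    if flag in (2, 3):                 # louder / quieter: change volume (S6), replay
        step = 1 if flag == 2 else -1
        state.volume = max(0, min(10, state.volume + step))
        return state.last_audio, state.volume
    if flag in (4, 5):                 # slower / faster: read prepared data (S9, S10)
        suffix = "slow" if flag == 4 else "fast"
        return speed_variant(state.last_audio, suffix), state.volume
    raise ValueError(f"not an ask-back flag: {flag}")
```

In this sketch a volume change persists in `OutputState`, so later sentences are also output at the new level; whether the device described here keeps or resets the setting is not stated.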

[0025] Next, the operation in which the device asks back to the interlocutor is described. Contrary to the cases so far, the device may be unable to recognize the interlocutor's voice. Here too, a spoken response is preferable so as not to disturb the smooth flow of dialogue. This is achieved by the following procedure.

[0026] FIG. 5 is a flowchart of the ask-back operation performed by the device. First, in step S1, it is determined whether there is voice input. If there is none, processing moves to other processing. If there is, the voice is recognized and converted into text data (step S2). It is then determined whether this voice-to-text conversion succeeded (step S3).

[0027] Recognition can fail in two ways: the recognition processing itself may fail, or the speech may be converted to text but no matching conversational sentence exists, so the text cannot be recognized. If recognition succeeds, processing moves on to response processing. If it fails in either way, processing goes to step S4 and the ask-back operation data is read.

[0028] The storage unit 17 stores a message 17A such as "Once more, please." The data format may be the same as that already described with reference to FIG. 3. This data is read and output as voice in the next step, S5. In this way, when the device cannot make out the interlocutor's voice, it tells the interlocutor so by voice, and the interlocutor responds by inputting the previously uttered sentence again. Only a request to input the same sentence again has been given as an example here. However, as with the ask-back sentences from the interlocutor described so far, other ask-back sentences may be prepared and uttered for cases such as the voice being too quiet to recognize. The language training device described above is realized under the control of a program on a personal computer. The invention can therefore be practiced by recording the program on a floppy disk, CD-ROM, or other recording medium and installing it on the computer's hard disk, or by downloading it over a network.
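The device-side ask-back of FIG. 5 might be sketched as follows. Both failure cases (no text produced, or text with no matching conversational sentence) lead to the stored ask-back message. The function name, the use of `None` for a failed recognition, and the sample dialogue table are assumptions for illustration.

```python
ASK_BACK_MESSAGE = "Once more, please."   # stands in for message 17A


def device_turn(recognized_text, responses):
    """Decide the device's next utterance.

    recognized_text: text from the recognizer, or None if recognition failed.
    responses: mapping from known learner sentences to the device's replies.
    """
    if recognized_text is None or recognized_text not in responses:
        # Step S4/S5: read the ask-back data and output it by voice.
        return "ask_back", ASK_BACK_MESSAGE
    return "respond", responses[recognized_text]


# Hypothetical scene data.
scene = {"How are you?": "I'm fine, thank you."}
```

For example, `device_turn("How are you?", scene)` leads to the ordinary response, while an unrecognized or unmatched input leads to the ask-back message.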

[0029]

[Effects of the Invention] According to the voice dialogue control method of the invention described above, when the interlocutor cannot make out a conversational sentence uttered by a language training device or the like, ask-back processing becomes possible simply by inputting an ordinary ask-back sentence, without the interlocutor being aware of any special operation. Language training can therefore proceed smoothly without disturbing natural conversation. Operability is also improved, because not only language training devices but various interactive devices can handle natural ask-backs. Furthermore, when the device itself asks back, it does so by voice output, so the interlocutor can re-input the utterance in a natural manner.

[Brief description of the drawings]

FIG. 1 is an explanatory diagram of the voice dialogue control method according to the present invention.

FIG. 2 is a block diagram of the language training device.

FIG. 3 is an explanatory diagram of a conversational sentence record.

FIG. 4 is a flowchart of a specific operation according to the present invention.

FIG. 5 is a flowchart of the ask-back operation performed by the device.

[Description of Reference Numerals] 1 interlocutor, 4 speaker, 5 microphone, 12 voice output unit

Continuation of front page: (72) Inventor Nobuhiro Kimura, c/o Oki Hokuriku System Kaihatsu K.K., 3-35 Sachimachi, Kanazawa-shi, Ishikawa Pref.

Claims

[Claims]

1. A voice dialogue control method characterized in that, when an interlocutor speaks, the content of the utterance is recognized by speech recognition to determine whether it is an ordinary conversational sentence or an ask-back sentence; in the case of an ask-back sentence, the recognized sentence obtained by speech recognition is analyzed to determine the type of the interlocutor's ask-back sentence; and a processing method for the voice data output immediately before is selected according to that type.

2. The voice dialogue control method according to claim 1, wherein, when the type of the ask-back sentence requests a change in output volume, the output volume of the voice data is changed as requested.

3. The voice dialogue control method according to claim 1, wherein, when the type of the ask-back sentence requests a change in output speed, the output speed of the voice data is changed as requested.

4. The voice dialogue control method according to claim 1, wherein a processing method for the display image shown together with the voice data output immediately before is selected according to the type of the ask-back sentence.

5. A recording medium on which is recorded a program that performs control such that, when an interlocutor speaks, the content of the utterance is recognized by speech recognition to determine whether it is an ordinary conversational sentence or an ask-back sentence; in the case of an ask-back sentence, the recognized sentence obtained by speech recognition is analyzed to determine the type of the interlocutor's ask-back sentence; and a processing method for the voice data output immediately before is selected according to that type.

6. A voice dialogue control method characterized in that, when an interlocutor speaks and the speech cannot be recognized, an applicable ask-back sentence prepared in advance is selected and output as a spoken response.

7. A recording medium on which is recorded a program that performs control such that, when an interlocutor speaks and the speech cannot be recognized, an applicable ask-back sentence prepared in advance is selected and output as a spoken response.