JP2005258235A

JP2005258235A - Interaction controller with interaction correcting function by feeling utterance detection

Info

Publication number: JP2005258235A
Application number: JP2004071932A
Authority: JP
Inventors: Nobuo Sato; 信夫佐藤; Yasunari Obuchi; 康成大淵; Masahiro Kato; 雅弘加藤
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2004-03-15
Filing date: 2004-03-15
Publication date: 2005-09-22

Abstract

PROBLEM TO BE SOLVED: To provide an interaction control method in which an interaction currently in progress between a human and a machine is corrected into an interaction that the user desires more.

SOLUTION: An interaction controller continuously measures feelings from a person's utterances during the interaction between the human and the machine. Once some feeling is detected, the interaction currently in progress is corrected to provide a smoother interaction with the machine.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to dialogue control that, by grasping the user's current psychological state during a dialogue with a machine, makes it possible to steer the current dialogue toward the dialogue the user is actually seeking.

At present, dialogue between a human and a machine is carried out by transcribing speech into text with speech recognition and then applying linguistic processing, that is, processing that operates on words, so the dialogue tends to feel impersonal. Research aimed at making dialogue richer and smoother has been reported, including a technique that determines whether a response is positive or negative from head gestures such as nodding (see, for example, Non-Patent Document 1) and a technique that makes the same determination from the slope of the utterance duration (see, for example, Non-Patent Document 2). However, the techniques reported so far reflect in the dialogue only the binary judgment of whether a certain psychological state of the user was detected or not; they do not reflect the degree of the detected psychological state.

Non-Patent Document 1: 江尻康, 松坂要佐, 小林哲則 (Yasushi Ejiri, Yosuke Matsusaka, Tetsunori Kobayashi), "Recognition of Head Gestures During Dialogue", IEICE Technical Report, Vol. 102, No. 218, pp. 31-36, PRMU2002-61, 2002

Non-Patent Document 2: 藤江真也, 八木大三, 菊池英明, 小林哲則 (Shinya Fujie, Daizo Yagi, Hideaki Kikuchi, Tetsunori Kobayashi), "Spoken Dialogue System Using Paralinguistic Information", Proceedings of the Acoustical Society of Japan Autumn Meeting, Vol. I, pp. 39-40, Sep. 2003

In a spoken dialogue between a human and a machine, the machine creates a response to the human's utterance, converts the generated words into speech, and presents it to the user. The dialogue proceeds by repeating this exchange several times. The technique currently used for dialogue transcribes speech into text with speech recognition and then applies linguistic processing to the meaning of the words. With this technique, however, the dialogue becomes impersonal, and the human feels, so to speak, as if issuing commands to the machine. As a result, people cannot feel close to the machine. An object of the present invention is therefore to provide a dialogue control method and apparatus that enrich the dialogue between a human and a machine, so that people can feel closer to the machine, by determining the user's psychological state and also measuring the degree of that state.

An outline of a representative invention disclosed in the present application for solving the above problem is as follows. The dialogue control device of the present invention has a recording unit that stores a speech recognition program, a speech synthesis program, an emotion recognition program, and dialogue contents; an input unit that receives signals such as voice, sound, and images; an output unit that outputs signals such as the sound and images needed for the dialogue; and a control unit that controls these units. The control unit conducts the dialogue by selecting one response from a plurality of responses based on the signal sent to the input unit and outputting the selected response as a signal from the output unit. When an emotion is detected from the voice, sound, image, or other signal received at the input unit, the control unit detects the degree of that emotion, selects a response different from the one currently in use, and sends it out from the output unit.

In a dialogue between a human and a machine, in addition to response selection based on linguistic information obtained through speech recognition, the emotion and its degree are analyzed from the human's voice in order to grasp the human's psychological state. The emotion and its degree can then be taken into account when selecting a response. As a result, the dialogue between the human and the machine becomes smoother and richer.

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a schematic diagram of a system for realizing the dialogue control method of the present invention. Reference numeral 10 denotes a dialogue control device.
The dialogue control device 10 has a control unit 11 with emotion analysis and data processing functions; a recording unit 12, such as a hard disk or memory, capable of storing various data; an input unit 13 that accepts various data from outside, such as a microphone for the user's voice, a camera for the user's image, a keyboard, a mouse, a touch panel, a remote control, or a connection terminal for external devices; and an output unit 14, composed of a liquid crystal display, a speaker, and the like, that can display various data, connect to external devices, and output sound. The recording unit 12 stores, among other things, a main program for executing the various processes of the dialogue control device 10; the control unit 11 reads the main program and executes those processes. The dialogue control device accepts user input including spoken utterances, generates or selects a response suited to the input information, and outputs it to the user by voice or other means. Control that outputs a reply according to the input information is called dialogue control.

Next, the drawings used to explain the present invention are described. FIG. 2 shows the dialogue processing configuration used in the present invention and its flow. FIG. 3 shows the processing configuration and flow of a sub-dialogue within a dialogue. FIG. 4 shows the processing and flow of the normal input, one of the ways the machine poses a question in a sub-dialogue. FIG. 5 shows the processing and flow of the simple input, another way the machine poses a question in a sub-dialogue. FIG. 6 shows the flow of the dialogue when emotion is taken into account in the normal and simple inputs. FIG. 7 shows the dialogue control processing and its flow when emotion is taken into account. FIG. 8 is a database describing the content of the utterances used in the sub-dialogues. FIG. 9 is a database describing the processing required to conduct a dialogue. FIG. 10 is a database describing the processing required to switch between the normal and simple inputs. FIG. 11 is a database describing the current dialogue state. FIG. 12 shows the processing required for the exchange between the emotion determination process and the dialogue control system, and its flow. FIGS. 13, 14, and 15 are databases describing what the machine says. FIG. 16 shows the processing for determining an emotion and its flow. FIGS. 17 and 18 are databases listing the variables needed to identify an emotion. The databases shown in these figures and the programs for realizing the processing configurations are recorded in the recording unit 12.

In the present invention, when dialogue control is performed, the control unit 11 analyzes the signal received from the input unit 13 for emotion. When a specific emotion is detected, a response matching that emotion and its degree is selected and sent to the user through the output unit 14. As a result, a smooth dialogue can be realized.

In the present invention, when an angry utterance by the human is detected during a dialogue between the human and the machine, the current dialogue is judged to have broken down, and dialogue control is performed to correct it. The aim is that, when the accuracy of the speech recognition used in the dialogue has dropped because of environmental noise or the like and the dialogue no longer works, detecting the human's angry utterance triggers a switch to a dialogue with a higher recognition rate. For a station name, for example, the normal dialogue asks for the station name directly, whereas the corrected dialogue first asks for the prefecture name and then for the station name. The former obtains the keyword directly; the latter obtains it over several exchanges by narrowing down. Because the latter approach can restrict in advance the words to be recognized, it aims to raise the recognition rate. The dialogue becomes somewhat more cumbersome, but the task of obtaining the station name can still be accomplished, so a failed dialogue is avoided.
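The effect of narrowing down can be illustrated with a minimal Python sketch. The station table, the function names, and the representation of the recognition vocabulary as a plain sorted list are all assumptions made for illustration; the description itself does not specify any of them.

```python
# Hypothetical station table; the description does not list concrete stations.
STATIONS_BY_PREFECTURE = {
    "Tokyo": ["Shinjuku", "Ueno"],
    "Kanagawa": ["Yokohama", "Kawasaki"],
}


def direct_vocabulary():
    """Normal dialogue: every station name is a recognition candidate."""
    return sorted(
        name for names in STATIONS_BY_PREFECTURE.values() for name in names
    )


def narrowed_vocabulary(prefecture):
    """Corrected dialogue, second question: only stations in one prefecture."""
    return sorted(STATIONS_BY_PREFECTURE[prefecture])


def questions(anger_detected):
    """Return the items the machine asks for, in order."""
    return ["prefecture", "station"] if anger_detected else ["station"]
```

In this sketch the corrected dialogue asks two questions instead of one, but the recognizer for the second question only has to distinguish the stations of the chosen prefecture rather than all stations.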

A first embodiment using the system configuration shown in FIG. 1 is described below with reference to FIG. 2. FIG. 2 is a dialogue story flowchart 200 showing a dialogue between a human and a machine. To perform dialogue control, the dialogue is divided into small sub-dialogues, each basically in a one-question, one-answer format, and the dialogue is built by combining these sub-dialogues. The dialogue story flowchart 200 can be used for a variety of stories, such as purchasing a ticket or getting a weather forecast. In a ticket purchase, for example, a sub-dialogue covers one item needed to purchase the ticket, such as the destination or the date and time. The dialogue may proceed with the human answering questions from the machine, or the other way around.

FIG. 2 consists of five sub-dialogues; the number of sub-dialogues in a dialogue story can be increased or decreased as desired. Reference numeral 201 is the start of the dialogue, 207 is the end of the dialogue, and 202 to 206 are the sub-dialogues in the dialogue story. Each dialogue consists of a pair of a user utterance and a computer utterance (which may be speech or non-speech, such as a display). The outputs of the sub-dialogues (202 to 206) are, for example, continuation and forced termination; other outputs needed to control the human-machine dialogue may be added separately. Continuation is the output indicating that the sub-dialogue succeeded, that is, it has been confirmed that the user responded appropriately to the computer's utterance (or that the computer responded appropriately to the user's utterance), and the process advances to the next sub-dialogue. The continuation output is therefore connected to the next sub-dialogue, and that of the final sub-dialogue is connected to the end 207. Forced termination is the output indicating that the dialogue failed, and it ends the dialogue even if it is still in progress. All forced-termination outputs are therefore connected to the forced termination 208. In the procedure of FIG. 2, the control unit 11 controls the dialogue according to the main program stored in the recording unit 12 of the dialogue control device 10, and the control unit 11 also controls the input unit 13, such as a microphone, and the output unit 14, such as a speaker.
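The story-level flow of FIG. 2 can be sketched in Python as a loop over sub-dialogues. This is only an illustration under stated assumptions: the names `Outcome` and `run_story`, and the modelling of each sub-dialogue as a callable, are hypothetical and do not come from the description.

```python
from enum import Enum


class Outcome(Enum):
    CONTINUE = "continue"      # sub-dialogue succeeded, go to the next one
    FORCED_END = "forced_end"  # dialogue failed, stop immediately


def run_story(sub_dialogues):
    """Run sub-dialogues (202 to 206) in order.

    Returns "end" (207) if every sub-dialogue continues, or
    "forced_end" (208) as soon as one of them fails.
    """
    for sub_dialogue in sub_dialogues:
        if sub_dialogue() is Outcome.FORCED_END:
            return "forced_end"
    return "end"
```

Because the loop simply iterates over a list, the number of sub-dialogues can be increased or decreased without changing the control logic.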

The flowchart of FIG. 3 shows one example of the processing and flow of the sub-dialogues (202 to 206) shown in FIG. 2. It is referred to as the sub-dialogue input method flowchart 300. Before explaining the flowchart of FIG. 3, the values it requires are described, because executing this flowchart preferably uses the results of measuring the current dialogue state. To manage the current dialogue state, for example, a dialogue state analysis database 1100, which collects the measurement results of the current dialogue state into a database, may be used.

FIG. 11 is one example of the dialogue state analysis database 1100. It is a database that holds the current dialogue situation, describing and storing, item by item, the recognition results, the time used, the user's psychological state, and so on. The dialogue state analysis database 1100 serves both to determine the current state and to be referenced from other states. In FIG. 11, the state 1101 indicates the current state. The previous state 1102 indicates the immediately preceding state. The time 1103 indicates the time at which the current state was entered. The question recognition result 1104 stores the recognition result when recognition is used for a question. The confirmation recognition result 1105 stores the recognition result when recognition is used for a confirmation. The latest emotion determination 1106 indicates the most recent emotion determination. The latest emotion analysis result 1107 stores the result of the emotion determination. The latest emotion detection time 1108 stores the time of the determination.

The latest mode 1109 indicates the current mode. The latest mode change time 1110 indicates the time at which the mode changed to the current mode. The latest input method 1111 indicates the current input method. The latest input method change time 1112 indicates the time at which the current input method was adopted. The next state 1113 stores the state to proceed to next. The item 1114 indicates each item, and the value 1115 stores the value of each item. The items stored in the dialogue state analysis database 1100 are not limited to these and may be deleted or added as necessary. The dialogue state analysis database 1100 is preferably saved as material for referring to the past; for example, it is preferably saved to the recording unit 12 at every exchange between the human and the machine.
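One possible in-memory representation of the items 1101 to 1113 is sketched below in Python. The field names, the value types, and the default of "normal" for the input method are assumptions; the description only names the items and says that the record is preferably saved at every exchange.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DialogueState:
    """Hypothetical record mirroring items 1101 to 1113 of FIG. 11."""
    state: str                                               # 1101
    previous_state: Optional[str] = None                     # 1102
    entered_at: Optional[float] = None                       # 1103
    question_result: Optional[str] = None                    # 1104
    confirmation_result: Optional[str] = None                # 1105
    latest_emotion: Optional[str] = None                     # 1106
    latest_emotion_result: Optional[float] = None            # 1107
    latest_emotion_at: Optional[float] = None                # 1108
    latest_mode: Optional[str] = None                        # 1109
    latest_mode_changed_at: Optional[float] = None           # 1110
    latest_input_method: str = "normal"                      # 1111
    latest_input_method_changed_at: Optional[float] = None   # 1112
    next_state: Optional[str] = None                         # 1113


def save_snapshot(state, history):
    """Append a copy of the state so that past exchanges can be consulted."""
    history.append(asdict(state))
```

Calling `save_snapshot` after each exchange keeps an independent copy, so later changes to the live state do not alter the saved history.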

Next, the structure of the sub-dialogue input method flowchart 300 shown in FIG. 3 is described. The input branch 302 is the branch on the input method. When the dialogue is a ticket purchase, for example, a sub-dialogue represents one item needed for that purchase, such as the destination or the date and time. The input of a sub-dialogue is one item required for the dialogue to succeed, such as the destination. The input method is the way in which that one item is obtained; examples include asking for the input directly and asking for it in stages so that it is narrowed down step by step. The number of branches at the input branch 302 is preferably equal to the number of input methods in use; in FIG. 3 there are two input methods, the normal input 303 and the simple input 304, so the number of branches is two. The branch is decided by the latest input method 1111 in the dialogue state analysis database 1100. If the latest input method 1111 reads "normal", the process proceeds to the normal input 303; if it reads "confirmation", it proceeds to the simple input 304 (also written as the confirmation input 304). It is also preferable to assign a priority order to the input methods; for example, the sub-dialogue input method flowchart 300 of FIG. 3 consists of two input methods, and their priority order may be the normal input 303 followed by the simple input 304.

The structures of the normal input 303 and the simple input 304 (confirmation input 304) are described with reference to FIGS. 4 and 5, respectively. Their outputs are, for example, restart, input switching, forced termination, and continuation; other outputs may be used as long as they are needed to decide how the dialogue proceeds next. Restart returns to the beginning of the dialogue so that it can start over; its destination is therefore the input branch 302. Input switching is used to move to the next input method when the input could not be completed with the current one. From the normal input 303, the next input method is the simple input 304. From the simple input 304, no further input method exists, so the result is forced termination. Forced termination is the output indicating that the dialogue failed, and it ends the dialogue even if it is still in progress; it is therefore connected to the forced termination 307. Continuation is the output indicating that the dialogue succeeded, and the process advances to the next dialogue; the continuation output of each dialogue is therefore connected to the next dialogue through the continuation 306.
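The four outputs and the priority order of the input methods can be sketched as a small Python loop. This is a sketch under stated assumptions: the string labels, the `ask` callback, and the `max_steps` guard against endless restarts are hypothetical and are not part of the description.

```python
# Priority order of input methods: normal input 303, then simple input 304.
INPUT_ORDER = ["normal", "simple"]


def run_sub_dialogue(ask, method="normal", max_steps=10):
    """Drive one sub-dialogue from the input branch 302.

    ask(method) must return "restart", "switch", "forced_end" or "continue".
    Returns a pair (result, input method in use at the end).
    """
    for _ in range(max_steps):  # max_steps is an assumed safety limit
        result = ask(method)
        if result == "restart":
            continue  # back to the input branch with the same method
        if result == "switch":
            index = INPUT_ORDER.index(method)
            if index + 1 == len(INPUT_ORDER):
                return "forced_end", method  # no further input method exists
            method = INPUT_ORDER[index + 1]
            continue
        return result, method  # "continue" or "forced_end"
    return "forced_end", method
```

A switch requested while the simple input is active ends the dialogue, matching the rule that no input method follows the simple input 304.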

Next, one example of the structure of each input method, the normal input 303 and the simple input 304, is described with reference to FIGS. 4 and 5. FIG. 4 is the normal input flowchart 400, and FIG. 5 is the simple input flowchart 500. The difference between FIG. 4 and FIG. 5 lies in how the one item is obtained: FIG. 4 asks for the input directly, whereas FIG. 5 asks for it in stages so that it is narrowed down step by step. The narrowing down can take several steps; for example, FIG. 4 consists of one step and FIG. 5 of two. First, the normal input of FIG. 4 is described. Because the normal input flowchart 400 obtains the item directly in one step, it consists of word A 402 alone. The outputs of word A 402 are, for example, restart, input switching, forced termination, and continuation; other outputs may be used as long as they are needed to decide how the dialogue proceeds next. The outputs of FIG. 4 are preferably the same as those of the normal input 303 in the sub-dialogue input method flowchart 300 of FIG. 3.

Next, the simple input of FIG. 5 is described. Because the simple input flowchart 500 obtains the item by narrowing it down in two steps, it consists of word B 502 and word C 503. The outputs of both word B 502 and word C 503 are, for example, restart, input switching, forced termination, and continuation; other outputs may be used as long as they are needed to decide how the dialogue proceeds next. The outputs of FIG. 5 are preferably the same as those of the simple input 304 in the sub-dialogue input method flowchart 300 of FIG. 3.
Next, one example of the structure of word A 402 in FIG. 4 and of word B 502 and word C 503 in FIG. 5 is described with reference to FIG. 6. FIG. 6 is the flowchart 600 inside words A, B, and C; word A 402, word B 502, and word C 503 are each processed according to this flowchart.

Before the flowchart 600 inside words A, B, and C is explained, it is preferable to specify which words are used for word A, word B, and word C; for example, a dialogue association database 800, which specifies word A, word B, and word C for each dialogue and collects them into a database, may be used. A word here is what the machine says in one turn, and it is preferable to manage utterances in units of words. FIG. 8 is one example of the dialogue association database 800. Because the utterance content differs from dialogue to dialogue, it is preferable to be able to select words suited to the dialogue in use; FIG. 8 is, for example, a correspondence table of word A, word B, and word C for each dialogue. In dialogue 1 (804), word A (801) is word 1-1-1, word B (801) is word 1-2-1, and word C (801) is word 1-2-2; in dialogue 2 (805), word A (801) is word 2-1-1, word B (801) is word 2-2-1, and word C (801) is word 2-2-2. The data items stored in the dialogue association database 800 are not limited to these and may be deleted or added as necessary.

Next, the values written in the dialogue association database 800 of FIG. 8 (word 1-1-1 and so on) are described. They specify in advance the information needed during the dialogue, and a word configuration database 900, which collects this information into a database, may be used. FIG. 9 is one example of the word configuration database 900. It can store, for example, the content of the questions used in the dialogue, detailed parameters for speech synthesis and speech recognition, and detailed parameters the machine needs in order to act.

In FIG. 9, the question use number 901 indicates the number of the question content. The dialogue continuation confirmation number 902 indicates the content used to present the current dialogue state when the dialogue continues. The input switching confirmation number 903 indicates the question content used when the input method is switched. The question substitution ($INPUT) 904 is used when a variable word, such as a place name, is inserted into the question content, and indicates the word to insert. If this value begins with $, the text held in the corresponding item of the dialogue state analysis database 1100 is inserted. The question recognition use 905 indicates whether recognition is performed for the question. The question recognition timing 906 indicates whether speech recognition is started before or after the machine's utterance. The question recognition DB name 907 indicates the name of the database used for recognition. The question recognition result destination 908 indicates where the recognized result is stored.

The confirmation use number 909 indicates the content used for confirmation. The confirmation state content confirmation number 910 indicates the current state. The confirmation substitution ($INPUT) 911 is used when a variable word is inserted into part of the question content, and indicates the word to insert. If this value begins with $, the text held in the corresponding item of the dialogue state analysis database 1100 is inserted. The confirmation recognition use 912 indicates whether speech recognition is performed at the time of the question. The confirmation recognition timing 913 indicates whether speech recognition is started before or after the machine's utterance. The confirmation recognition DB name 914 indicates the name of the database used for recognition. The confirmation recognition result destination 915 indicates where the recognized result is stored. The data items stored in the word configuration database 900 are not limited to these and may be deleted or added as necessary.
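The $INPUT substitution described for items 904 and 911 can be sketched in Python as follows. The function name, the example sentence, and the use of a dictionary to stand in for the dialogue state analysis database 1100 are assumptions made for illustration.

```python
def fill_template(template, substitution, dialogue_state):
    """Insert a word into a question or confirmation sentence.

    A substitution value that starts with "$" names an item of the
    dialogue state (here a plain dict); any other value is inserted as is.
    """
    if substitution.startswith("$"):
        value = dialogue_state[substitution[1:]]
    else:
        value = substitution
    return template.replace("$INPUT", value)
```

For example, with a hypothetical confirmation sentence "Is $INPUT correct?" and the substitution "$question_result", the sentence is completed with whatever the recognizer stored for the preceding question.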

Next, each step of flowchart 600 for words I, Ro, and Ha is described. Content utterance 602 utters the dialogue content and, if there is a question, uses speech synthesis and speech recognition for it. In the dialogue association database 800, each word is assigned a number identifying the file that describes the content to be uttered, and the number differs from dialogue to dialogue. Based on the selected number, the number written in question usage number 901 of the word configuration database 900 is used to pick the text from the dialogue text database 1300. The recognition result is written to question recognition result 1104 of the dialogue state analysis database 1100. For example, in the dialogue association database 800, word 1-1-1 is selected for word I of dialogue 1. Then the entry 1-1-1 under question usage number 901 of the word configuration database 900 is selected. Finally, the text stored under 1-1-1 in the dialogue text database 1300, "Please say the station name?", is selected. Various known techniques can be used for speech synthesis and speech recognition, and a program for executing them is assumed to be stored in the recording unit 12 in advance. Instead of speech recognition, operation is also possible by pressing a key on the keyboard of the input unit 13, pressing a mouse button, touching a touch panel, pressing a button on a remote control, or the like. Likewise, instead of speech synthesis, the content can be displayed on the liquid crystal display of the output unit 14.
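The three-step lookup described above (dialogue association database 800, then word configuration database 900, then dialogue text database 1300) can be sketched as follows. This is only an illustrative sketch: the dictionaries, key names, and the function `select_utterance` are hypothetical stand-ins, not structures given in the specification.

```python
# Hypothetical in-memory stand-ins for the three databases (assumed layout).
DIALOGUE_ASSOCIATION_DB = {("dialogue 1", "word I"): "1-1-1"}        # database 800
WORD_CONFIGURATION_DB = {"1-1-1": {"question_usage_number": "1-1-1"}}  # database 900, item 901
DIALOGUE_TEXT_DB = {"1-1-1": "Please say the station name?"}          # database 1300


def select_utterance(dialogue: str, word: str) -> str:
    """Resolve the text the machine should utter for a word of a dialogue."""
    # Step 1: the dialogue association database gives the word number.
    word_number = DIALOGUE_ASSOCIATION_DB[(dialogue, word)]
    # Step 2: the word configuration database gives the question usage number.
    question_number = WORD_CONFIGURATION_DB[word_number]["question_usage_number"]
    # Step 3: the dialogue text database gives the actual sentence.
    return DIALOGUE_TEXT_DB[question_number]


print(select_utterance("dialogue 1", "word I"))
```

Keeping the sentence text in a separate table means the same word configuration entry can be reused if the wording changes.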

Dialogue correction 603 detects the person's emotion while he or she is speaking and infers from that emotional state what the person wants. The configuration of emotion detection and dialogue correction 603 is described with reference to FIG. 7. Its outputs are, for example, restart, input switching, forced termination, and continuation; other outputs may be used as long as they are needed to decide how the next dialogue proceeds. These four outputs are as follows. Restart returns to the head of the dialogue so that it starts again from the beginning; restart is therefore connected to content utterance 602. Input switching is used to move to the next input method when input could not be completed with the current one; it is therefore connected to input switching 609. Forced termination is an output indicating that the dialogue was not established, and it ends the dialogue even if it is still in progress; it is therefore connected to forced terminations 610 and 611. Continuation is an output indicating that the dialogue was established, and it proceeds to the next dialogue; it is therefore connected to the next dialogue and, when a next dialogue follows, to continuation 607. Confirmation branch 604 branches according to whether the content uttered by the person is to be confirmed. The branch is selected from latest mode 1109 of the dialogue state analysis database 1100.
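The routing of the four outputs can be sketched as a small lookup. This is a sketch under stated assumptions: the enum, the function `next_step`, and the reading that a continuation output passes through confirmation branch 604 before reaching continuation 607 are illustrative interpretations, and only forced termination 610 is modelled here.

```python
from enum import Enum


class CorrectionOutput(Enum):
    RESTART = "restart"
    INPUT_SWITCH = "input switching"
    FORCED_END = "forced termination"
    CONTINUE = "continuation"


def next_step(output: CorrectionOutput, latest_mode: str) -> str:
    """Return the flowchart step that follows dialogue correction 603."""
    if output is CorrectionOutput.RESTART:
        return "content utterance 602"
    if output is CorrectionOutput.INPUT_SWITCH:
        return "input switching 609"
    if output is CorrectionOutput.FORCED_END:
        return "forced termination 610"
    # Assumed reading: continuation goes through confirmation branch 604,
    # which consults latest mode 1109.
    if latest_mode == "confirmation":
        return "confirmation utterance 605"
    return "continuation 607"


print(next_step(CorrectionOutput.CONTINUE, "confirmation"))
```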

If latest mode 1109 reads "confirmation", processing proceeds to confirmation utterance 605; if it reads "normal", processing proceeds to continuation 607. Confirmation utterance 605 makes an utterance asking for confirmation and, if there is an item to confirm, conducts the confirmation dialogue using speech synthesis and speech recognition. The file describing the content to be uttered is picked from the dialogue text database 1300 using the number written in confirmation usage number 909 of the word configuration database 900. The recognition result is written to confirmation recognition result 1105 of the dialogue state analysis database 1100. Speech recognition and speech synthesis may be the same as those used in content utterance 602 of flowchart 600 for words I, Ro, and Ha. Emotion detection and dialogue correction 606 may likewise be the same as emotion detection and dialogue correction 603. When content utterance 602 or emotion detection and dialogue correction 606 determines its output, it is preferable to use the results measured so far, for example the dialogue state analysis database 1100, which compiles them as a database. It is also preferable to write each newly determined result to the dialogue state analysis database 1100.

The configuration of emotion detection and dialogue correction 603 is now described. After an emotion is extracted, the current dialogue state is corrected, which makes the dialogue smoother. It is preferable to use emotion for correcting the dialogue, and one example of emotion detection and dialogue correction 603 is described using the flowchart of FIG. 7. Emotion analysis result 702 determines the emotion in the current speaker's utterance, and preferably also its degree; this is described with reference to FIG. 12. Mode check 703 checks the current mode and, if a change is needed, invokes mode conversion 704. Mode conversion 704 converts the mode based on the result of mode check 703. The conversion uses latest mode 1109 and latest mode change time 1110 of the dialogue state analysis database 1100: the current mode is determined from these two items, and when the mode is converted both are updated. Emotion count 705 counts how many times a given emotion occurred within a fixed period, that is, which emotion was uttered and how many times. Time check 706 determines how many seconds ago the previously detected emotion was detected. This is obtained from the difference between latest emotion detection time 1108 of the dialogue state analysis database 1100 and the current time.
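Emotion count 705 and time check 706 can be sketched as two small functions. This is a minimal sketch, assuming detections are kept as (timestamp in seconds, emotion label) pairs; that representation and both function names are hypothetical and not taken from the specification.

```python
from typing import Iterable, Tuple


def emotion_count(events: Iterable[Tuple[float, str]], emotion: str,
                  now: float, window_seconds: float) -> int:
    """Count detections of `emotion` within the last `window_seconds`."""
    return sum(
        1
        for timestamp, label in events
        if label == emotion and 0 <= now - timestamp <= window_seconds
    )


def seconds_since_last_detection(latest_detection_time: float, now: float) -> float:
    """Difference between the current time and latest emotion detection time 1108."""
    return now - latest_detection_time


events = [(0.0, "anger"), (10.0, "anger"), (25.0, "calm"), (40.0, "anger")]
print(emotion_count(events, "anger", now=40.0, window_seconds=30.0))
print(seconds_since_last_detection(40.0, now=45.0))
```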

A threshold is set in advance, and the branch taken depends on whether the value is at or above the threshold or below it. Based on emotion count 705 and time check 706, processing branches to either input switching confirmation 707 or dialogue state explanation / dialogue connection confirmation 708. Input switching confirmation 707 asks whether to switch the input method and lets the speaker decide. If there is dialogue content to utter and a question to ask, the dialogue is conducted using speech synthesis and speech recognition. The file describing the content to be uttered is picked from the switching / continuation dialogue text database 1500 using the number written in question usage number 1001 of the input switching confirmation / dialogue continuation confirmation database 1000. Speech recognition and speech synthesis may be the same as those used in content utterance 602 of flowchart 600 for words I, Ro, and Ha. The speaker selects "Yes" to switch the input method and "No" to decline. If "Yes" is selected, processing is connected to input switching 710; if "No" is selected, it is connected to forced termination 711. Latest input method 1111 and latest input method change time 1112 of the dialogue state analysis database 1100 must also be updated: when "Yes" is selected, latest input method 1111 is set to "simple" and latest input method change time 1112 is updated.

Dialogue state explanation / dialogue connection confirmation 708 states what dialogue is currently in progress and determines whether to continue it. If there is dialogue content to utter and a question to ask, the dialogue is conducted using speech synthesis and speech recognition. The file describing the content to be uttered is picked from the current state text database 1400 and the switching / continuation dialogue text database 1500 using the number written in question usage number 1001 of the input switching confirmation / dialogue continuation confirmation database 1000. Speech recognition and speech synthesis may be the same as those used in content utterance 602 of flowchart 600 for words I, Ro, and Ha. The speaker says "Yes" to continue the dialogue and "No" to start it over from the beginning. If "Yes" is selected, processing is connected to repeat 712; if "No" is selected, it is connected to restart 713. Finally, when emotion detection and dialogue correction 700 performs its analysis, it is preferable to use the results measured so far, for example the dialogue state analysis database 1100, which compiles them as a database. It is also preferable to write each newly determined result to the corresponding item of the dialogue state analysis database 1100 as it becomes available.
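The branch between steps 707 and 708 and the handling of the speaker's answer can be sketched as follows. The Yes/No outcomes follow the description above; the direction of the threshold test and the default threshold values are assumptions made only for illustration, since the description does not state them.

```python
def choose_confirmation(count: int, seconds_since_last: float,
                        count_threshold: int = 3,
                        recency_threshold: float = 30.0) -> str:
    """Pick the confirmation step from emotion count 705 and time check 706."""
    # ASSUMPTION: many recent detections lead to the input switching question;
    # the specification only says the branch depends on a preset threshold.
    if count >= count_threshold and seconds_since_last < recency_threshold:
        return "707"
    return "708"


def resolve_answer(step: str, answer_is_yes: bool) -> str:
    """Map the speaker's Yes/No answer to the next step, as described above."""
    outcomes = {
        "707": ("input switching 710", "forced termination 711"),
        "708": ("repeat 712", "restart 713"),
    }
    yes_step, no_step = outcomes[step]
    return yes_step if answer_is_yes else no_step


step = choose_confirmation(count=4, seconds_since_last=5.0)
print(step, resolve_answer(step, answer_is_yes=True))
```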

FIG. 10 concerns the management of the content of input switching confirmation 707 and dialogue continuation confirmation 708 of FIG. 7, for which the input switching confirmation / dialogue continuation confirmation database 1000 may be used. FIG. 10 shows one example of that database. The content is preferably managed word by word; for example, the question content for the dialogue, the detailed parameters used for speech recognition, and the detailed parameters the machine needs in order to act can be stored per word. A word here means what the machine says in one utterance. Question usage number 1001 indicates the actual question content. Question recognition use 1002 indicates whether recognition is used for the question. Question recognition timing 1003 indicates whether speech recognition is started before or after the machine's utterance. Question recognition DB name 1004 indicates the name of the database used for recognition. Analysis result destination 1005 indicates where the recognized result is stored. The dialogue content is described in the respective items for input switching confirmation 1006 and dialogue continuation confirmation 1007. The data items stored in the input switching confirmation / dialogue continuation confirmation database 1000 are not limited to these; items may be deleted or added as necessary.

To use emotion analysis result 702 of FIG. 7, emotion analysis is preferably performed, for example with an emotion analysis flowchart that analyzes a person's utterance. FIG. 12 shows one example, emotion analysis flowchart 1200, which uses speech. Emotion analysis flowchart 1200 is described below. Emotion determination 1202 identifies the emotion from the speech. Various known techniques can be used for speech emotion recognition, and a program for executing them is assumed to be stored in the recording unit 12 in advance. The degree of emotion can be obtained from the number of emotion detections per unit time, for example how many angry utterances were detected within 30 seconds. The more often the emotion is detected, the higher its degree. This is one example of how to obtain the degree of emotion; emotion determination flowchart 1600 can also obtain it and may be used instead. In place of speech emotion recognition, speech recognition may also be used: when a word that may carry a particular emotion, such as "chigau" ("that's wrong"), is recognized, it can be determined that the particular emotion has been detected.

Operation is also possible by pressing a key on the keyboard of the input unit 13, by emotion determination through camera image recognition, by pressing a mouse button, by touching a touch panel, by pressing a button on a remote control, or the like. Determination result update 1203 writes the result of emotion determination 1202 to latest emotion determination 1106 of the dialogue state analysis database 1100. It is also preferable to change the dialogue based on the emotion determination result. The change preferably uses the emotional states determined so far, for example the dialogue state analysis database 1100, which compiles the dialogue status as a database. The changed result may then be written to the items of the dialogue state analysis database 1100. For example, when a certain emotion is detected, latest mode 1109 of the dialogue state analysis database 1100 is changed from "normal" to "confirmation" for a fixed period, and depending on the degree of the emotion, latest input method 1111 is changed from "normal" to "simple" for a fixed period; the time of each change is preferably recorded.
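The temporary switch from "normal" to "confirmation" mode can be sketched as a small state object. This is a sketch under assumptions: the class, its attribute names, and the rule that the mode reverts once the hold period has elapsed are illustrative stand-ins for latest mode 1109 and latest mode change time 1110, not the patent's implementation.

```python
from typing import Optional


class DialogueState:
    """Hypothetical stand-in for items 1109 and 1110 of database 1100."""

    def __init__(self) -> None:
        self.latest_mode = "normal"
        self.latest_mode_changed_at: Optional[float] = None

    def on_emotion_detected(self, now: float) -> None:
        # Switch to confirmation mode and record when the change happened.
        self.latest_mode = "confirmation"
        self.latest_mode_changed_at = now

    def current_mode(self, now: float, hold_seconds: float) -> str:
        # Assumed rule: revert to normal once the hold period has passed.
        if (self.latest_mode == "confirmation"
                and self.latest_mode_changed_at is not None
                and now - self.latest_mode_changed_at >= hold_seconds):
            self.latest_mode = "normal"
            self.latest_mode_changed_at = now
        return self.latest_mode


state = DialogueState()
state.on_emotion_detected(now=100.0)
print(state.current_mode(now=110.0, hold_seconds=60.0))
print(state.current_mode(now=160.0, hold_seconds=60.0))
```

The same pattern would apply to latest input method 1111 and its change time 1112.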

Question usage number 901 of the word configuration database 900 holds a number such as 1-1-1. This is the number of the word used in the current dialogue, and it is preferable to use a database in which the actual text is stored, for example the dialogue text database 1300. FIG. 13 shows one example of the dialogue text database 1300.

The dialogue text database 1300 describes what the machine actually utters during the dialogue. NO. (1007) is the number, and the text for that number is given under Word (1307). Entries 1301 to 1306 each pair a number with a text; for 1-1-1, the text is "Please say the station name?". Where a text contains $INPUT, an arbitrary word can preferably be substituted for the $INPUT part, for example by using question substitution ($INPUT) 904 of the word configuration database 900. In addition to what the machine utters during the dialogue, it is preferable to use a database that stores the text needed to confirm the current state, for example the current state text database 1400. FIG. 14 shows one example of the current state text database 1400. Similarly, for input switching confirmation 707 and dialogue continuation confirmation 708 of FIG. 7, it is preferable to use a database that stores the actual text for the word numbers used in those dialogues, for example the switching / continuation dialogue text database 1500. FIG. 15 shows one example of the switching / continuation dialogue text database 1500. The current state text database 1400 and the switching / continuation dialogue text database 1500 preferably have the same structure as the dialogue text database 1300.
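The $INPUT substitution can be sketched with plain string replacement. This is only an illustrative sketch: the template sentence, the state keys, and the function name `fill_input` are hypothetical. The rule that a value beginning with $ is looked up in the dialogue state analysis database follows the description of the substitution items.

```python
from typing import Mapping


def fill_input(template: str, substitution: str, state: Mapping[str, str]) -> str:
    """Replace $INPUT in a stored sentence with the configured word."""
    if substitution.startswith("$"):
        # A leading $ means: take the text from the named state item
        # (a hypothetical key into the dialogue state analysis database).
        substitution = state[substitution[1:]]
    return template.replace("$INPUT", substitution)


# Hypothetical example data, not taken from the figures.
state = {"question_recognition_result": "Tokyo"}
print(fill_input("Is $INPUT correct?", "$question_recognition_result", state))
print(fill_input("Is $INPUT correct?", "Shinjuku", state))
```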

Emotion determination 1202 of emotion analysis flowchart 1200 preferably identifies the emotion from speech. FIG. 16 shows one example, emotion determination flowchart 1600, a process that recognizes a person's psychological state. This process monitors for danger to the user: if it judges that the user, an assailant, or another person has produced a sound not made in everyday life, such as yelling, shouting, or screaming, it determines that the user is in danger.

FIG. 16 is a flowchart showing a typical configuration of emotion determination flowchart 1600, which consists of speech feature extraction processing 1601, emotion identification processing 1602, emotion database 1604, and determination processing 1605. Speech feature extraction processing 1601 extracts feature values from the input speech. For pitch, which represents how high the voice is, the feature values for one utterance containing emotion include the mean pitch, the maximum pitch, the position at which the maximum was detected, the minimum pitch, and the position at which the minimum was detected. For energy, which represents how loud the voice is, the feature values for one utterance containing emotion likewise include the mean energy, the maximum energy, the position at which the maximum was detected, the minimum energy, and the position at which the minimum was detected. These are examples; pitch, power, tempo, and the like can be obtained from within one utterance and features derived from them to serve as feature values. Feature values may also be extracted from the change over time of pitch, power, tempo, and the like within one utterance. Any publicly known emotional features may be used as feature values as well. In place of speech emotion recognition, speech recognition may also be used: when a word that may carry a particular emotion, such as "chigau" ("that's wrong"), is recognized, it can be determined that an emotion has been detected.
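The per-utterance statistics listed above can be sketched for a single contour. This is a minimal sketch, assuming the pitch or energy of one utterance is already available as a list of per-frame values; how those values are obtained is outside the sketch, and the function name `contour_features` is hypothetical.

```python
from typing import Dict, Sequence


def contour_features(values: Sequence[float]) -> Dict[str, float]:
    """Mean, maximum, minimum, and their frame positions for one utterance."""
    if not values:
        raise ValueError("the contour must contain at least one frame")
    max_pos = max(range(len(values)), key=lambda i: values[i])
    min_pos = min(range(len(values)), key=lambda i: values[i])
    return {
        "mean": sum(values) / len(values),
        "max": values[max_pos],
        "max_pos": max_pos,  # frame index where the maximum was detected
        "min": values[min_pos],
        "min_pos": min_pos,  # frame index where the minimum was detected
    }


pitch_hz = [100.0, 180.0, 150.0, 90.0]  # made-up per-frame pitch values
print(contour_features(pitch_hz))
```

Applying the same function to an energy contour yields the energy features, and the two dictionaries together form one feature vector.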

The speech features extracted by speech feature extraction processing 1601 are input to emotion identification processing 1602, which recognizes the emotion. Any publicly known method can be used for emotion identification processing 1602. With discriminant analysis, for example, training is performed first: the coefficients of a discriminant function are obtained from the feature values of speech data whose emotion is labelled in advance. Preferably the coefficients are chosen so that the discriminant function yields 1 for anger and -1 for calm. This discriminant function is then used to classify unlabelled speech data. A positive result indicates anger and a negative result indicates calm.

When the degree of emotion is also extracted, a large positive value can be judged to indicate a high degree of anger, and a negative value of large magnitude a high degree of calm. This is one example of how to identify emotion; a neural network or multivariate analysis may be used instead. Emotion identification processing 1602 uses emotion database 1603. Emotion database 1603 holds the data resulting from the training process described above, carried out in advance, so that training can be skipped at run time. It is a database storing a plurality of variables for identifying various emotions, in which a plurality of data items are recorded in association with their variables. By comparing the input speech against emotion database 1603, the emotion of the currently input speech can be identified.
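The sign and magnitude reading of the discriminant function can be sketched as follows. This sketch assumes a linear discriminant whose coefficients have already been learned and stored; the coefficient values are made up, and treating a score of exactly zero as calm is an assumption, since the description covers only positive and negative results.

```python
from typing import Sequence, Tuple


def discriminant_score(features: Sequence[float],
                       coefficients: Sequence[float], bias: float) -> float:
    """Linear discriminant: weighted sum of the feature values plus a bias."""
    if len(features) != len(coefficients):
        raise ValueError("features and coefficients must have the same length")
    return sum(w * x for w, x in zip(coefficients, features)) + bias


def classify(score: float) -> Tuple[str, float]:
    """Positive means anger, otherwise calm; the magnitude is the degree."""
    label = "anger" if score > 0 else "calm"  # zero treated as calm (assumption)
    return label, abs(score)


# Made-up coefficients standing in for values read from emotion database 1603.
coefficients, bias = [0.5, -0.25], 0.5
print(classify(discriminant_score([1.0, 2.0], coefficients, bias)))
print(classify(discriminant_score([0.0, 4.0], coefficients, bias)))
```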

FIG. 18 shows one example of emotion database 1603, in which 1801 is a shout, 1802 is a scream, 1803 is the item describing the content of 1801 and 1802, and 1804 to 1808 are the variables for each item. The items stored in emotion database 1603 are not limited to these; items may be deleted or added as necessary.

Furthermore, even when the sound is not a human voice, if a sound that signals danger to the user occurs, such as a gunshot or a window breaking, the device judges that the user is in danger and reports it. This uses the same processing as emotion recognition process 1600 of FIG. 16. However, whereas emotion recognition process 1600 of FIG. 16 uses emotion database 1603 to analyze emotion from a person's utterance, here it is replaced with crisis sound database 1700. Crisis sound database 1700 is a database of dangerous sounds compiled in advance, in which a plurality of data items are recorded in association with their variables. FIG. 17 shows one example of crisis sound database 1700, in which 1701 is a gunshot, 1702 is the sound of glass breaking, 1703 is the item, and 1704 to 1708 are the variables for each item. The items stored in crisis sound database 1700 are not limited to these; items may be deleted or added as necessary.

The emotion used is not limited to anger; detection of other emotions may be used as well. Beyond emotion, anything that expresses a person's psychological state may be used, for example degree of tension, degree of liking, or lie detection. This first embodiment has been described for a goal-directed dialogue such as a station name search, but its use is not limited to this; it can also be applied to dialogues that enrich communication, such as soothing conversation.

As described above, by taking emotion into account in the dialogue between a human and a machine, the first embodiment makes it possible to control the dialogue in a way suited to that emotion. As a result, an expressive and smooth dialogue can be realized.

FIG. 1 is a diagram showing an outline of a system for implementing the dialogue control method of the present invention.
FIG. 2 is a flowchart showing the procedure of the dialogue correction method according to the first embodiment of the present invention.
FIG. 3 is a flowchart showing the procedure of the sub-dialogue input method.
FIG. 4 is a flowchart showing the procedure of normal input.
FIG. 5 is a flowchart showing the procedure of simple input.
FIG. 6 is a flowchart showing the procedure within words I, Ro, and Ha.
FIG. 7 is a flowchart showing the procedure of emotion detection and dialogue correction.
FIG. 8 is a diagram showing an example of the dialogue association database.
FIG. 9 is a diagram showing an example of the word configuration database.
FIG. 10 is a diagram showing an example of the input switching confirmation / dialogue continuation confirmation database.
FIG. 11 is a diagram showing an example of the dialogue state analysis database.
FIG. 12 is a flowchart showing the procedure of emotion analysis.
FIG. 13 is a diagram showing an example of the dialogue text database.
FIG. 14 is a diagram showing an example of the current state text database.
FIG. 15 is a diagram showing an example of the switching / continuation dialogue text database.
FIG. 16 is a flowchart showing the procedure of emotion determination.
FIG. 17 is a diagram showing an example of the crisis sound database.
FIG. 18 is a diagram showing an example of the emotion database.

Explanation of symbols

10 dialogue control device, 11 control unit, 12 recording unit, 13 input unit, 14 output unit.

Claims

1. A dialogue control apparatus comprising an input unit that receives input, a recording unit that records dialogue content and emotion determination information, an output unit that outputs the current state, and a control unit,
wherein the control unit
analyzes the input information sent from the input unit using the emotion determination information recorded in the recording unit to determine the emotion of the input information,
selects the next response sentence using the dialogue content recorded in the recording unit and the emotion determination result, and
performs output via the output unit.

2. The dialogue control apparatus according to claim 1, wherein the information used for the input unit is a sound.

3. The dialogue control apparatus according to claim 2, wherein the content analyzed by the control unit is a sound feature amount.

4. The dialogue control apparatus according to claim 1, wherein the control unit detects an emotion and the degree of that emotion using the emotion determination information stored in the recording unit.

5. The dialogue control apparatus according to claim 1, wherein the control unit generates its response by selecting, based on the determined emotion and the degree of that emotion, the next response sentence for advancing the dialogue from the dialogue content recorded in the recording unit.