JP2008170817A

JP2008170817A - Interaction control device, interaction control method and interaction control program

Info

Publication number: JP2008170817A
Application number: JP2007005127A
Authority: JP
Inventors: Shinji Sugiyama; 真治杉山; Hiroaki Sekiyama; 博昭関山; Toshiyuki Nanba; 利行難波; Jitsunashi Fujishiro; 実奈子藤城; Yasuhiko Fujita; 泰彦藤田; Emi Otani; 恵美大谷
Original assignee: Toyota Motor Corp; Advanced Media Inc
Current assignee: Toyota Motor Corp; Advanced Media Inc
Priority date: 2007-01-12
Filing date: 2007-01-12
Publication date: 2008-07-24

Abstract

PROBLEM TO BE SOLVED: To provide an interaction control device capable of continuing a natural interaction by allowing a response to a voice guide that the system output in the past to be corrected at an appropriate time.
SOLUTION: The interaction control device 100 controls interaction with a user according to a scenario composed of nodes, each including voice guide data and a voice recognition object word that defines the user utterance required to start output of that voice guide data. The device comprises: a voice recognition means 10 for recognizing the user utterance as text data; a node recording means 12 for recording the node including the output voice guide data; a voice recognition object word acquiring means 13 for acquiring the voice recognition object word for the node recorded by the node recording means 12; and a next output determination means 14 for determining the voice guide data to be output next, based on the text data recognized by the voice recognition means 10 and the voice recognition object word acquired by the voice recognition object word acquiring means 13.
COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a dialogue control device that outputs guidance data along a predetermined scenario in response to user input, and more particularly to a dialogue control device, a dialogue control method, and a dialogue control program that can continue a natural dialogue by allowing a response to guidance data that the device output in the past to be redone at an appropriate time.

Conventionally, a simulated conversation system is known that recognizes words spoken by a user and continues the conversation with the user by outputting, as speech, the voice guidance data corresponding to the recognized words (see, for example, Patent Document 1).

This simulated conversation system stores the conversation history of the simulated conversation. When the output duration of the topic currently being output (a topic consists of a hierarchically structured group of questions), the number of times a given word has been spoken as a user answer, or a similar measure satisfies a predetermined condition, the system changes the topic: regardless of the content of the user's most recent answer (voice input), it outputs voice guidance data on a topic different from that of the voice guidance data output immediately before.

In addition, among the questions in the new topic after a topic change, the simulated conversation system does not re-output questions that have already been output and instead outputs questions that have not yet been output. This avoids repeating the same question and, through the topic change, keeps the conversation from becoming monotonous, so that the conversation continues more naturally.
Patent Document 1: JP 2002-169804 A

However, because the simulated conversation system described in Patent Document 1 changes the topic whenever the predetermined condition is satisfied, the topic is changed even when the user wants a different development of a question output in the immediately preceding topic, for example, when the user wants to select an option other than the one already selected from among the options presented in that question. The user therefore cannot select the desired option.

Furthermore, because questions already output are not re-output in the new topic after the topic change, the user cannot select the desired option even when wanting to choose, from among the options presented in a past question, an option other than the one already selected.

In view of the above, an object of the present invention is to provide a dialogue control device, a dialogue control method, and a dialogue control program that can continue a natural dialogue by allowing a response to guidance data that the system output in the past to be redone at an appropriate time.

To achieve the above object, the dialogue control device according to the first invention controls a dialogue with a user along a scenario composed of nodes, each having guidance data and a recognition target word that defines the user input for starting output of that guidance data. The device comprises: recognition means for recognizing user input as text data; node recording means for recording a node having guidance data that has been output; recognition target word acquisition means for acquiring the recognition target word corresponding to the node recorded by the node recording means; and next output determination means for determining the guidance data to be output next, based on the text data recognized by the recognition means and the recognition target word acquired by the recognition target word acquisition means.

The second invention is the dialogue control device according to the first invention, wherein, when the text data recognized by the recognition means contains a recognition target word acquired by the recognition target word acquisition means, the next output determination means determines the guidance data of the node corresponding to that recognition target word as the guidance data to be output next.

The third invention is the dialogue control device according to the first or second invention, wherein the guidance data is output as speech or text, and the user input is entered as speech or text.

The dialogue control method according to the fourth invention controls a dialogue with a user along a scenario composed of nodes, each having guidance data and a recognition target word that defines the user input for starting output of that guidance data. The method comprises: a recognition step of recognizing user input as text data; a node recording step of recording a node having guidance data that has been output; a recognition target word acquisition step of acquiring the recognition target word corresponding to the node recorded in the node recording step; and a next output determination step of determining the guidance data to be output next, based on the text data recognized in the recognition step and the recognition target word acquired in the recognition target word acquisition step.

The fifth invention is the dialogue control method according to the fourth invention, wherein, when the text data recognized in the recognition step contains a recognition target word acquired in the recognition target word acquisition step, the next output determination step determines the guidance data of the node corresponding to that recognition target word as the guidance data to be output next.

The sixth invention is the dialogue control method according to the fourth or fifth invention, wherein the guidance data is output as speech or text, and the user input is entered as speech or text.

The seventh invention is a dialogue control program for causing a computer to execute the dialogue control method according to any one of the fourth to sixth inventions.

With the above means, the present invention can provide a dialogue control device, a dialogue control method, and a dialogue control program that can continue a natural dialogue by allowing a response to guidance data that the system output in the past to be redone at an appropriate time.

The best mode for carrying out the present invention is described below with reference to the drawings.

(1) Overview of the Embodiment
The dialogue control device according to the present invention provides the information a user needs while outputting voice guidance along a scenario. It is used, for example, in a car navigation system to move between domains such as gourmet introduction, shopping introduction, sightseeing spot introduction, audio operation, and device operation, and to control input operations within each domain.

A "scenario" is a data structure in which sets of information (hereinafter, "nodes"), each consisting of voice guidance data and speech recognition target words that define the transition condition for moving to the next voice guidance data, are linked in a tree. This data structure defines the order of the voice guidance.

A "speech recognition target word" is a word registered in advance as one the user is expected to utter, and may be a keyword, a key phrase, or the like. When the dialogue control device recognizes that the user has uttered a given speech recognition target word, it treats the transition condition for the corresponding node as satisfied and outputs the voice guidance data of that node.

A "transition condition" is a condition for moving to a given node. Besides recognition of a given speech recognition target word in a user utterance, it may be set based on external information or a user profile. When the dialogue control device is mounted on a vehicle, "external information" includes the open/closed state of the windows, the engine start/stop state, the remaining fuel, the brake usage state, the vehicle speed, the accelerator opening, the air conditioner set temperature, and the usage state of the in-vehicle radio. In the same case, the "user profile" includes the driver's gender, family structure, birthday, hobbies, interval between breaks on long drives, vehicle usage frequency (every morning and evening, mornings and evenings on weekdays, weekends, once a month, every evening, and so on), favorite foods, and favorite music.

Each scenario has specific content, such as a scenario output while driving on an expressway or a scenario output when approaching a registered facility, and consists of a root node (the node at the top of the tree structure), branch nodes (nodes at branch points of the tree structure), and leaf nodes (nodes at the ends of the tree structure).

When a transition condition is satisfied (for example, when the user utterance contains a speech recognition target word), starting the predetermined processing of the given node, such as voice guidance, is called "firing" the node, and the transition condition for a node is called its "firing condition."

"Firing control" refers to control that determines whether a node's firing condition is satisfied, starts the predetermined processing such as voice guidance when it is, and periodically repeats the determination when it is not.

FIG. 2 shows the tree structure of a scenario: scenario A has a root node N1, branch nodes N2 and N3, and leaf nodes N4, N5, N6, and N7.

FIG. 3 shows a configuration example of a node; each node has a voice guidance data part 50 and a speech recognition target word part 51. FIG. 4 shows a configuration example of the speech recognition target word part 51, which holds the correspondence between each speech recognition target word and its destination node.

For example, for the voice guidance data "Shall I give you some sightseeing information? I have information on local specialties, local products, and highlights. Which would you like?" stored in the voice guidance data part 50 of node N3, the registered speech recognition target words include "local specialties," "local products," "highlights," and "I don't want to hear it." When the dialogue control device recognizes that the user's utterance contains "local specialties," it treats the transition condition for node N5 as satisfied and fires node N5.
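The node layout of FIGS. 3 and 4 can be sketched as follows. This is a minimal illustration, not the device's actual implementation: the class name `Node`, the function `next_node`, and plain substring matching are assumptions, and only the link from "local specialties" to N5 comes from the text (the "highlights" destination is hypothetical).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """One scenario node: guidance text plus a word -> destination table."""
    node_id: str
    guidance: str                                           # voice guidance data part 50
    targets: Dict[str, str] = field(default_factory=dict)   # target word part 51


def next_node(node: Node, utterance: str) -> Optional[str]:
    """Return the destination whose target word occurs in the utterance, if any."""
    for word, destination in node.targets.items():
        if word in utterance:  # assumed matching rule: simple substring test
            return destination
    return None


n3 = Node(
    node_id="N3",
    guidance="Shall I give you some sightseeing information?",
    targets={
        "local specialties": "N5",  # stated in the text
        "highlights": "N7",         # hypothetical destination, for illustration only
    },
)
```

Calling `next_node(n3, "tell me about the local specialties")` selects N5, which corresponds to firing node N5 in the example above.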

The dialogue control device also records information on nodes fired in the past as a dialogue history. On recognizing a user utterance, it acquires the speech recognition target words of each node recorded in the dialogue history, newest record first, and determines whether the user's utterance contains any of them. The speech recognition target words of the most recently fired node are called "current speech recognition target words," and those of the other, earlier fired nodes are called "earlier speech recognition target words."

If the user's utterance contains a current speech recognition target word, the dialogue control device fires the node corresponding to that word and outputs its voice guidance. If it does not, the device determines whether the utterance contains an earlier speech recognition target word and, if so, fires the node corresponding to that earlier word.

The dialogue control device may prohibit firing the node corresponding to the earlier speech recognition target word if that node has already been fired within a predetermined time.

As a result, when a user utterance such as "What was that ○○ you mentioned earlier?" or "I'll go with ○○ after all" indicates that the user wants to move to an option that was not selected in a question the dialogue control device asked at some earlier point, the device can respond flexibly and continue a natural dialogue.
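The lookup order described above (current words first, then earlier words from newest to oldest, with an optional block on re-firing a recently fired node) can be sketched as follows. The names `choose_node`, `fired_recently`, and `refire_block_s`, the use of seconds for timestamps, and the sample history are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

History = List[Tuple[str, float]]      # (node_id, fired_at_seconds), oldest first
Targets = Dict[str, Dict[str, str]]    # node_id -> {target word: destination node}


def fired_recently(node_id: str, history: History, now: float, window_s: float) -> bool:
    """True if node_id was fired less than window_s seconds before now."""
    return any(n == node_id and now - t < window_s for n, t in history)


def choose_node(utterance: str, history: History, targets: Targets,
                now: float, refire_block_s: float = 0.0) -> Optional[str]:
    """Check the latest node's words first, then earlier nodes, newest first."""
    for age, (node_id, _) in enumerate(reversed(history)):
        for word, destination in targets.get(node_id, {}).items():
            if word not in utterance:
                continue
            # Optional rule: skip an earlier word whose node fired too recently.
            if age > 0 and fired_recently(destination, history, now, refire_block_s):
                continue
            return destination
    return None


# Hypothetical session: N1, then N3, then N5 were fired.
history = [("N1", 0.0), ("N3", 10.0), ("N5", 20.0)]
targets = {
    "N1": {"sightseeing": "N3"},
    "N3": {"local specialties": "N5", "highlights": "N7"},  # N7 link is hypothetical
}
```

With this data, an utterance such as "actually, highlights please" finds no current target word (N5 has none here) and falls back to the earlier word registered on N3.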

(2) Details of the Embodiment
FIG. 1 shows a configuration example of the dialogue control device according to the present invention. The dialogue control device 100 consists of a control unit 1, a voice acquisition unit 2, a storage unit 3, and a voice output unit 4. Following a predetermined scenario, it reads voice guidance data stored in the storage unit 3 and outputs that data from the voice output unit 4.

The control unit 1 is a computer including a CPU (Central Processing Unit), RAM (Random Access Memory), ROM (Read Only Memory), and the like. Programs corresponding to the speech recognition means 10, data providing means 11, node recording means 12, speech recognition target word acquisition means 13, and next output determination means 14 are stored in the ROM, loaded into the RAM, and executed by the CPU.

The voice acquisition unit 2 is a device for acquiring the user's voice. For example, when the dialogue control device 100 is mounted on a vehicle, a directional microphone is used so that the voice of the driver can be reliably recognized.

The voice acquisition unit 2 may have an input sound detection function that acquires sounds other than stationary noise as the user's speech, and may select which sounds to acquire based on volume or duration. It may also have an utterance detection function that detects and acquires only the human speech portions of the input sound. The voice acquisition unit 2 outputs the acquired user voice to the control unit 1.

The storage unit 3 is a device for storing the various kinds of information that the dialogue control device 100 requires. For example, it is a hard disk that stores a scenario database 30, in which multiple scenarios having the data structures shown in FIGS. 2 to 4 are systematically organized, and a dialogue history database 31, described later.

The voice output unit 4 is a device for outputting voice guidance data as speech. For example, it is an in-vehicle speaker that outputs, as speech, the voice guidance data supplied by the control unit 1.

Next, the various means of the control unit 1 are described in detail.

The speech recognition means 10 recognizes user utterances as text data. For example, it converts a user utterance acquired via the voice acquisition unit 2 into text data.

Besides recognizing a user utterance as text data, the speech recognition means 10 may detect features such as the reliability of the recognition result, the length of the utterance (number of words or duration), the speaking rate or changes in it, and the speaking volume, and store these features in association with the text data. This allows the intention behind the user's utterance to be recognized more accurately.

The data providing means 11 provides voice guidance data along the scenario. For example, it causes the voice output unit 4 to output the voice guidance data of a node whose firing condition has been satisfied.

The node recording means 12 records, as a dialogue history, the nodes whose voice guidance data has been output. For example, as shown in FIG. 5, it records in the dialogue history database 31 in the storage unit 3 a firing history number indicating the firing order, the number of the fired node, the time at which the node was fired, and the like, in association with one another.
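A dialogue history record with the three fields named for FIG. 5 could look like the following sketch. The class names `HistoryEntry` and `DialogueHistory` and the in-memory list standing in for the dialogue history database 31 are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class HistoryEntry:
    firing_number: int    # firing history number (order of firing)
    node_id: str          # number of the fired node
    fired_at: datetime    # time at which the node was fired


class DialogueHistory:
    """In-memory stand-in for the dialogue history database (assumed design)."""

    def __init__(self) -> None:
        self._entries: List[HistoryEntry] = []

    def record(self, node_id: str, fired_at: datetime) -> HistoryEntry:
        """Append a new entry, numbering entries from 1 in firing order."""
        entry = HistoryEntry(len(self._entries) + 1, node_id, fired_at)
        self._entries.append(entry)
        return entry

    def newest_first(self) -> List[HistoryEntry]:
        """Return entries in the order used when looking up target words."""
        return list(reversed(self._entries))


log = DialogueHistory()
log.record("N1", datetime(2007, 1, 12, 9, 0, 0))
log.record("N3", datetime(2007, 1, 12, 9, 0, 30))
```

Keeping the firing time alongside the node number is what makes a time-window rule on re-firing possible later.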

The node recording means 12 may also record in the dialogue history database a concept keyword representing the voice guidance data of each node, or the sub-keywords or sub-key phrases included in that concept.

The speech recognition target word acquisition means 13 acquires the speech recognition target words of fired nodes. For example, it refers to the speech recognition target word part 51 of the most recently fired node and acquires that node's current speech recognition target words.

The speech recognition target word acquisition means 13 may also refer to the dialogue history database 31 and acquire the earlier speech recognition target words of nodes fired in the past.

The next output determination means 14 determines the voice guidance to be output next. It has, for example, a domain determination means 140, a similar scenario search means 141, a topic change means 142, and a dialogue end means 143, and determines the next voice guidance based on the content and timing of the user's utterance, among other factors.

The domain determination means 140 determines whether a domain related to the content of the user's utterance (hereinafter, "related domain") exists. For example, it makes this determination based on the words and phrases contained in the user's utterance.

A "domain" is a set of information grouped by common content. Examples in a car navigation system include a gourmet introduction domain, a shopping introduction domain, a sightseeing spot introduction domain, an audio operation domain, and a device operation domain. Each domain has a concept keyword that characterizes it (for example, "sports"), and the concept keyword in turn has sub-keywords or sub-key phrases included in that concept (for example, "baseball," "soccer," and "volleyball").

Each domain also has scenarios for providing related information (for example, a sightseeing guidance scenario in the sightseeing spot introduction domain) and scenarios for executing related operations (for example, a volume adjustment scenario in the audio operation domain).

The dialogue control device 100 may be provided with a thesaurus for determining the relation between the content of the user's utterance and a domain, or may determine that relation using the TF-IDF (Term Frequency-Inverse Document Frequency) method.
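The text names the TF-IDF method but gives no formula, so the following is only one possible sketch. It assumes each domain is represented by a list of its keywords, that the utterance has already been split into tokens, and that a smoothed inverse document frequency is used; the function name `related_domain` and the "gourmet" keyword list are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional


def related_domain(utterance_tokens: List[str],
                   domains: Dict[str, List[str]]) -> Optional[str]:
    """Return the domain with the highest TF-IDF score, or None if none matches."""
    n = len(domains)
    # Document frequency: in how many domains each keyword appears.
    df: Counter = Counter()
    for words in domains.values():
        df.update(set(words))

    best, best_score = None, 0.0
    for name, words in domains.items():
        tf = Counter(words)
        score = sum(
            (tf[t] / len(words)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t in utterance_tokens
            if tf[t]  # only tokens that occur in this domain contribute
        )
        if score > best_score:
            best, best_score = name, score
    return best


domains = {
    "sports": ["baseball", "soccer", "volleyball"],   # sub-keywords from the text
    "gourmet": ["ramen", "sushi", "restaurant"],      # hypothetical keywords
}
```

Returning `None` when no token matches corresponds to the case where no related domain exists.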

The similar scenario search means 141 searches for a scenario whose content is similar to the content of the user's utterance or to the content of a scenario that has been fired (hereinafter, "similar scenario").

For example, the similar scenario search means 141 searches all scenarios for a similar scenario by comparing the words and phrases in the user's utterance, or the speech recognition target words of the nodes in the most recently fired scenario and the words and phrases in their voice output data, with the speech recognition target words of the nodes in each candidate scenario and the words and phrases in their voice output data.

Like a domain, each scenario may have a concept keyword that characterizes it (for example, "sightseeing guidance") and may further have sub-keywords or sub-key phrases included in that concept (for example, "hot spring," "observation deck," and "souvenir shop").

The dialogue control device 100 may be provided with a thesaurus for determining the similarity between the content of the user's utterance, or of the fired scenario, and the content of a candidate scenario, or may determine the similarity using the TF-IDF method.

The similar scenario search means 141 may extract independent words from the text data recognized by the speech recognition means 10 and search for a similar scenario only when the user's utterance length (either the length of a single utterance or the total utterance length within each scenario) is at least a predetermined time.

This is because, when the user's utterances are long, the user can be presumed to be actively engaged in the dialogue with the dialogue control device 100, and firing a similar scenario allows the device to continue a natural dialogue with the user.

Conversely, when the user's utterances are short, the user can be presumed to be reluctant to engage in the dialogue with the dialogue control device 100, and by not firing a similar scenario the device avoids annoying the user with a topic of little interest.

The topic change means 142 changes the topic. For example, it uses a random number to select a particular scenario from the group of scenarios that can be fired.

The dialogue end means 143 ends the dialogue. For example, it outputs an end notification message that explicitly tells the user the dialogue is ending.

Next, the flow of the processing by which the next output determination means 14 determines the voice guidance to be output next is described with reference to FIG. 6, which is a flowchart of this processing.

First, when the voice acquisition unit 2 detects a user utterance, the dialogue control device 100 determines whether the speech recognition means 10 was able to recognize it (step S1).

If the user utterance was recognized as text data (YES in step S1), the dialogue control device 100 uses the speech recognition target word acquisition means 13 to refer to the dialogue history database 31 recorded by the node recording means 12, acquires the speech recognition target words of the nodes fired in the past, and determines whether the text data contains a speech recognition target word (step S2).

Note that, for past nodes other than the most recently fired node, the speech recognition target word acquisition means 13 does not acquire words such as "yes," "no," "either is fine," "neither," and "that" as earlier speech recognition target words, because it cannot be determined which node such an answer refers to.
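The exclusion rule above can be sketched as a filter applied while collecting target words from the history. The function name `collect_target_words`, the English word list, and the sample nodes are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Generic answers that cannot be attributed to a specific earlier node
# (English renderings of the examples in the text).
AMBIGUOUS = {"yes", "no", "either is fine", "neither", "that"}


def collect_target_words(
    fired_nodes_newest_first: List[str],
    targets: Dict[str, Dict[str, str]],
) -> List[Tuple[str, str]]:
    """Return (word, destination) pairs, dropping ambiguous words of earlier nodes."""
    pairs: List[Tuple[str, str]] = []
    for position, node_id in enumerate(fired_nodes_newest_first):
        for word, destination in targets.get(node_id, {}).items():
            if position > 0 and word in AMBIGUOUS:
                continue  # only the latest node may be answered with a generic word
            pairs.append((word, destination))
    return pairs


# Hypothetical nodes: N2 was fired most recently, N1 before it.
sample_targets = {
    "N2": {"yes": "N4", "no": "N5"},
    "N1": {"yes": "N2", "sightseeing": "N3"},
}
sample_pairs = collect_target_words(["N2", "N1"], sample_targets)
```

Here "yes" remains valid for the latest node N2 but is dropped for the earlier node N1, while the specific word "sightseeing" from N1 is kept.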

When the text data contains a speech recognition target word (YES in step S2), the dialogue control apparatus 100 causes the next output determining means 14 to execute the node transition process described later (step S3).

When the text data contains no speech recognition target word (NO in step S2), the dialogue control apparatus 100 uses the similar scenario search means 141 of the next output determining means 14 to search for a similar scenario (step S4).

When a similar scenario is found (YES in step S4), the dialogue control apparatus 100 causes the data providing means 11 to output the voice guidance data of the root node of that similar scenario from the voice output unit 4 (step S5).

When no similar scenario is found (NO in step S4), the dialogue control apparatus 100 uses the domain determination means 140 of the next output determining means 14 to determine whether there is a domain related to the content of the user's utterance (step S6).

When a related domain exists (YES in step S6), the dialogue control apparatus 100 causes the data providing means 11 to output the voice guidance data of the root node of a scenario in that related domain from the voice output unit 4 (step S7). When no related domain exists (NO in step S6), it causes the next output determining means 14 to execute the dialogue continuation determination process described later (step S8).
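The cascade of steps S1 to S8 can be summarized in a short sketch. The function name, the action labels, and the two callback parameters are hypothetical. Sending an unrecognized utterance (NO in step S1) to the continuation check is an assumption based on the unrecognized-utterance example later in the description.

```python
def decide_next_output(text, acquired_words, find_similar_scenario, find_related_domain):
    """Return an (action, detail) pair mirroring steps S1 to S8.

    text: recognized text, or None when recognition failed.
    acquired_words: speech recognition target words of previously fired nodes.
    find_similar_scenario / find_related_domain: callables returning a name or None.
    """
    if text is None:                                    # S1: NO
        return ("continuation_check", None)
    matched = [w for w in acquired_words if w in text]  # S2
    if matched:
        return ("node_transition", matched[0])          # S3
    scenario = find_similar_scenario(text)              # S4
    if scenario is not None:
        return ("output_similar_scenario_root", scenario)  # S5
    domain = find_related_domain(text)                  # S6
    if domain is not None:
        return ("output_domain_scenario_root", domain)  # S7
    return ("continuation_check", None)                 # S8
```

Each fallback is tried only after the more specific one fails, so a direct match on a past target word always takes precedence over scenario or domain matching.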

Next, the flow of the node transition process will be described with reference to FIG. 7. FIG. 7 is a flowchart showing the flow of the node transition process.

First, the next output determining means 14 refers to the dialogue history database 31 and calculates the time elapsed since the firing of the node that has the speech recognition target word contained in the text data (hereinafter, the "matching node") (step S11).

When there are multiple matching nodes, the next output determining means 14 gives priority to the one with the larger (newer) firing history number, because the user is considered likely to want to return to the most recent choice or topic.

When multiple matching nodes exist within the same scenario, the next output determining means 14 gives priority to the lower node. A node lower in the scenario presents a more specific question than a node above it, so the voice guidance the user expects can be provided more quickly.

However, when the user utterance contains a phrase that specifies a date or time, such as "yesterday" or "noon", priority may instead be given to the matching node whose firing history matches that date and time.
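The text does not spell out how the two priority rules combine, so the following sketch shows only one possible reading: take the most recently fired candidate, then, among candidates in that same scenario, prefer the lowest node. The `Candidate` type and `pick_candidate` function are hypothetical, and the optional date-and-time rule is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    node_id: str
    scenario: str
    depth: int       # 0 for the root node, larger for lower nodes
    history_no: int  # firing history number; larger means fired more recently


def pick_candidate(candidates):
    """Pick one node among several that share the matched target word."""
    if not candidates:
        return None
    # Rule 1: the newest firing decides which scenario the user returns to.
    newest = max(candidates, key=lambda c: c.history_no)
    # Rule 2: within that scenario, prefer the lower (more specific) node.
    same_scenario = [c for c in candidates if c.scenario == newest.scenario]
    return max(same_scenario, key=lambda c: (c.depth, c.history_no))
```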

When the time elapsed since the matching node (the node whose speech recognition target word appears in the text data) fired is less than a threshold T1 (for example, 5 minutes) (YES in step S11), the next output determining means 14 refers to the dialogue history database 31 and determines whether the transition destination node indicated by that speech recognition target word (hereinafter, the "corresponding node") has already fired (step S12). When the corresponding node has already fired (YES in step S12), the next output determining means 14 executes the dialogue continuation determination process described later (step S13).

When the corresponding node has not yet fired (NO in step S12), the next output determining means 14 determines the corresponding node as the node to be output next (step S14), and causes the data providing means 11 to output the voice guidance data of the corresponding node from the voice output unit 4.

On the other hand, when the time elapsed since the matching node fired is equal to or greater than the threshold T1 (NO in step S11), the next output determining means 14 outputs by voice a confirmation message such as "Is ○○ (the speech recognition target word contained in the user utterance) OK?" or "You mean ○○, out of ○○ (the speech recognition target word contained in the user utterance) and △△ (another speech recognition target word), right?" (step S15). After obtaining the user's confirmation (for example, when the user utterance "yes" is acquired) (YES in step S16), it determines the corresponding node as the node to be output next (step S14).

When the user's confirmation is not obtained (for example, when the user utterance "no" is acquired) (NO in step S16), the next output determining means 14 executes the dialogue continuation determination process described later (step S17).

所定時間が経過しているため、対話制御装置１００が出力した質問や選択肢の内容をユーザが正確に憶えていない場合があるからである。 This is because, since the predetermined time has elapsed, the user may not correctly remember the contents of questions and options output by the dialog control apparatus 100.

この場合、その対応ノードが発火済みであっても、次出力決定手段１４は、その対応ノードが有する音声案内データを音声出力部４から出力させるようにしてもよい。 In this case, even if the corresponding node has been fired, the next output determining unit 14 may cause the voice output unit 4 to output the voice guidance data of the corresponding node.

所定時間が経過しているため、その対応ノードが有する音声案内データを再出力しても、ユーザに煩わしさを感じさせることがないからである。 This is because, since the predetermined time has elapsed, even if the voice guidance data of the corresponding node is output again, the user does not feel bothered.
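As a rough illustration of steps S11 to S17, the branch on the threshold T1 might look like the sketch below. The function name, the string action labels, and the `confirm` callback (standing in for the spoken confirmation exchange) are assumptions. The sketch also adopts the optional behavior described above, in which a corresponding node that has already fired is output again once T1 has elapsed and the user confirms.

```python
T1_SECONDS = 5 * 60  # example threshold from the description (5 minutes)


def node_transition(elapsed_seconds, corresponding_fired, confirm, t1=T1_SECONDS):
    """Decide the next action for a matched speech recognition target word.

    elapsed_seconds: time since the matching node fired.
    corresponding_fired: whether the transition destination node already fired.
    confirm: callable that asks the user for confirmation and returns True/False.
    """
    if elapsed_seconds < t1:                       # S11: YES
        if corresponding_fired:                    # S12: YES
            return "continuation_check"            # S13
        return "output_corresponding_node"         # S14
    # S11: NO, so the user may have forgotten the options; ask first (S15).
    if confirm():                                  # S16: YES
        return "output_corresponding_node"         # S14
    return "continuation_check"                    # S17
```

Note that the confirmation callback is invoked only on the slow path, so a quick reply to a recent question is never interrupted by a confirmation prompt.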

The next output determining means 14 may also refer to the thesaurus stored in the storage unit 3 to acquire words similar to a speech recognition target word (hereinafter, "similar target words") and, when a similar target word is contained in the text data, output by voice a confirmation message such as "Is ○○ (the speech recognition target word corresponding to the similar target word contained in the user utterance) OK?".

The next output determining means 14 may then execute the node transition process when the user's confirmation is obtained (for example, when the user utterance "yes" is acquired), because the user may not accurately remember the options that the dialogue control apparatus 100 presented in the past.
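The thesaurus lookup could be sketched as below. The dictionary layout, the function names, and the English renderings of the entries are assumptions for illustration. The two entries mirror the pairings used in the dialogue examples later in the description.

```python
# Hypothetical thesaurus: speech recognition target word -> similar target words.
THESAURUS = {
    "I want to eat eel": ["dinner"],
    "special product": ["souvenir"],
}


def find_similar_target(text, target_words, thesaurus):
    """Return (target_word, similar_word) for the first similar word found in text."""
    for word in target_words:
        for similar in thesaurus.get(word, []):
            if similar in text:
                return word, similar
    return None


def confirmation_message(target_word):
    # The user is asked about the registered target word, not the word they said.
    return f'Is "{target_word}" OK?'
```

For the utterance "Oh, I forgot to buy souvenirs.", the lookup maps "souvenir" back to the registered word "special product", which is the word the confirmation message then uses.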

なお、対話制御装置１００は、発火後経過時間に基づいて制御方法を変更するが、対話履歴を幾つ（ノード数）遡ったかに基づいて制御方法を変更するようにしてもよい。 The dialog control apparatus 100 changes the control method based on the elapsed time after firing, but may change the control method based on how many dialog histories (number of nodes) are traced back.

次に、図８を参照しながら、対話継続可否判定処理の流れについて説明する。なお、図８は、対話継続可否判定処理の流れを示すフローチャートである。 Next, the flow of the process for determining whether or not to continue the conversation will be described with reference to FIG. FIG. 8 is a flowchart showing the flow of the process for determining whether or not to continue the conversation.

First, the next output determining means 14 determines, by firing control, whether there is a scenario whose firing condition is satisfied (step S21).

When a fireable scenario exists (YES in step S21), the next output determining means 14 uses the similar scenario search means 141 to determine whether there is a similar scenario resembling the most recently fired scenario (step S22).

When such a similar scenario exists, the dialogue control apparatus 100 causes the data providing means 11 to output the voice guidance data of the root node of that similar scenario from the voice output unit 4 (step S23).

When no such similar scenario exists, the dialogue control apparatus 100 causes the topic change means 142 of the next output determining means 14 to select a specific scenario, using a random number, from the group of fireable scenarios, and causes the data providing means 11 to output the voice guidance data of the root node of the selected scenario from the voice output unit 4 (step S24).

On the other hand, when no fireable scenario exists (NO in step S21), the dialogue control apparatus 100 causes the dialogue ending means 143 of the next output determining means 14 to output an end notification message that makes the end of the dialogue explicit, such as "Call me again when you need me" (step S25).
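Steps S21 to S25 reduce to a three-way choice, sketched below under stated assumptions: the scenario names are plain strings, `similar_to_last` is a precomputed set of scenarios judged similar to the most recently fired one, and the `rng` parameter exists only so that the random choice can be made reproducible.

```python
import random

END_MESSAGE = "Call me again when you need me"


def continuation_decision(fireable, similar_to_last, rng=random):
    """Return (action, detail) for the dialogue continuation determination.

    fireable: list of scenarios whose firing condition is currently satisfied.
    similar_to_last: set of scenarios similar to the most recently fired one.
    """
    if not fireable:                                   # S21: NO
        return ("end_dialogue", END_MESSAGE)           # S25
    similar = [s for s in fireable if s in similar_to_last]  # S22
    if similar:
        return ("output_root_node", similar[0])        # S23
    # No similar scenario: change the topic with a random fireable scenario.
    return ("output_root_node", rng.choice(fireable))  # S24
```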

次に、対話制御装置１００とユーザとの間の対話例を用いて対話制御装置１００が対話を制御する流れを説明する。 Next, a flow in which the dialogue control apparatus 100 controls the dialogue will be described using an example of dialogue between the dialogue control apparatus 100 and the user.

When the dialogue control apparatus 100 recognizes, based on position information output by GPS (Global Positioning System), that the vehicle has reached a predetermined point, it fires tourist guidance scenario A (see FIG. 2) and continues the dialogue with the user while causing the data providing means 11 to output the voice guidance data of nodes N1 and N2 of tourist guidance scenario A, in order, from the voice output unit 4.

The dialogue control apparatus 100 then causes the voice output unit 4 to output the voice guidance data of node N3 of tourist guidance scenario A: "Shall I give you some sightseeing information? I have information on specialties, special products, and highlights. Which would you like?"

なお、ノード記録手段１２は、発火させたノードＮ１、Ｎ２、Ｎ３を対話履歴データベース３１に発火順に記録し、ユーザと対話制御装置１００との間の対話の流れを記録する（図５参照。）。 The node recording unit 12 records the fired nodes N1, N2, and N3 in the dialogue history database 31 in the firing order, and records the flow of dialogue between the user and the dialogue control device 100 (see FIG. 5). .
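A minimal in-memory stand-in for this recording step is sketched below. The actual layout of the dialogue history database 31 is shown in FIG. 5, which is not reproduced here, so the record fields (firing history number, node identifier, firing time) and all method names are assumptions.

```python
class DialogueHistory:
    """Hypothetical stand-in for the dialogue history database 31."""

    def __init__(self):
        self._records = []  # (history_no, node_id, fired_at), oldest first

    def record(self, node_id, fired_at):
        # The firing history number simply counts up in firing order.
        self._records.append((len(self._records) + 1, node_id, fired_at))

    def fired_nodes(self):
        return [node_id for _, node_id, _ in self._records]

    def has_fired(self, node_id):
        return node_id in self.fired_nodes()

    def elapsed_since(self, node_id, now):
        # Use the most recent firing of the node; None if it never fired.
        times = [t for _, n, t in self._records if n == node_id]
        return now - max(times) if times else None
```

Keeping the firing time alongside the order lets the same structure answer both "has this node fired?" and "how long ago?", which are the two questions the node transition process asks.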

In the speech recognition target word part 51 of node N3, the speech recognition target words "specialty", "special product", and "highlights" are registered as the conditions for transitioning to nodes N5, N6, and N7, respectively, and the speech recognition target word "I don't want to hear" is registered as a condition for ending the dialogue (see FIG. 4).
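One way to picture node N3 and its speech recognition target word part is a small record that maps each registered word to a destination. The `Node` class, the use of `None` as the "end the dialogue" marker, and the `next_node_id` helper are hypothetical. The words and destinations are those given above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

END = None  # hypothetical marker meaning "end the dialogue"


@dataclass
class Node:
    node_id: str
    guidance: str  # voice guidance data, shown here as text
    transitions: Dict[str, Optional[str]] = field(default_factory=dict)


n3 = Node(
    "N3",
    "Shall I give you some sightseeing information?",
    {
        "specialty": "N5",
        "special product": "N6",
        "highlights": "N7",
        "I don't want to hear": END,
    },
)


def next_node_id(node, text):
    """Return (matched, destination); destination is END when the dialogue ends."""
    for word, destination in node.transitions.items():
        if word in text:
            return True, destination
    return False, None
```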

At this point, when the user says "specialty", the voice recognition means 10 converts the user utterance into text data, and the speech recognition target word acquisition means 13 refers to the speech recognition target word parts 51 of nodes N1, N2, and N3 and acquires the speech recognition target words of each of these nodes.

The next output determining means 14 then detects that the user utterance contains the current speech recognition target word "specialty" of node N3, refers to the speech recognition target word part 51 of node N3 (see FIG. 4), and determines node N5, which is associated with the current speech recognition target word "specialty", as the node whose voice guidance data is to be output next.

The dialogue control apparatus 100 then causes the voice output unit 4 to output the voice guidance data of node N5: "The specialty around here is eel." The node recording means 12 adds the fired node N5 to the dialogue history database 31.

Assume that, in the speech recognition target word part 51 of node N5, the speech recognition target words "I want to eat eel" and "Eel sounds good" are registered as conditions for transitioning to subsequent nodes, and that speech recognition target words such as "Is that so, I didn't know" and "I don't really like eel" are registered as conditions for ending the dialogue.

At this point, when the user says "Is that so, I didn't know", the voice recognition means 10 converts the user utterance into text data, and the speech recognition target word acquisition means 13 refers to the speech recognition target word parts 51 of nodes N1, N2, N3, and N5 and acquires the speech recognition target words of each of these nodes.

The next output determining means 14 then detects that the user utterance contains the current speech recognition target word "Is that so, I didn't know" of node N5, refers to the speech recognition target word part 51 of node N5, and ends the dialogue.

The following describes how the dialogue control apparatus 100 continues the dialogue, depending on what the user says, when the user starts speaking again after the dialogue has been ended in this way.

First, consider the case where the user says "It's a little early, but maybe I'll have a meal. I want to eat eel."

この場合、次出力決定手段１４は、対話履歴データベース３１を参照し、直前に発火させたノードＮ５の現音声認識対象語「うなぎ食べたい」がユーザ発話の中に含まれることを認識する。 In this case, the next output determination unit 14 refers to the dialogue history database 31 and recognizes that the current speech recognition target word “I want to eat eel” of the node N5 fired immediately before is included in the user utterance.

現音声認識対象語「うなぎ食べたい」に対応する後続ノードは過去に発火しておらず、次出力決定手段１４は、その後続ノードが有する音声案内データを次に出力させるものとして決定する。 The subsequent node corresponding to the current speech recognition target word “I want to eat eel” has not fired in the past, and the next output determining means 14 determines that the voice guidance data of the subsequent node is to be output next.

If the subsequent node had already fired in the past, the next output determining means 14 may, so as not to repeat the same question, change the topic by outputting a topic change message such as "Oh, by the way, I have an interesting story", or explicitly end the dialogue by outputting an end notification message such as "That's all the sightseeing information I have. I'll study up some more."

Alternatively, if the subsequent node had already fired in the past, the next output determining means 14 may fire a similar scenario. Firing a similar scenario avoids giving the user the impression that the same question is always asked, and also avoids confusing the user with a completely unrelated topic.

Next, consider the case where the user says "It's a little early, but shall we have dinner?"

この場合、次出力決定手段１４は、対話履歴データベース３１を参照し、過去に発火させたノードの音声認識対象語がユーザ発話の中に含まれないことを認識する。 In this case, the next output determining unit 14 refers to the dialogue history database 31 and recognizes that the speech recognition target word of the node fired in the past is not included in the user utterance.

The next output determining means 14 therefore refers to the thesaurus stored in the storage unit 3 to acquire similar target words resembling the speech recognition target words, recognizes that the text data contains the similar target word "dinner", which is registered as similar to the current speech recognition target word "I want to eat eel" of node N5, and determines that the voice guidance data of the corresponding subsequent node is to be output next.

次に、ユーザが「あ、お土産買うのを忘れてた。名産品は何があったっけ。」を発話した場合について説明する。 Next, a case where the user utters “Oh, I forgot to buy souvenirs. What was the special product?” Will be described.

In this case, the next output determining means 14 refers to the dialogue history database 31 and recognizes that the user utterance contains the previously presented speech recognition target word "special product" of node N3, which fired two nodes earlier.

Node N6, which corresponds to the previously presented speech recognition target word "special product", has not fired in the past, so the next output determining means 14 determines that the voice guidance data of node N6 is to be output next.

Even if node N6 had fired in the past, the next output determining means 14 may determine that the voice guidance data of node N6 is to be output next when a predetermined time has elapsed since that firing, because repeating the same question after a predetermined time has elapsed does not annoy the user.

次に、ユーザが「あ、お土産買うのを忘れてた。」を発話した場合について説明する。 Next, a case where the user speaks “Oh, I forgot to buy souvenirs” will be described.

この場合、次出力決定手段１４は、対話履歴データベース３１を参照し、過去に発火させたノードの音声認識対象語がユーザ発話の中に含まれないことを認識する。 In this case, the next output determination unit 14 refers to the dialogue history database 31 and recognizes that the speech recognition target word of the node that has fired in the past is not included in the user utterance.

The next output determining means 14 therefore refers to the thesaurus stored in the storage unit 3 to acquire similar target words resembling the speech recognition target words, recognizes that the text data contains the similar target word "souvenir", which is registered as similar to the previously presented speech recognition target word "special product" of node N3 fired two nodes earlier, and determines that the voice guidance data of node N6, which corresponds to "special product", is to be output next.

次に、ユーザが「そういえばこのあたりに有名な温泉があるって聞いたことある気がするんだけど」を発話した場合について説明する。 Next, a case where the user utters “I feel like I have heard that there is a famous hot spring around here” will be described.

この場合、次出力決定手段１４は、対話履歴データベース３１を参照し、過去に発火させたノードの音声認識対象語及び類似対象語がユーザ発話の中に含まれないことを認識する。 In this case, the next output determining unit 14 refers to the dialogue history database 31 and recognizes that the speech recognition target word and the similar target word of the node fired in the past are not included in the user utterance.

The next output determining means 14 therefore causes the similar scenario search means 141 to extract the independent words "so", "say", "this", "around", "famous", "hot spring", "exist", "hear", "thing", "exist", "feeling", "do", and "but". Because the independent word "hot spring" matches the sub-keyword "hot spring" associated with the concept keyword "tourist guidance" of tourist guidance scenario A, tourist guidance scenario A is extracted as a related scenario.

As a result, the next output determining means 14 determines that the voice guidance data of the root node N1 of tourist guidance scenario A is to be output next.

次に、ユーザが「そういえば昨日の野球の結果教えて」を発話した場合について説明する。 Next, the case where the user utters “Tell me the result of yesterday's baseball” will be described.

この場合、次出力決定手段１４は、対話履歴データベース３１を参照し、過去に発火させたノードの音声認識対象語及び類似対象語がユーザ発話の中に含まれないことを認識し、類似シナリオも存在しないことを認識する。 In this case, the next output determining unit 14 refers to the dialogue history database 31 and recognizes that the speech recognition target word and the similar target word of the node fired in the past are not included in the user utterance. Recognize that it does not exist.

そこで、次出力決定手段１４は、ドメイン判定手段１４０により、自立語「そう」、「いう」、「昨日」、「野球」、「結果」、「教える」を抽出させ、自立語「野球」とスポーツドメインが有する概念キーワード「スポーツ」に対応付けられたサブキーワード「野球」とが一致することから、ユーザ発話がスポーツドメインに関連するものであると判定させる。 Therefore, the next output determination means 14 causes the domain determination means 140 to extract the independent words “so”, “say”, “yesterday”, “baseball”, “result”, “teach”, and the independent word “baseball”. Since the sub-keyword “baseball” associated with the concept keyword “sports” of the sports domain matches, it is determined that the user utterance is related to the sports domain.

As a result, the next output determining means 14 determines that the voice guidance data of the root node of a scenario in the sports domain is to be output next.
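The keyword matching used in the hot spring and baseball examples can be sketched as a set intersection between the extracted independent words and the sub-keywords registered under each concept keyword. The catalog layout and the function name are assumptions. The extraction of independent words (typically a morphological analysis step) is assumed to have been done already, and only the sub-keywords named in the examples are included.

```python
# Hypothetical catalogs: name -> {concept keyword -> set of sub-keywords}.
SCENARIOS = {"tourist guidance scenario A": {"tourist guidance": {"hot spring"}}}
DOMAINS = {"sports domain": {"sports": {"baseball"}}}


def find_by_keyword(independent_words, catalog):
    """Return the first catalog entry having a sub-keyword equal to an independent word."""
    words = set(independent_words)
    for name, concepts in catalog.items():
        for sub_keywords in concepts.values():
            if words & sub_keywords:
                return name
    return None
```

The same helper serves both the similar scenario search and the domain determination. In the baseball example the scenario catalog yields no match, so the lookup falls through to the domain catalog.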

次に、ユーザが「あーなんか眠くなってきた」を発話した場合について説明する。 Next, a case where the user utters “Oh, I am getting sleepy” will be described.

この場合、次出力決定手段１４は、対話履歴データベース３１を参照し、過去に発火させたノードの音声認識対象語及び類似対象語がユーザ発話の中に含まれないことを認識し、かつ、類似シナリオも関連ドメインも存在しないことを認識する。 In this case, the next output determining unit 14 refers to the dialogue history database 31 and recognizes that the speech recognition target word and the similar target word of the node fired in the past are not included in the user utterance, and the similar Recognize that there are no scenarios or related domains.

The next output determining means 14 therefore determines, by firing control, whether there is a scenario whose firing condition is satisfied and, if a fireable scenario exists, determines that the voice guidance data of the root node of that scenario is to be output next.

一方、発火可能なシナリオが存在しなければ、次出力決定手段１４は、終了通知メッセージを次に出力させるものとして決定する。 On the other hand, if there is no scenario that can be ignited, the next output determining means 14 determines that the end notification message is to be output next.

次に、ユーザの発話は検出したが、テキストデータとして認識できなかった場合について説明する。 Next, a case where the user's speech is detected but cannot be recognized as text data will be described.

In this case, the next output determining means 14 determines, by firing control, whether there is a scenario whose firing condition is satisfied. If a fireable scenario exists, it causes the similar scenario search means 141 to determine whether the fireable scenarios include a similar scenario resembling the most recently fired scenario and, if so, determines that the voice guidance data of the root node of that similar scenario is to be output next.

If a fireable scenario exists but no similar scenario does, the next output determining means 14 determines that the voice guidance data of the root node of a fireable scenario selected at random using a random number is to be output next.

発火可能なシナリオが存在しなければ、次出力決定手段１４は、終了通知メッセージを次に出力させるものとして決定する。 If there is no scenario that can be ignited, the next output determining means 14 determines that the end notification message is to be output next.

以上の構成により、対話制御装置１００は、複数のノードから構成されるシナリオにおいて既に発火させたノードを記録しておき、既に発火させたノードが有する音声認識対象語がユーザ発話に含まれる場合に、そのノードの下位にあるノードを発火させるので、ユーザが期待する音声案内を迅速かつ適時に提供することができ、自然な対話を継続させることができる。 With the above configuration, the dialogue control apparatus 100 records a node that has already been ignited in a scenario composed of a plurality of nodes, and the speech recognition target word of the already ignited node is included in the user utterance. Since the node below the node is fired, the voice guidance expected by the user can be provided promptly and in a timely manner, and a natural conversation can be continued.

In addition, by removing the restriction that a user utterance can respond only to the question the dialogue control apparatus 100 presented immediately before, and allowing the user to respond directly to questions presented in the past as well, a natural dialogue can be continued, the user's sense of familiarity with the dialogue control apparatus 100 is increased, and continued use of the dialogue control apparatus 100 is encouraged.

Although preferred embodiments of the present invention have been described in detail above, the present invention is not limited to those embodiments, and various modifications and substitutions can be made to them without departing from the scope of the present invention.

For example, in the embodiment described above, each node holds the conditions for transitioning to its lower nodes (the firing conditions for firing the lower nodes) together with information on the speech recognition target words that make up those conditions; alternatively, each node may hold its own firing condition and the speech recognition target words that make up that firing condition.

The dialogue control apparatus 100 described above is configured around voice input and output: it outputs voice guidance according to a scenario, recognizes the user's utterances, and determines the voice guidance to output next. It may instead be configured around text input and output, displaying text data on a display device such as a liquid crystal display, recognizing text entered by the user via a keyboard or the like, and determining the text data to display next.

この場合、シナリオは、テキスト案内データと次のテキスト案内データに移行するための移行条件を定める認識対象語とを有するノードで構成され、認識対象語は、ユーザによるテキスト入力が期待される語句として予め登録される。 In this case, the scenario is composed of nodes having text guidance data and a recognition target word that defines a transition condition for transitioning to the next text guidance data, and the recognition target word is a word that is expected to be input by the user. Registered in advance.

また、音声取得部２は、キーボード、マウス又はタッチパネル等のテキスト入力部に置き換えられ、音声出力部４は、液晶ディスプレイ等のテキスト出力部に置き換えられる。 The voice acquisition unit 2 is replaced with a text input unit such as a keyboard, a mouse, or a touch panel, and the voice output unit 4 is replaced with a text output unit such as a liquid crystal display.

なお、入力を音声、出力をテキストに基づく構成とし、或いは、入力をテキスト、出力を音声に基づく構成としてもよい。 The input may be configured based on voice and the output based on text, or the input may be configured based on text and the output based on speech.

FIG. 1 is a diagram showing a configuration example of the dialogue control apparatus.
FIG. 2 is a diagram showing the tree structure of a scenario.
FIG. 3 is a diagram showing a configuration example of a node.
FIG. 4 is a diagram showing a configuration example of the speech recognition target word part.
FIG. 5 is a diagram showing a configuration example of the dialogue history database.
FIG. 6 is a flowchart showing the flow of the process by which the next output determining means determines the voice guidance to be output next.
FIG. 7 is a flowchart showing the flow of the node transition process.
FIG. 8 is a flowchart showing the flow of the dialogue continuation determination process.

Explanation of symbols

1 Control unit
2 Voice acquisition unit
3 Storage unit
4 Voice output unit
10 Voice recognition means
11 Data providing means
12 Node recording means
13 Speech recognition target word acquisition means
14 Next output determining means
30 Scenario database
31 Dialogue history database
50 Voice guidance data part
51 Speech recognition target word part
100 Dialogue control apparatus
140 Domain determination means
141 Similar scenario search means
142 Topic change means
143 Dialogue ending means
N1 to N7 Nodes

Claims

A dialogue control device for controlling dialogue with a user in accordance with a scenario composed of nodes, each node having guidance data and a recognition target word that defines the user input for starting output of that guidance data, the dialogue control device comprising:
recognition means for recognizing user input as text data;
node recording means for recording a node having the guidance data that has been output;
recognition target word acquisition means for acquiring a recognition target word corresponding to the node recorded by the node recording means; and
next output determining means for determining the guidance data to be output next based on the text data recognized by the recognition means and the recognition target word acquired by the recognition target word acquisition means.

The dialogue control device according to claim 1, wherein the next output determining means, when the recognition target word acquired by the recognition target word acquisition means is included in the text data recognized by the recognition means, determines the guidance data of the node corresponding to that recognition target word as the guidance data to be output next.

The dialogue control device according to claim 1 or 2, wherein the guidance data is output by voice or text, and the user input is entered by voice or text.

A dialogue control method for controlling dialogue with a user in accordance with a scenario composed of nodes, each node having guidance data and a recognition target word that defines the user input for starting output of that guidance data, the dialogue control method comprising:
a recognition step of recognizing user input as text data;
a node recording step of recording a node having the guidance data that has been output;
a recognition target word acquisition step of acquiring a recognition target word corresponding to the node recorded in the node recording step; and
a next output determination step of determining the guidance data to be output next based on the text data recognized in the recognition step and the recognition target word acquired in the recognition target word acquisition step.

The dialogue control method according to claim 4, wherein, in the next output determination step, when the recognition target word acquired in the recognition target word acquisition step is included in the text data recognized in the recognition step, the guidance data of the node corresponding to that recognition target word is determined as the guidance data to be output next.

The dialogue control method according to claim 4 or 5, wherein the guidance data is output by voice or text, and the user input is entered by voice or text.

A dialogue control program for causing a computer to execute the dialogue control method according to any one of claims 4 to 6.
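
For illustration only, the steps recited in the method claims can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation described in the specification: the class and function names, the sample scenario, the substring matching, and the newest-first search order are all hypothetical. It also assumes that the recognition target words "corresponding to" a recorded node are the trigger words of that node's child nodes, and that the recognition step has already produced text.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Scenario node: guidance data plus the words that trigger its output."""
    name: str
    guidance: str
    target_words: Tuple[str, ...] = ()
    children: List["Node"] = field(default_factory=list)


class DialogueController:
    def __init__(self, root: Node) -> None:
        # Hypothetical stand-in for the dialogue history database.
        self.history: List[Node] = []
        self.output(root)

    def output(self, node: Node) -> str:
        """Node recording step: remember every node whose guidance was output."""
        self.history.append(node)
        return node.guidance

    def candidates(self) -> Iterator[Tuple[str, Node]]:
        """Acquisition step: words reachable from recorded nodes, newest first."""
        for recorded in reversed(self.history):
            for child in recorded.children:
                for word in child.target_words:
                    yield word, child

    def respond(self, text: str) -> Optional[str]:
        """Next output determination step: the first contained word wins."""
        lowered = text.lower()
        for word, node in self.candidates():
            if word in lowered:
                return self.output(node)
        return None  # no match; fallback handling is outside this sketch


def build_demo_scenario() -> Node:
    # Hypothetical scenario content; only the node labels echo the N1.. naming.
    nearest = Node("N4", "Guiding you to the nearest one.", ("nearest",))
    japanese = Node("N2", "Three Japanese restaurants are nearby.",
                    ("japanese", "sushi"), [nearest])
    italian = Node("N3", "Two Italian restaurants are nearby.",
                   ("italian", "pasta"))
    return Node("N1", "What would you like to eat?", (), [japanese, italian])


controller = DialogueController(build_demo_scenario())
print(controller.respond("Sushi, please"))    # moves from N1 to N2
print(controller.respond("Actually, pasta"))  # redoes the answer to N1, moves to N3
```

Because every output node stays in the history, the second utterance can still match a word belonging to a child of N1 even though the dialogue has already advanced to N2, which corresponds to redoing a response to guidance output in the past.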