JP2015176099A

JP2015176099A - Dialog system construction assist system, method, and program

Info

Publication number: JP2015176099A
Application number: JP2014054491A
Authority: JP
Inventors: 祐美子下郡; Yumiko Shimogoori; 憲治岩田; Kenji Iwata; 雅弘伊藤; Masahiro Ito; 尚義永江; Hisayoshi Nagae
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2014-03-18
Filing date: 2014-03-18
Publication date: 2015-10-05
Also published as: WO2015141700A1

Abstract

PROBLEM TO BE SOLVED: To provide a dialog system construction assist system capable of constructing a scenario from a dialog between a user and an operator and from the actions of the operator. SOLUTION: In a dialog system 100, a speech recognition unit 101 performs speech recognition on utterances contained in a dialog between a user and an operator, and produces speech recognition results including texts corresponding to the utterances. An intention understanding unit 102 understands the intentions of the utterances on the basis of the texts, and obtains intention understanding results including the types of the utterances, the intentions of the utterances, words, and the meaning classes of the words. A dialog information memory unit stores the speech recognition results, the intention understanding results, and the actions that the operator has performed in relation to the dialog. A scenario construction unit 106 acquires, as attributes, words having meaning classes that are shared between a question utterance and a response utterance contained in the dialog, and uses the attributes and the actions to construct a scenario.

Description

Embodiments of the present invention relate to a dialogue system construction support apparatus, method, and program.

Some dialogue systems, such as automatic voice response devices, respond automatically to a user's utterances. Such a dialogue system responds according to scenarios constructed in advance. The dialogue system may fail to respond because no scenario fits the user's request. In that case, an operator responds to the user. A new scenario then needs to be added to the dialogue system so that it can respond successfully when a similar request is received later.

It is desirable to be able to construct a scenario from the dialogue between the user and the operator in a case where the dialogue system failed to respond. Cited Document 1 discloses an automated response system that learns from a set of conversations between agents and users. However, no technique exists for constructing a scenario from a dialogue between a user and an operator while taking into account the actions that the operator performed during the response.

Cited Document 1: Japanese Patent No. 4901738

The problem to be solved by the present invention is to provide a dialogue system construction support apparatus, method, and program capable of constructing a scenario from a dialogue between a user and an operator and from the actions of the operator.

A dialogue system construction support apparatus according to one embodiment includes a speech recognition unit, an intention understanding unit, a dialogue information storage unit, and a scenario construction unit. The speech recognition unit performs speech recognition on a plurality of utterances included in a dialogue between a user and an operator, and generates a speech recognition result including a plurality of texts corresponding to the respective utterances. The intention understanding unit understands the intention of each utterance based on the corresponding text, and obtains an intention understanding result including the type of each utterance, the intention of each utterance, the words included in the texts, and the semantic classes of those words. The dialogue information storage unit stores the speech recognition result, the intention understanding result, and the actions executed by the operator in relation to the dialogue, in association with one another. The scenario construction unit acquires, as attributes, words having a semantic class common to a question utterance and an answer utterance included in the dialogue, and constructs a scenario using the attributes and the actions.

FIG. 1 is a block diagram schematically showing a dialogue system according to an embodiment.
FIG. 2 is a flowchart showing an example procedure for dialogue log recording according to the embodiment.
FIG. 3 is a diagram showing an example of a dialogue between a user and an operator.
FIG. 4 is a diagram showing examples of utterance types.
FIG. 5 is a diagram showing examples of intention tags.
FIG. 6 is a diagram showing examples of actions.
FIG. 7 is a diagram showing examples of semantic classes.
FIG. 8 is a diagram showing an example of action content.
FIG. 9 is a diagram showing the dialogue log for the dialogue shown in FIG. 3.
FIG. 10 is a flowchart showing an example procedure for scenario construction according to the embodiment.
FIG. 11A is a diagram showing an example of a scenario constructed from the dialogue log shown in FIG. 9.
FIG. 11B is a diagram showing another example of a scenario constructed from the dialogue log shown in FIG. 9.
FIG. 12 is a diagram showing an example of a dialogue between a user and an operator in which the user responds to the operator's question with an utterance other than an answer.
FIG. 13 is a diagram showing evaluation data used by the scenario construction unit shown in FIG. 1 to evaluate a scenario.
FIG. 14 is a flowchart showing an example procedure for action candidate display according to the embodiment.
FIG. 15 is a diagram showing an example of the content displayed by the dialogue state display unit shown in FIG. 1.

Embodiments are described below with reference to the drawings. The embodiments relate to a dialogue system that responds automatically to a user's utterances. This dialogue system is used, for example, in a contact center. The dialogue system selects, from scenarios registered in advance (dialogue scenarios), a scenario that fits the user's utterance, and responds according to that scenario. If the dialogue system fails to respond, an operator responds through a dialogue with the user. The dialogue system can construct a new scenario based on the dialogue between the user and the operator and on the actions of the operator. As a result, the dialogue system can respond successfully when it later receives a similar request. In addition, the cost of scenario construction can be reduced, and the number of operators required can be lowered.

FIG. 1 schematically shows a dialogue system 100 according to the embodiment. As shown in FIG. 1, the dialogue system 100 includes a speech recognition unit 101, an intention understanding unit 102, a dialogue control unit 103, a response generation unit 104, a dialogue extraction unit 105, a scenario construction unit 106, a scenario update unit 107, a dictionary storage unit 108, an intention model storage unit 109, a scenario storage unit 110, a dialogue log storage unit (also referred to as a dialogue information storage unit) 111, a dialogue state display unit 112, a scenario search unit 113, and a scenario object database (DB) 114.

First, the automatic response process of the dialogue system 100 is described briefly. In one example, the user communicates with the dialogue system 100 over a network using a terminal such as a mobile phone or a smartphone. Through the automatic response process, the dialogue system 100 provides a service to the terminal over the network. For example, as in the example described later, the dialogue system 100 transmits to the terminal map data showing the user's destination.

The dialogue system 100 executes the automatic response process using the speech recognition unit 101, the intention understanding unit 102, the dialogue control unit 103, the response generation unit 104, the dictionary storage unit 108, the intention model storage unit 109, and the scenario object DB 114. The speech recognition unit 101 performs speech recognition on the user's utterance and generates a natural language text (hereinafter simply "text") corresponding to the utterance. The intention understanding unit 102 analyzes the text with reference to the dictionary storage unit 108 and the intention model storage unit 109 to understand the intention of the utterance, and outputs an intention understanding result. The dialogue control unit 103 selects the scenario corresponding to the intention understanding result from the scenario object DB 114, and executes the action defined in the selected scenario (for example, transmitting map data). The response generation unit 104 generates a response sentence corresponding to the action executed by the dialogue control unit 103. The response sentence is converted into speech by speech synthesis and output.

Next, the scenario construction process of the dialogue system 100 is described.
In the dialogue system 100, the dialogue with the user may fail, for example because no scenario that fits the user's request exists in the scenario object DB 114. When the dialogue with the user fails, the dialogue control unit 103 transfers the connection with the user to an operator. The dialogue control unit 103 can also transfer the connection with the user to the operator when a prescribed condition arises during the response. The dialogue between the operator and the user then begins.

The dialogue system 100 analyzes the dialogue between the user and the operator. Based on the analysis result, the dialogue system 100 constructs a new scenario so that it can respond successfully when a similar request is received later. The scenario construction process uses the speech recognition unit 101, the intention understanding unit 102, the dialogue extraction unit 105, the scenario construction unit 106, the scenario update unit 107, the dictionary storage unit 108, the intention model storage unit 109, the scenario storage unit 110, the dialogue log storage unit 111, the dialogue state display unit 112, and the scenario search unit 113. The portion comprising these elements related to the scenario construction process is referred to as the dialogue system construction support unit. The dialogue system construction support unit may be incorporated in the dialogue system 100, as shown in FIG. 1, or may be provided outside the dialogue system 100. When the dialogue system construction support unit is incorporated in the dialogue system 100, the speech recognition unit 101, the intention understanding unit 102, the dictionary storage unit 108, and the intention model storage unit 109 can be shared between the automatic response process and the scenario construction process.

The speech recognition unit 101 performs speech recognition on a plurality of utterances included in the dialogue between the user and the operator, and generates a plurality of texts corresponding to the respective utterances. In other words, the speech recognition unit 101 converts each of the utterances into text by speech recognition.

For each text generated by the speech recognition unit 101, the intention understanding unit 102 understands the intention of the utterance corresponding to that text. Specifically, the intention understanding unit 102 performs morphological analysis on the text to break it down into words in units of morphemes. The intention understanding unit 102 then refers to the dictionary stored in the dictionary storage unit 108 and, using named entity extraction, assigns a semantic class representing the meaning of the word to each noun, proper noun, verb, and unknown word. In the dictionary, a plurality of words are registered in association with semantic classes.

The intention understanding unit 102 understands the intention of the utterance by consulting the intention model stored in the intention model storage unit 109, using features such as the morphemes, the semantic classes of the words, and the surface forms of the words, and outputs an intention understanding result. The intention model is generated in advance by learning from a large number of utterance samples, with semantic classes, words, and the like as features. The method of intention understanding is not limited to the example described here.
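The dictionary lookup described above can be illustrated with a minimal sketch. It assumes the text has already been split into words by a morphological analyzer, which is not implemented here; the function name `assign_semantic_classes` and the dictionary entries are hypothetical, and only the two semantic class names come from the example discussed later in this description.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical dictionary: each registered word is associated with a semantic class.
DICTIONARY: Dict[str, str] = {
    "airport": "Location_STATION_AIR",
    "airline": "Organization_COMPANEY_AIR",
}


def assign_semantic_classes(
    words: List[str], dictionary: Dict[str, str]
) -> List[Tuple[str, Optional[str]]]:
    """Pair each word with its semantic class, or None if the word is not registered."""
    return [(word, dictionary.get(word)) for word in words]


# Words as they might come out of morphological analysis (assumed input).
tagged = assign_semantic_classes(["which", "airport"], DICTIONARY)
```

Words that are missing from the dictionary are kept with `None` rather than dropped, so that later processing still sees the whole utterance.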

The dialogue extraction unit 105 receives the intention understanding result from the intention understanding unit 102, and detects an operation that the operator performed on the dialogue system 100 during the response as an operator action. Actions can be detected based on information received from the computer terminal that the operator uses. Specifically, the dialogue extraction unit 105 can receive from the computer terminal information indicating the content of the action that the operator executed. The dialogue extraction unit 105 records the analysis result of the dialogue between the user and the operator and the operator's actions in the dialogue log storage unit 111 in association with each other. The analysis result of the dialogue includes the speech recognition results and intention understanding results for the user's utterances, and the speech recognition results and intention understanding results for the operator's utterances.

The scenario construction unit 106 constructs a scenario with reference to the dialogue log storage unit 111 and stores the scenario in the scenario storage unit 110. The scenario update unit 107 updates the scenario object DB 114 with reference to the scenario storage unit 110. Specifically, the scenario update unit 107 converts a scenario stored in the scenario storage unit 110 into an object that the dialogue control unit 103 can execute, and adds it to the scenario object DB 114 at an arbitrary time. In one example, the scenarios stored in the scenario storage unit 110 are text-based scenarios, and the scenarios stored in the scenario object DB 114 are object-based scenarios. The scenarios stored in the scenario object DB 114 may also be text-based scenarios.

The scenario search unit 113 extracts scenario feature words from the dialogue between the user and the operator, and selects from the scenario storage unit 110, as similar scenarios, the scenarios associated with those scenario feature words. Scenario feature words are described later. The dialogue state display unit 112 displays the similar scenarios. The dialogue state display unit 112 also displays the analysis result of the dialogue between the user and the operator.
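As a rough illustration of selecting similar scenarios by scenario feature words, the following sketch looks up stored scenarios that share at least one feature word with the current dialogue. The index layout, the scenario identifiers, and the ranking by number of shared words are assumptions made for this example; the description only states that scenarios associated with the feature words are selected.

```python
from typing import Dict, List, Set


def find_similar_scenarios(
    feature_words: Set[str], scenario_index: Dict[str, Set[str]]
) -> List[str]:
    """Return IDs of scenarios sharing a feature word, most shared words first."""
    scored = []
    for scenario_id, words in scenario_index.items():
        overlap = len(feature_words & words)
        if overlap > 0:
            scored.append((overlap, scenario_id))
    # Sort by descending overlap, then by ID so the order is deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [scenario_id for _, scenario_id in scored]


# Hypothetical index: scenario ID -> scenario feature words associated with it.
SCENARIO_INDEX: Dict[str, Set[str]] = {
    "rental_car_pickup": {"rental car", "receive"},
    "rental_car_search": {"rental car", "search"},
    "hotel_reservation": {"hotel", "reserve"},
}

similar = find_similar_scenarios({"rental car", "receive"}, SCENARIO_INDEX)
```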

Next, the operation of the dialogue system 100 is described.
FIG. 2 schematically shows the dialogue log recording procedure of the dialogue system 100. The procedure is explained concretely here using the dialogue shown in FIG. 3 as an example. In step S201 of FIG. 2, the dialogue between the user and the operator starts. At this point, the dialogue extraction unit 105 records a dialogue start label indicating the start of the dialogue in the dialogue log storage unit 111.

In step S202, the user or the operator speaks. In the dialogue example of FIG. 3, the user first says, "Where was I supposed to pick up the rental car I reserved the other day?" In step S203, the speech recognition unit 101 performs speech recognition on the utterance input in step S202. In the dialogue example of FIG. 3, the text "Where was I supposed to pick up the rental car I reserved the other day?" is obtained as the speech recognition result.

In step S204, the intention understanding unit 102 understands the intention of the utterance from the speech recognition result and outputs an intention understanding result. The intention understanding result includes the utterance type, an intention tag, and semantic classes. The utterance type indicates the role of the utterance in the dialogue. As shown in FIG. 4, utterance types include "request", "greeting", "question", "response", "suggestion", "confirmation", and "answer". The utterance type is output in a machine-readable form, for example as an utterance type ID. The intention tag is information indicating an intention such as "flight timetable display", "rental car search", "rental car location display", "hotel rate search", or "hotel reservation", as shown in FIG. 5. The intention tag is output in a machine-readable form, for example as an intention tag ID.
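One way to picture the intention understanding result is as a small record holding the utterance type, the intention tag, and the words with their semantic classes. The class and field names below are hypothetical, and plain strings stand in for the utterance type ID and intention tag ID mentioned above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class IntentionUnderstandingResult:
    """Illustrative container for one utterance's intention understanding result."""

    utterance_type: str            # e.g. "question", "answer", "confirmation"
    intention_tag: Optional[str]   # e.g. "rental car location display"
    # (word, semantic class) pairs; the class is None for unclassified words.
    words: List[Tuple[str, Optional[str]]] = field(default_factory=list)

    def semantic_classes(self) -> List[str]:
        """Return the semantic classes that were assigned, in word order."""
        return [cls for _, cls in self.words if cls is not None]


result = IntentionUnderstandingResult(
    utterance_type="question",
    intention_tag=None,
    words=[("which", None), ("airport", "Location_STATION_AIR")],
)
```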

In step S205, the dialogue extraction unit 105 extracts one of the following pieces of information from the utterance input in step S202: an intention tag, an attribute, an attribute value, or action content. It then records the speech recognition result, the intention understanding result, and the extracted information in the dialogue log storage unit 111 in association with one another. The processing of step S205 is described later.

In step S206, it is determined whether the dialogue has ended. For example, the dialogue is judged to have ended when an utterance indicating the end of the dialogue is detected or when the operator executes an action. If the dialogue continues, the process returns to step S202, where the next utterance occurs. In the dialogue example of FIG. 3, the operator says, "The rental car pickup location, correct?" The processing of steps S203, S204, and S205 is executed for this utterance. In the same way, the operator's utterance "Which airport is it?", the user's utterance "OO Airport.", the operator's utterance "Which airline are you using?", the user's utterance "XX Airlines.", and the operator's utterance "I will send you a map of the rental car pickup location." are processed in sequence. While saying "I will send you a map of the rental car pickup location.", the operator operates the computer terminal to transmit the map data to the user's terminal. The dialogue extraction unit 105 detects the operator's action based on the intention understanding result of the utterance "I will send you a map of the rental car pickup location." The dialogue extraction unit 105 acquires the content of the action that the operator executed during the response and records it in the dialogue log storage unit 111. As shown in FIG. 6, actions include, for example, "transfer to rental car staff", "flight timetable display", "airport facility information display", and "rental car search". Each action is associated with an action ID.

When the dialogue ends, the process proceeds to step S207. In step S207, the dialogue extraction unit 105 treats the dialogue between the user and the operator as finished and records a dialogue end label indicating the end of the dialogue in the dialogue log storage unit 111. In the dialogue log storage unit 111, the log for one dialogue is recorded between a dialogue start label and a dialogue end label. The dialogue log for one dialogue contains the analysis result of the dialogue, scenario feature words, intention tags, attributes and their semantic classes, attribute values and their semantic classes, and action content.

The processing of step S205 is now described in more detail.
In step S205-1, if the type of the utterance input in step S202 is confirmation, the dialogue extraction unit 105 extracts scenario feature words from this utterance and the utterance paired with it. Specifically, the dialogue extraction unit 105 extracts, as scenario feature words, the words common to the confirmation utterance of one party (for example, the operator) and the immediately preceding utterance of the other party (for example, the user). In the dialogue example of FIG. 3, the confirmation utterance is the operator's utterance "The rental car pickup location, correct?" The utterance paired with it is the user's immediately preceding utterance, "Where was I supposed to pick up the rental car I reserved the other day?" The common words are "rental car" and "receive" (pick up). Accordingly, "rental car" and "receive" are extracted as scenario feature words.
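The extraction of scenario feature words in step S205-1 can be sketched as an intersection of the words in the two paired utterances. This is only an illustration: it assumes both utterances have already been reduced to base-form content words (function words removed) by the morphological analysis, and the function name and the word lists are hypothetical.

```python
from typing import List


def extract_scenario_feature_words(
    confirmation_words: List[str], previous_words: List[str]
) -> List[str]:
    """Return words shared by a confirmation utterance and the preceding utterance.

    The order follows the confirmation utterance, and duplicates are dropped.
    """
    previous = set(previous_words)
    seen = set()
    feature_words = []
    for word in confirmation_words:
        if word in previous and word not in seen:
            seen.add(word)
            feature_words.append(word)
    return feature_words


# Assumed content words for the user's request and the operator's confirmation.
user_words = ["the other day", "reserve", "rental car", "where", "receive"]
operator_words = ["rental car", "receive", "place"]

feature_words = extract_scenario_feature_words(operator_words, user_words)
```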

In step S205-2, the dialogue extraction unit 105 determines whether the utterance type is question. If the utterance type is question, the process proceeds to step S205-3; otherwise it proceeds to step S205-4. In step S205-4, the dialogue extraction unit 105 determines whether the utterance type is answer. If the utterance type is answer, the process proceeds to step S205-5; otherwise it proceeds to step S205-6. In step S205-6, the dialogue extraction unit 105 determines whether the utterance is related to an operator action. If the utterance is related to an action, the process proceeds to step S205-8; otherwise it proceeds to step S205-7.
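The branching across steps S205-2, S205-4, and S205-6 amounts to a short chain of checks, sketched below. The function name and the string labels for utterance types are assumptions; the step names it returns are the ones used in this description.

```python
def select_extraction_step(utterance_type: str, related_to_action: bool) -> str:
    """Return the step that handles an utterance, following S205-2, S205-4, S205-6."""
    if utterance_type == "question":   # S205-2: question -> acquire attribute
        return "S205-3"
    if utterance_type == "answer":     # S205-4: answer -> acquire attribute value
        return "S205-5"
    if related_to_action:              # S205-6: action-related -> acquire action content
        return "S205-8"
    return "S205-7"                    # otherwise -> acquire intention tag


step_for_question = select_extraction_step("question", False)
step_for_greeting = select_extraction_step("greeting", False)
```

The checks are ordered as in the flow above, so a question or an answer is handled by its type before the action check is reached.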

The dialogue extraction unit 105 acquires an attribute from the question utterance (step S205-3), and acquires an attribute value from the answer utterance paired with that question utterance (step S205-5). As shown in FIG. 7, the semantic classes may classify meanings hierarchically. The semantic classes need not be expressed as a hierarchy. The attribute value is an argument for carrying out the intention indicated by the intention tag.

Specifically, among the words that have a semantic class common to the question utterance and the answer utterance, the dialogue extraction unit 105 acquires the word in the question utterance as the attribute and the word in the answer utterance as the attribute value. In the dialogue example of FIG. 3, the user's answer to the operator's question "Which airport is it?" is "OO Airport.", and the semantic class common to these utterances is "Location_STATION_AIR". In the operator's utterance "Which airport is it?", the word whose semantic class is "Location_STATION_AIR" is "airport", so "airport" is extracted as the attribute. In the user's utterance "OO Airport.", the word with the semantic class "Location_STATION_AIR" is "OO Airport", so "OO Airport" is extracted as the attribute value. Likewise, the user's answer to the operator's question "Which airline are you using?" is "XX Airlines.", and the semantic class common to these utterances is "Organization_COMPANEY_AIR". In the operator's utterance "Which airline are you using?", the word whose semantic class is "Organization_COMPANEY_AIR" is "airline", so "airline" is extracted as the attribute.
In the user's utterance "XX Airlines.", the word whose semantic class is "Organization_COMPANEY_AIR" is "XX Airlines", so "XX Airlines" is extracted as the attribute value. The dialogue example of FIG. 3 thus yields two sets: the attribute "airport", attribute value "OO Airport", and semantic class "Location_STATION_AIR"; and the attribute "airline", attribute value "XX Airlines", and semantic class "Organization_COMPANEY_AIR".
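The pairing of attributes and attribute values through a shared semantic class can be sketched as follows. The function name and the (word, semantic class) tuple representation are assumptions for illustration; the words and semantic class names are taken from the example above.

```python
from typing import List, Optional, Tuple

TaggedWord = Tuple[str, Optional[str]]  # (word, semantic class or None)


def extract_attribute_pairs(
    question: List[TaggedWord], answer: List[TaggedWord]
) -> List[Tuple[str, str, str]]:
    """Return (attribute, attribute value, semantic class) triples.

    A question word and an answer word are paired when both carry the same
    semantic class; words without a semantic class are ignored.
    """
    pairs = []
    for q_word, q_class in question:
        if q_class is None:
            continue
        for a_word, a_class in answer:
            if a_class == q_class:
                pairs.append((q_word, a_word, q_class))
    return pairs


question = [("which", None), ("airport", "Location_STATION_AIR")]
answer = [("OO Airport", "Location_STATION_AIR")]
pairs = extract_attribute_pairs(question, answer)
```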

The dialogue extraction unit 105 is not limited to the above example, in which the same words appearing in a confirmation utterance and in the utterance paired with it are extracted as scenario common words. It may also extract, as scenario common words, the same words appearing in a pair consisting of an operator utterance and a user utterance, such as a question and answer pair.

When the dialogue extraction unit 105 detects an operator action, it acquires the action content (step S205-8). The action content includes the operations that the operator actually performed on the system. FIG. 8 shows an example of the action content obtained when the operator operates an application in connection with the dialogue example shown in FIG. 3. The action content shown in FIG. 8 sends a map showing the rental car pickup location.

The dialogue extraction unit 105 acquires an intention tag from an utterance that is neither a question, an answer, nor an utterance related to an action (step S205-7). Such an utterance is recorded in the dialogue log storage unit 111 as an utterance whose intention does not contribute to achieving the purpose of the dialogue.

FIG. 9 shows the dialogue log for the dialogue example shown in FIG. 3. In FIG. 9, “START OPERATOR” is the dialogue start label and “END OPERATOR” is the dialogue end label. Information about utterances and actions is recorded between the dialogue start label and the dialogue end label. In the example of FIG. 9, each utterance log entry is written in colon-separated form as speaker : utterance type : utterance content : intention tag. The utterance content includes the speech recognition result, the words, and their semantic classes. A semantic class is written in parentheses immediately after its word. Each action log entry is likewise written in colon-separated form as the person performing the action : action content.
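The log format described for FIG. 9 can be illustrated with a minimal parsing sketch. FIG. 9 itself is not reproduced here, so the sample lines, the intention tag names, the rule that a four-field line is an utterance and a two-field line is an action, and the names `parse_log`, `Utterance`, `Action`, and `WORD_CLASS` are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field

# Assumed notation: a word immediately followed by its semantic class in parentheses.
WORD_CLASS = re.compile(r"(\S+?)\(([^()]+)\)")

@dataclass
class Utterance:
    speaker: str
    kind: str          # utterance type, e.g. "question" or "answer"
    text: str          # utterance content
    intent_tag: str
    words: list = field(default_factory=list)  # (word, semantic class) pairs

@dataclass
class Action:
    actor: str
    content: str

def parse_log(lines):
    """Collect the entries recorded between the start and end labels."""
    entries, inside = [], False
    for line in lines:
        line = line.strip()
        if line == "START OPERATOR":
            inside = True
        elif line == "END OPERATOR":
            inside = False
        elif inside and line:
            parts = line.split(":")
            if len(parts) == 4:      # assumed: four fields mean an utterance
                speaker, kind, text, tag = parts
                entries.append(Utterance(speaker, kind, text, tag,
                                         WORD_CLASS.findall(text)))
            elif len(parts) == 2:    # assumed: two fields mean an action
                entries.append(Action(*parts))
    return entries

# Hypothetical log lines written in the colon-separated form described above.
sample = [
    "START OPERATOR",
    "operator:question:Which airport(Location_STATION_AIR)?:ASK_AIRPORT",
    "user:answer:XX-airport(Location_STATION_AIR):TELL_AIRPORT",
    "operator:send_map",
    "END OPERATOR",
]
entries = parse_log(sample)
```

A real log would need an escaping rule for colons that occur inside the utterance content; this sketch simply ignores lines that do not split into two or four fields.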

FIG. 10 schematically shows the procedure for constructing a scenario from the dialogue log. In step S301 of FIG. 10, the scenario construction unit 106 reads the dialogue log from the dialogue log storage unit 111 and extracts from it the dialogue start label and dialogue end label of the dialogue for which a scenario is to be constructed. In step S302, the scenario construction unit 106 reads the log between the dialogue start label and the dialogue end label. In step S303, the scenario construction unit 106 generates “input”, “operation”, and “state” as the constituent units of the scenario. FIGS. 11A and 11B show examples of scenarios constructed from the dialogue log of FIG. 9. The scenario shown in FIG. 11A includes three states. The scenario shown in FIG. 11B includes one state. An input includes an intention tag and attributes. An operation includes an operation tag.

In step S304, the scenario construction unit 106 acquires a semantic class that is common to an utterance whose type is question and an utterance whose type is answer, together with the words of that semantic class. Here, “common” means either “the same” or “in an inclusion relationship”. The scenario construction unit 106 uses the acquired word (or semantic class) as an attribute of the input.

The processing of step S304 is now described in more detail. In step S304-1, the scenario construction unit 106 acquires words from an utterance whose type is question as attribute candidates and holds them in memory. In step S304-2, if the type of the next utterance is answer, the scenario construction unit 106 acquires words from that utterance as attribute candidates and holds them in memory. In step S304-3, the semantic classes of the words acquired in steps S304-1 and S304-2 are compared, and words having a common semantic class are taken as attributes. For example, “airport” is acquired as an attribute from the pair of the operator's utterance “Which airport is it?” and the user's utterance “It is XX airport.” The attribute acquisition method may be the same as that described for the processing of step S205-3. From the dialogue log of FIG. 9, two attributes, “airport” and “airline”, are obtained.
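Steps S304-1 to S304-3 can be sketched as a comparison of semantic classes between the words of a question and the words of its answer. The sketch below is illustrative only: the inclusion table `CLASS_CHILDREN`, the class name “Location_STATION”, the function names, and the choice to report the question word as the attribute and the answer word as its value are assumptions, not details given in the description.

```python
# Hypothetical parent -> children relation between semantic classes, used to
# model the "inclusion relationship" sense of "common".
CLASS_CHILDREN = {"Location_STATION": {"Location_STATION_AIR"}}

def is_common(class_a, class_b):
    """True if the two classes are the same or one includes the other."""
    return (class_a == class_b
            or class_b in CLASS_CHILDREN.get(class_a, set())
            or class_a in CLASS_CHILDREN.get(class_b, set()))

def extract_attributes(question_words, answer_words):
    """Each argument is a list of (word, semantic_class) pairs.

    Returns (attribute, value, semantic_class) triples for every question
    word and answer word whose semantic classes are common.
    """
    found = []
    for q_word, q_class in question_words:        # candidates from S304-1
        for a_word, a_class in answer_words:      # candidates from S304-2
            if is_common(q_class, a_class):       # comparison of S304-3
                found.append((q_word, a_word, a_class))
    return found

question = [("airport", "Location_STATION_AIR")]
answer = [("XX airport", "Location_STATION_AIR")]
attributes = extract_attributes(question, answer)
```

With the airport example from the text, the single triple returned pairs the attribute “airport” with the value “XX airport”.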

If an attribute is obtained in step S304-3, the process proceeds to step S304-5. In step S304-5, the scenario construction unit 106 generates an input condition using the attribute obtained in step S304-3. Specifically, the scenario construction unit 106 registers the attribute in the scenario as an input attribute corresponding to the intention tag of the most recent utterance whose type is request.

If no attribute is obtained, for example because the utterance following the question is not an answer, the process proceeds to step S304-4. In step S304-4, the scenario construction unit 106 determines whether the user responded to the operator's question with a question. For example, in the dialogue example shown in FIG. 12, the user responds “Huh? I don't know.” to the operator's question “Which terminal is it?”. When the utterance following a question is not of the answer type, as in this case, the scenario construction unit 106 judges it to be a redundant response in the scenario (it may instead be judged to be another type of response), that is, judges the scenario to be inefficient, and sets a low evaluation for the scenario under construction. In step S304-6, the scenario construction unit 106 waits for an utterance whose type is answer. On detecting such an utterance, the scenario construction unit 106 acquires an attribute from the pair of the question utterance and the answer utterance, and generates an input condition based on that attribute.

If, in step S304-4, the type of the utterance following the question is not question, the process proceeds to step S304-7. In step S304-7, an intention tag is acquired by the intention understanding unit 102, and an “input” is generated by combining it with the input condition generated in step S304-5.

In step S305, the scenario construction unit 106 finishes reading the dialogue log. In step S306, the scenario construction unit 106 replaces each word included in the action content with its semantic class, turning it into a variable. In step S307, the scenario construction unit 106 saves the constructed scenario in the scenario storage unit 110. The scenario is stored in association with its scenario feature words so that it can be retrieved by scenario feature word.
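The variable substitution of step S306 can be illustrated as a simple string replacement. This is a sketch under assumptions: the description does not specify how a variable is written, so the `${...}` placeholder syntax, the function name `generalize_action`, and the sample action text are hypothetical.

```python
def generalize_action(action_content, words):
    """Replace each known word in the action content with a variable named
    after its semantic class.

    words is a list of (word, semantic_class) pairs. Longer words are
    replaced first so that a short word does not split a longer one.
    """
    ordered = sorted(words, key=lambda pair: len(pair[0]), reverse=True)
    for word, semantic_class in ordered:
        action_content = action_content.replace(word, "${" + semantic_class + "}")
    return action_content

template = generalize_action(
    "send map of XX airport rental counter",
    [("XX airport", "Location_STATION_AIR")],
)
```

The resulting template no longer depends on the specific airport named in the original dialogue, which is what allows the action to be reused when another user supplies a different value of the same semantic class.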

The scenario may be constructed so as to faithfully reproduce the dialogue between the user and the operator, as in the example of FIG. 11A, or so as to accept all the necessary attributes at once, as in the example of FIG. 11B.

The scenario update unit 107 converts a scenario stored in the scenario storage unit 110 into an object that the dialogue control unit 103 can execute, and adds the object to the scenario object DB 114. The update may be performed automatically or in response to an operation by an administrator. Similar scenarios may be constructed at the same time for multiple operators. As shown in FIG. 13, the scenario storage unit 110 stores each scenario in association with its scenario feature words, number of states, number of response steps, and number of response failures. The number of response failures indicates how many times a response failed, for example when the user made an utterance other than an answer to the operator's question. The number of states, the number of response steps, and the number of response failures are examples of evaluation data for evaluating a scenario, used when the administrator of the dialogue system 100 decides whether to add the scenario to the scenario object DB 114. The scenario update unit 107 can display the evaluation data together with the scenario so that the administrator of the dialogue system 100 can select which scenarios to add to the scenario object DB 114.
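The evaluation data described for FIG. 13 can be pictured as a small record kept with each scenario. The sketch below is an assumption-laden illustration: the record layout, the field names, and the counting rule (an operator question immediately followed by anything other than an answer counts as one failure) are one possible reading of the description, not the method it specifies.

```python
from dataclasses import dataclass

@dataclass
class ScenarioEvaluation:
    feature_words: tuple
    num_states: int
    num_response_steps: int
    num_response_failures: int

def count_response_failures(turns):
    """turns: (speaker, utterance_type) pairs in dialogue order.

    Counts operator questions whose next utterance is not an answer.
    """
    return sum(
        1
        for (speaker, kind), (_, next_kind) in zip(turns, turns[1:])
        if speaker == "operator" and kind == "question" and next_kind != "answer"
    )

# Hypothetical turn sequence modelled on the FIG. 12 situation.
turns = [
    ("operator", "question"),   # operator asks which terminal
    ("user", "question"),       # user replies that they do not know
    ("operator", "question"),   # operator asks which airline
    ("user", "answer"),
]
evaluation = ScenarioEvaluation(
    feature_words=("rental car", "airport"),
    num_states=3,
    num_response_steps=len(turns),
    num_response_failures=count_response_failures(turns),
)
```

An administrator view could then sort or filter scenarios by these fields before deciding which ones to add to the scenario object DB 114.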

FIG. 14 shows a procedure for presenting candidate actions to an operator who is in the middle of responding. In step S401 of FIG. 14, the scenario search unit 113 extracts scenario feature words from the dialogue between the user and the operator while the operator is responding. Specifically, the scenario search unit 113 extracts, as scenario feature words, the words common to an utterance whose type is confirmation and the utterance paired with it.

In step S402, the scenario search unit 113 searches the scenario storage unit 110 using the scenario feature words as search keys. In step S403, it is determined whether there is a similar scenario, that is, a scenario whose scenario feature words match all or some of the search keys. If there is a similar scenario, the process proceeds to step S404; otherwise, the process ends.
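The search of steps S402 and S403 can be sketched as a set intersection between the extracted feature words and the feature words stored with each scenario. The storage layout (a dictionary from scenario name to a set of feature words), the scenario names, and the ordering of hits by number of shared words are assumptions added for illustration; the description only requires that all or some of the feature words match.

```python
def find_similar_scenarios(feature_words, stored):
    """stored maps a scenario name to its set of scenario feature words.

    Returns the names of scenarios sharing at least one feature word with
    the query, ordered by number of shared words and then by name.
    """
    query = set(feature_words)
    hits = [(len(query & words), name)
            for name, words in stored.items()
            if query & words]
    return [name for _, name in sorted(hits, key=lambda hit: (-hit[0], hit[1]))]

# Hypothetical contents of the scenario storage unit.
stored = {
    "rental_map": {"rental car", "airport"},
    "flight_change": {"flight", "change"},
    "rental_price": {"rental car", "price"},
}
similar = find_similar_scenarios(["rental car", "airport"], stored)
```

An empty result corresponds to the branch of step S403 in which no similar scenario exists and the process ends.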

In step S404, the scenario search unit 113 acquires the action content included in the similar scenario. In step S405, the scenario search unit 113 displays the acquired action content as action candidates through the dialogue state display unit 112. The operator decides which action to execute with reference to the displayed action candidates.
Displaying action candidates in this way assists the operator.

FIG. 15 shows an example of the content displayed by the dialogue state display unit 112. The dialogue state display unit 112 includes a conversation monitor, an intention understanding monitor, and an operation monitor. The conversation monitor displays the speech recognition results produced by the speech recognition unit 101 for the dialogue between the user and the operator. The intention understanding monitor displays the intention understanding results produced by the intention understanding unit 102 for the dialogue between the user and the operator. The operation monitor displays the action candidates acquired by the scenario search unit 113. In the example of FIG. 15, three action candidates are displayed.

Providing the dialogue state display unit 112 allows the operator to confirm the user's request visually. If the speech recognition result or the intention understanding result contains errors, they must be corrected in order to construct a useful scenario. In the example of FIG. 15, intention understanding has failed because of a misrecognition in speech recognition. Presenting the speech recognition result and the intention understanding result to the operator during the response makes it possible for the operator to correct them.

As described above, according to the present embodiment, a scenario is constructed from a dialogue log that includes the analysis results of the dialogue between the user and the operator and the actions of the operator, so that necessary scenarios can easily be added to the dialogue system.

The dialogue system 100 of the embodiment can also be realized, for example, by using a general-purpose computer device as basic hardware. That is, the speech recognition unit 101, the intention understanding unit 102, the dialogue control unit 103, the response generation unit 104, the dialogue extraction unit 105, the scenario construction unit 106, the scenario update unit 107, the dialogue state display unit 112, and the scenario search unit 113 can be realized by causing a processor mounted on the computer device to execute a program. In this case, the dialogue system may be realized by installing the program on the computer device in advance, or by storing the program on a storage medium such as a CD-ROM or distributing it over a network and installing it on the computer device as appropriate. The dialogue log storage unit, the scenario storage unit, the dictionary storage unit, and the intention storage unit can be realized by using, as appropriate, a memory or hard disk built into or externally attached to the computer device, or a storage medium such as a CD-R, CD-RW, DVD-RAM, or DVD-R.

Although several embodiments of the present invention have been described, these embodiments are presented by way of example and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and are included in the invention described in the claims and its equivalents.

Description of symbols: 100 ... dialogue system, 101 ... speech recognition unit, 102 ... intention understanding unit, 103 ... dialogue control unit, 104 ... response generation unit, 105 ... dialogue extraction unit, 106 ... scenario construction unit, 107 ... scenario update unit, 108 ... dictionary storage unit, 109 ... intention model storage unit, 110 ... scenario storage unit, 111 ... dialogue log storage unit, 112 ... dialogue state display unit, 113 ... scenario search unit, 114 ... scenario object database.

Claims

1. A dialogue system construction support apparatus comprising:
a speech recognition unit that performs speech recognition on a plurality of utterances included in a dialogue between a user and an operator, and generates a speech recognition result including a plurality of texts respectively corresponding to the plurality of utterances;
an intention understanding unit that understands the intention of each of the plurality of utterances based on the corresponding one of the plurality of texts, and obtains an intention understanding result including the type of each of the plurality of utterances, the intention of each of the plurality of utterances, words included in the plurality of texts, and semantic classes of the words;
a dialogue information storage unit that stores, in association with one another, the speech recognition result, the intention understanding result, and an action performed by the operator in relation to the dialogue; and
a scenario construction unit that acquires, as an attribute, a word having a semantic class common to a question utterance and an answer utterance included in the dialogue, and constructs a scenario using the attribute and the action.

2. The dialogue system construction support apparatus according to claim 1, further comprising a dialogue extraction unit that extracts, as a scenario feature word, the same word appearing in a pair of an utterance of the operator and an utterance of the user included in the dialogue.

3. The dialogue system construction support apparatus according to claim 2, wherein the dialogue extraction unit extracts, as the scenario feature word, a word common to a confirmation utterance included in the dialogue and the utterance paired with the confirmation utterance.

4. The dialogue system construction support apparatus according to claim 2, further comprising:
a scenario storage unit that stores the scenario in association with the scenario feature word;
a scenario search unit that searches the scenario storage unit and acquires, as a similar scenario, a scenario associated with the scenario feature word extracted by the dialogue extraction unit; and a display unit that displays an action included in the similar scenario.

5. The dialogue system construction support apparatus according to claim 1, further comprising a display unit that displays the speech recognition result and the intention understanding result.

6. The dialogue system construction support apparatus according to claim 1, further comprising a scenario update unit that adds the scenario to a database of the dialogue system.

7. The dialogue system construction support apparatus according to claim 6, wherein the scenario construction unit generates evaluation data for evaluating the scenario, and
the scenario update unit displays the scenario together with the evaluation data so that whether to add the scenario to the database can be selected.

8. A dialogue system construction support method comprising:
performing speech recognition on a plurality of utterances included in a dialogue between a user and an operator, and generating a speech recognition result including a plurality of texts respectively corresponding to the plurality of utterances;
understanding the intention of each of the plurality of utterances based on the corresponding one of the plurality of texts, and obtaining an intention understanding result including the type of each of the plurality of utterances, the intention of each of the plurality of utterances, words included in the plurality of texts, and semantic classes of the words;
storing, in association with one another, the speech recognition result, the intention understanding result, and an action performed by the operator in relation to the dialogue; and
acquiring, as an attribute, a word having a semantic class common to a question utterance and an answer utterance included in the dialogue, and constructing a scenario using the attribute and the action.

9. A dialogue system construction support program for causing a computer to function as:
speech recognition means for performing speech recognition on a plurality of utterances included in a dialogue between a user and an operator, and generating a speech recognition result including a plurality of texts respectively corresponding to the plurality of utterances;
intention understanding means for understanding the intention of each of the plurality of utterances based on the corresponding one of the plurality of texts, and obtaining an intention understanding result including the type of each of the plurality of utterances, the intention of each of the plurality of utterances, words included in the plurality of texts, and semantic classes of the words;
storage means for storing, in association with one another, the speech recognition result, the intention understanding result, and an action performed by the operator in relation to the dialogue; and
scenario construction means for acquiring, as an attribute, a word having a semantic class common to a question utterance and an answer utterance included in the dialogue, and constructing a scenario using the attribute and the action.