JP6604542B2 - Dialogue method, dialogue program and dialogue system

Info

Publication number: JP6604542B2
Application number: JP2015256787A
Authority: JP
Inventors: ヴィヴィアネ高橋; 充遠藤
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2015-04-02
Filing date: 2015-12-28
Publication date: 2019-11-13
Anticipated expiration: 2035-12-28
Also published as: CN106055547B; CN106055547A; JP2016197227A

Description

The present disclosure relates to a dialogue method used in a dialogue system that responds to a user's utterance, a dialogue program that responds to a user's utterance, and a dialogue system that responds to a user's utterance.

In recent years, techniques have been proposed that make exchanges between a system and a user more efficient by creating a model that represents the user's preferences.

For example, the spoken dialogue system described in Patent Document 1 records, for each keyword to be recognized, a paraphrase to be used when the keyword is included in a response sentence, a response type indicating the kind of response sentence, and the conditions under which the paraphrase and the response type are selected. Based on those conditions, the system determines the paraphrase and the response sentence template for a recognized keyword, and generates a response sentence by inserting the paraphrase into the determined template.

The conventional speech understanding system described in Patent Document 2 includes: a knowledge extraction unit that receives electronic program guide (EPG) information and processes it to form a program database; a speech recognition unit that receives a spoken request and translates it into a series of text information consisting of multiple words; a natural language processor that receives the series of text information and processes the words to interpret the semantic content of the spoken request; and a conversation control unit that analyzes a task frame to determine whether a sufficient number of keyword slots have been filled and asks the user for additional information to fill any empty slots.

JP 2008-39928 A
JP 2000-250575 A

However, the conventional speech understanding system asks the user directly for the value to be entered in a slot of the task frame, confirms the user's answer a second time, and only then determines the slot value. As a result, both the dialogue time between the system and the user and the processing time of the system become long.

The present disclosure has been made to solve the above problem. Its object is to provide a dialogue method, a dialogue program, and a dialogue system that can shorten both the dialogue time between the dialogue system and the user and the processing time of the dialogue system.

A dialogue method according to one aspect of the present disclosure is used in a dialogue system that responds to a user's utterance. The method stores, in association with one another, a plurality of nodes necessary for executing a task of generating a response sentence to the user's utterance; acquires utterance information indicating the content of the user's utterance; identifies, from among the plurality of nodes, a first node corresponding to the utterance information; selects one second node from among a plurality of second nodes associated with the identified first node, based on the weight values associated with the respective second nodes; and generates a response sentence corresponding to the selected second node.

According to this configuration, a plurality of nodes necessary for executing a task of generating a response sentence to the user's utterance are stored in association with one another. Utterance information indicating the content of the user's utterance is acquired. A first node corresponding to the utterance information is identified from among the plurality of nodes. One second node is selected from among the plurality of second nodes associated with the identified first node, based on the weight values associated with the respective second nodes. A response sentence corresponding to the selected second node is then generated.

Therefore, there is no need to generate a question sentence that makes the user select one second node from among the plurality of second nodes. Instead, a response sentence is generated for the second node selected based on the weight values associated with the respective second nodes. This shortens both the dialogue time between the dialogue system and the user and the processing time of the dialogue system.
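The flow described above (identify a first node, select a second node by weight, generate a response sentence) can be sketched roughly as follows. This is only an illustrative sketch: the dictionaries `KEYWORD_TO_SLOT` and `SLOT_VALUES`, the function names, the fallback message, and the choice of "highest weight wins" are assumptions made for the example, and the 60%/40% figures are borrowed from the coffee example in the description.

```python
from typing import Dict, Optional

# Hypothetical store (assumption): keyword -> first node (slot),
# and first node -> weighted second nodes (slot values).
KEYWORD_TO_SLOT: Dict[str, str] = {"coffee": "temperature"}
SLOT_VALUES: Dict[str, Dict[str, float]] = {
    "temperature": {"hot": 0.4, "cold": 0.6},
}


def identify_first_node(utterance: str) -> Optional[str]:
    """Return the first node (slot) matched by a word of the utterance."""
    for word in utterance.lower().split():
        if word in KEYWORD_TO_SLOT:
            return KEYWORD_TO_SLOT[word]
    return None


def select_second_node(slot: str) -> str:
    """Pick the second node (slot value) with the largest weight value."""
    values = SLOT_VALUES[slot]
    return max(values, key=values.get)


def respond(utterance: str) -> str:
    """Generate a response sentence for the selected second node."""
    slot = identify_first_node(utterance)
    if slot is None:
        # Fallback wording is an assumption, not taken from the source.
        return "Sorry, I did not understand."
    return f"Do you want {select_second_node(slot)}?"
```

With these illustrative weights, `respond("one coffee please")` asks about the cold option directly, without first asking the user to choose between hot and cold.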

In the above dialogue method, the weight value may represent the probability with which each of the plurality of second nodes was selected by the user in the past.

According to this configuration, the weight value represents the probability with which each second node was selected by the user in the past, so the weight value can be calculated easily.

In the above dialogue method, a second node whose probability is greater than a predetermined value may be selected from among the plurality of second nodes.

According to this configuration, a second node whose probability is greater than the predetermined value is selected from among the plurality of second nodes, so one second node can be selected easily.

In the above dialogue method, when none of the plurality of second nodes has a probability greater than the predetermined value, a response sentence that asks the user to select one of the plurality of second nodes may be generated.

According to this configuration, when none of the plurality of second nodes has a probability greater than the predetermined value, a response sentence that asks the user to select one of the second nodes is generated. Even when the system cannot select one second node itself, the user can therefore make the selection.
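The threshold rule and its fallback can be illustrated with a short sketch. The function name `choose_or_ask`, the default threshold of 0.5, the returned tuple shape, and the English sentence wording are assumptions for illustration; the source only states that a second node above a predetermined value is selected and that otherwise the user is asked to choose.

```python
from typing import Dict, Tuple


def choose_or_ask(weights: Dict[str, float],
                  threshold: float = 0.5) -> Tuple[str, str]:
    """Return ("confirm", question) when some slot value exceeds the
    threshold, otherwise ("choose", question listing every candidate)."""
    above = [value for value, p in weights.items() if p > threshold]
    if above:
        # If several values exceed the threshold, take the most probable
        # one (tie-breaking policy is an assumption).
        best = max(above, key=weights.get)
        return "confirm", f"Do you want {best}?"
    options = " or ".join(weights)
    return "choose", f"How about {options}?"
```

For example, weights of 0.4 for hot and 0.6 for cold lead to a yes/no confirmation of cold, whereas equal weights of 0.5 lead to a question that lets the user pick.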

In the above dialogue method, information indicating the user's answer to the response sentence may be acquired, and the weight values may be updated according to whether or not the user's answer selects one of the plurality of second nodes.

According to this configuration, information indicating the user's answer to the response sentence is acquired, and the weight values are updated according to whether or not the user's answer selects one of the plurality of second nodes.

Therefore, the weight values are updated each time the user uses the system, so one second node can be selected in accordance with how the user actually uses the system.
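One simple way to realize such an update is to keep selection counts and derive the probabilities from them. The sketch below is an assumption about how this could be done, not the update rule of the disclosure: the class `SlotWeights`, the count-based estimate, and the uniform starting weights are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


class SlotWeights:
    """Tracks how often each slot value was chosen (illustrative only)."""

    def __init__(self, values: Iterable[str]) -> None:
        # Start every known slot value with a count of zero.
        self.counts = Counter({value: 0 for value in values})

    def update(self, answer: str) -> bool:
        """Count the answer if it names one known slot value.

        Returns True when the weights changed, False when the answer did
        not select any of the slot values.
        """
        if answer in self.counts:
            self.counts[answer] += 1
            return True
        return False

    def weights(self) -> Dict[str, float]:
        """Return selection probabilities; uniform before any selection."""
        total = sum(self.counts.values())
        if total == 0:
            return {value: 1 / len(self.counts) for value in self.counts}
        return {value: n / total for value, n in self.counts.items()}
```

After two answers of "cold" and one of "hot", the estimated weights are about 0.67 for cold and 0.33 for hot, and an answer that names neither value leaves the weights unchanged.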

In the above dialogue method, a weight value may be associated with each combination of (a) one second node among the plurality of second nodes associated with one first node among a plurality of first nodes and (b) each of the plurality of second nodes associated with another first node among the plurality of first nodes. The method may determine whether or not the one second node has been identified and, when it has been identified, select one second node from among the plurality of second nodes associated with the other first node, based on the weight values associated with the combinations of the identified second node and each of the second nodes associated with the other first node.

According to this configuration, a weight value is associated with each combination of one second node, among the plurality of second nodes associated with one first node among the plurality of first nodes, and each of the plurality of second nodes associated with another first node. It is determined whether or not the one second node has been identified. When it has been identified, one second node is selected from among the plurality of second nodes associated with the other first node, based on the weight values associated with the combinations of the identified second node and each of the second nodes associated with the other first node.

Therefore, for the other first node, one second node can be selected in accordance with its combination with the second node already identified for the one first node.
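The combination-based selection can be pictured as a lookup in a table keyed by pairs of slot values. In the sketch below, the pair table `PAIR_WEIGHTS`, its numbers, the slot values "with_sugar" and "no_sugar", and the function `select_given` are all hypothetical and serve only to illustrate the idea of conditioning one selection on a slot value that has already been identified.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical weights for (identified slot value, candidate slot value)
# pairs. The numbers are invented for illustration.
PAIR_WEIGHTS: Dict[Tuple[str, str], float] = {
    ("hot", "with_sugar"): 0.7,
    ("hot", "no_sugar"): 0.3,
    ("cold", "with_sugar"): 0.2,
    ("cold", "no_sugar"): 0.8,
}


def select_given(identified: Optional[str],
                 candidates: Sequence[str],
                 pair_weights: Dict[Tuple[str, str], float]) -> Optional[str]:
    """Pick the candidate with the largest pair weight.

    Returns None when no slot value has been identified yet, in which
    case the pair weights cannot be applied.
    """
    if identified is None:
        return None
    return max(candidates,
               key=lambda c: pair_weights.get((identified, c), 0.0))
```

With this invented table, a user who has already settled on a hot drink would be asked about sugar, while a user who chose a cold drink would be asked about the no-sugar option.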

A dialogue program according to another aspect of the present disclosure is a dialogue program that responds to a user's utterance. The program causes a computer to function as: a storage unit that stores, in association with one another, a plurality of nodes necessary for executing a task of generating a response sentence to the user's utterance; an acquisition unit that acquires utterance information indicating the content of the user's utterance; an identification unit that identifies, from among the plurality of nodes, a first node corresponding to the utterance information; a selection unit that selects one second node from among a plurality of second nodes associated with the first node identified by the identification unit, based on the weight values associated with the respective second nodes; and a generation unit that generates a response sentence corresponding to the second node selected by the selection unit.

A dialogue system according to another aspect of the present disclosure is a dialogue system that responds to a user's utterance. The system includes: a storage unit that stores, in association with one another, a plurality of nodes necessary for executing a task of generating a response sentence to the user's utterance; an acquisition unit that acquires utterance information indicating the content of the user's utterance; an identification unit that identifies, from among the plurality of nodes, a first node corresponding to the utterance information; a selection unit that selects one second node from among a plurality of second nodes associated with the first node identified by the identification unit, based on the weight values associated with the respective second nodes; and a generation unit that generates a response sentence corresponding to the second node selected by the selection unit.

According to the present disclosure, both the dialogue time between the dialogue system and the user and the processing time of the dialogue system can be shortened.

A diagram for explaining the outline of the spoken dialogue system according to the present embodiment.
A diagram showing the configuration of the spoken dialogue system according to the present embodiment.
A diagram showing an example of the determination condition table.
A flowchart for explaining the spoken dialogue processing of the spoken dialogue system according to the present embodiment.
A flowchart for explaining the weight value update processing of the spoken dialogue system according to the present embodiment.
A diagram for explaining the difference between the spoken dialogue processing of the spoken dialogue system according to the present embodiment and that of a conventional spoken dialogue system.
A diagram showing an example of the semantic network of the spoken dialogue system according to a modification of the present embodiment.
A flowchart for explaining the spoken dialogue processing of the spoken dialogue system according to the modification of the present embodiment.
A diagram showing an example of dialogue sentences produced by the dialogue method used in a conventional spoken dialogue system.
A diagram showing an example of dialogue sentences produced by the dialogue method used in the spoken dialogue system according to the present disclosure.

Embodiments of the present invention are described below with reference to the accompanying drawings. The following embodiments are examples that embody the present invention and do not limit its technical scope.

FIG. 1 is a diagram for explaining the outline of the spoken dialogue system according to the present embodiment.

FIG. 1 shows an example of a semantic network used when selling drinks. The semantic network shown in FIG. 1 includes a plurality of nodes necessary for executing a task of generating a response sentence to a user's utterance. The nodes are associated with one another, and each pair of associated nodes is given relation information indicating the relationship between the two nodes. The relation information includes: information indicating that one node is a subordinate concept of the other node; information indicating that one node is an item required for executing a task concerning the concept contained in the other node; information indicating that one node is an item that may optionally be set for a task concerning the concept contained in the other node; and information indicating that one node is a value of the other node.

For example, node 11 representing "coffee" and node 12 representing "drink" are associated with relation information indicating that node 11 ("coffee") is a subordinate concept of node 12 ("drink"), that is, the two are in an is-a relationship. Node 11 representing "coffee" is also called a domain.

Node 14 representing "size" and node 12 representing "drink" are associated with relation information indicating that node 14 ("size") is a required item for node 12 ("drink"). Node 15 representing "quantity" and node 12 representing "drink" are associated with relation information indicating that node 15 ("quantity") is an item that is optionally set for node 12 ("drink") and is determined only when the user states it.

Node 17 representing "sugar" and node 11 representing "coffee" are associated with relation information indicating that node 17 ("sugar") is an item required for executing the task for node 11 ("coffee"). Likewise, node 18 representing "temperature" and node 11 representing "coffee" are associated with relation information indicating that node 18 ("temperature") is an item required for executing the task for node 11 ("coffee"). Node 17 ("sugar") and node 18 ("temperature") are called required slots, or simply slots.

Node 19 representing "hot" and node 18 representing "temperature" are associated with relation information indicating that node 19 ("hot") is a value of node 18 ("temperature"). Node 20 representing "cold" and node 18 representing "temperature" are associated with relation information indicating that node 20 ("cold") is a value of node 18 ("temperature"). For node 18 ("temperature"), exactly one of node 19 ("hot") and node 20 ("cold") is selected. Node 19 ("hot") and node 20 ("cold") are called slot values.
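The semantic network of FIG. 1 as described above can be represented, for illustration, as a list of typed edges. The node names and relations below follow the description; the `Relation` enum, the edge-list representation, the helper functions, and in particular the idea that a domain inherits required slots through is-a edges are assumptions made for this sketch.

```python
from enum import Enum
from typing import List, Tuple


class Relation(Enum):
    IS_A = "is-a"          # source is a subordinate concept of target
    REQUIRED = "required"  # source is a required item (slot) of target
    OPTIONAL = "optional"  # source is an optionally set item of target
    VALUE = "value"        # source is a value (slot value) of target


# (source node, relation, target node), following the FIG. 1 description.
EDGES: List[Tuple[str, Relation, str]] = [
    ("coffee", Relation.IS_A, "drink"),
    ("size", Relation.REQUIRED, "drink"),
    ("quantity", Relation.OPTIONAL, "drink"),
    ("sugar", Relation.REQUIRED, "coffee"),
    ("temperature", Relation.REQUIRED, "coffee"),
    ("hot", Relation.VALUE, "temperature"),
    ("cold", Relation.VALUE, "temperature"),
]


def related(target: str, relation: Relation) -> List[str]:
    """Nodes linked to `target` by the given relation."""
    return [src for src, rel, dst in EDGES
            if dst == target and rel is relation]


def required_slots(domain: str) -> List[str]:
    """Required slots of a domain, plus those of its is-a parents.

    Inheriting slots through is-a edges is an assumption of this sketch.
    """
    slots = related(domain, Relation.REQUIRED)
    for src, rel, dst in EDGES:
        if src == domain and rel is Relation.IS_A:
            slots += required_slots(dst)
    return slots
```

Here `related("temperature", Relation.VALUE)` yields the two slot values of the temperature slot, and `required_slots("coffee")` collects the slots that must be filled before a coffee order can be completed under the inheritance assumption.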

When a user purchasing a drink chooses coffee, whether sugar is needed and whether the coffee is hot or cold must always be determined. Unless these are determined, the task of generating a response sentence to the user's utterance when providing the drink cannot be executed. In other words, temperature is a node (slot) that is essential for accomplishing the task, and the system needs to determine the value of the slot (in this case, cold or hot).

In a conventional system, the system asks the user whether the coffee should be hot or cold and determines the user's answer by speech recognition. If the user chooses hot, the system asks again whether hot is acceptable, determines the user's answer by speech recognition, and only then fixes the coffee temperature.

In contrast, the system of the present disclosure does not ask the user whether the coffee should be hot or cold. Instead, it assigns a weight value to each of hot and cold according to which of them multiple users selected in the past and, according to the weight values, asks either whether the coffee should be hot or whether it should be cold. For example, if cold was selected with a probability of 60% and hot with a probability of 40% in the past, the system asks the user whether the coffee should be cold, determines the user's answer by speech recognition, and fixes the coffee temperature. Unlike the conventional system, the temperature does not need to be confirmed a second time, so both the dialogue time between the system and the user and the processing time of the system can be shortened.

If a question with specific and correct content can be generated, it becomes easier to obtain from the user the information needed to execute the task. For example, when the user orders coffee, the system asks "Is hot coffee all right?", which limits the user's answer to an affirmative or negative expression such as "yes" or "no".

As another case, consider a user who orders a set menu by specifying a "cheeseburger set". When the system asks the user which drink to include in the set, it asks an either-or question based on the probabilities, for example, "Would you like Coke or orange juice?". This makes it easier to guide the user toward an answer that contains content the system can accept, such as "Coke" or "Orange juice". In other words, by asking an either-or question, the system discourages the user from giving an unexpected answer. The user is therefore more likely to use an expression the system can accept than when the conventional technique is applied, and information can be obtained from the user more reliably.

FIG. 2 is a diagram showing the configuration of the spoken dialogue system according to the present embodiment. The spoken dialogue system includes a speech recognition unit 101, a natural language processor 102, a memory 103, a conversation management unit 104, and a speech synthesis unit 105.

The memory 103 includes a semantic network storage unit 111, a weight value management table storage unit 112, and a determination condition table storage unit 113.

The semantic network storage unit 111 stores in advance a semantic network in which a plurality of nodes are connected. That is, it stores, in association with one another, the plurality of nodes necessary for executing a task of generating a response sentence to a user's utterance.

The weight value management table storage unit 112 stores the slot values included in the semantic network in association with their weight values.

The determination condition table storage unit 113 stores, in association with one another, the number of selectable slot values, the condition under which a slot value is selected, the slot value obtained when the condition is satisfied, and a template representing the response sentence.

The speech recognition unit 101 converts input speech acquired by a microphone (not shown) into text information. In other words, it recognizes the user's utterance and converts it into text information.

The natural language processor 102 acquires utterance information (text information) indicating the content of the user's utterance and identifies, from among the plurality of nodes, a first node (slot) corresponding to the utterance information. The natural language processor 102 analyzes the series of text information output by the speech recognition unit 101 to understand its semantic content and the intention of the user's utterance. For example, it understands the utterance content using language understanding knowledge stored in a language understanding database (not shown). The natural language processor 102 extracts meaningful words from the text information and searches the semantic network stored in the semantic network storage unit 111. If an extracted word exists in the semantic network, the natural language processor 102 extracts from the semantic network the slots related to the task specified by the extracted word and the plurality of slot values associated with those slots.

The natural language processor 102 includes a syntax analysis unit 131 and a memory access unit 132. The syntax analysis unit 131 extracts words from the textualized content of the user's utterance. For the words extracted by the syntax analysis unit 131, the memory access unit 132 searches the semantic network stored in the memory 103, extracts the slots and related items, and outputs the extracted slots to the conversation management unit 104 (conversation generation unit 121).
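The division of labor between the syntax analysis unit and the memory access unit can be mirrored by two small functions. This is a rough sketch under simplifying assumptions: word extraction is reduced to a regular-expression split rather than real syntax analysis, the semantic network is reduced to a nested dictionary `NETWORK`, and the function names are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical, simplified network: domain word -> slot -> slot values.
NETWORK: Dict[str, Dict[str, List[str]]] = {
    "coffee": {"temperature": ["hot", "cold"]},
}


def extract_words(text: str) -> List[str]:
    """Stand-in for the syntax analysis unit: split text into words."""
    return re.findall(r"[a-z]+", text.lower())


def lookup_slots(words: List[str],
                 network: Dict[str, Dict[str, List[str]]]
                 ) -> Dict[str, List[str]]:
    """Stand-in for the memory access unit: collect the slots and slot
    values of every extracted word that exists in the network."""
    found: Dict[str, List[str]] = {}
    for word in words:
        if word in network:
            found.update(network[word])
    return found
```

The result of `lookup_slots` corresponds to what would be handed to the conversation generation unit: the slots still to be filled, together with their candidate slot values.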

The memory access unit 132 may instead be included in the conversation management unit 104. In that case, the natural language processor 102 extracts words from the textualized content of the user's utterance and outputs them to the memory access unit of the conversation management unit 104, and that memory access unit extracts the slots and related items from the semantic network.

The conversation management unit 104 includes a conversation generation unit 121 and a weight value update unit 122. The conversation generation unit 121 selects one second node from among the plurality of second nodes (slot values) associated with the first node (slot) identified by the natural language processor 102, based on the weight values associated with the respective second nodes. The weight value represents the probability with which each second node was selected by the user in the past. The conversation generation unit 121 selects, from among the plurality of second nodes, a second node whose probability is greater than a predetermined value, and generates a response sentence corresponding to the selected second node (slot value). When none of the second nodes has a probability greater than the predetermined value, the conversation generation unit 121 generates a response sentence that asks the user to select one of the second nodes.

The conversation generation unit 121 acquires, from the weight value management table, the weight values associated with the respective slot values extracted by the natural language processor 102, and determines one slot value based on the acquired weight values. It then generates a response sentence corresponding to the determined slot value. In doing so, the conversation generation unit 121 refers to the determination condition table stored in the determination condition table storage unit 113 and determines whether the determination condition corresponding to the number of slot values selectable for a given slot is satisfied. If the condition is satisfied, the conversation generation unit 121 inserts the slot value into a response sentence template prepared in advance and thereby generates the response sentence.

図３は、判断条件テーブルの一例を示す図である。 FIG. 3 is a diagram illustrating an example of the determination condition table.

As shown in FIG. 3, the determination condition table associates the number of selectable slot values, the condition under which a slot value is selected, the slot value obtained when the condition is satisfied, and a template representing the response sentence.

For example, when there are two selectable slot values, “v_1” and “v_2”, and the weight value of v_1 is greater than 50% while the weight value of v_2 is less than 50%, the conversation generation unit 121 selects v_1 as the slot value. Likewise, when the weight value of v_1 is less than 50% and the weight value of v_2 is greater than 50%, the conversation generation unit 121 selects v_2 as the slot value. When v_1 is selected as the slot value, the conversation generation unit 121 generates the response sentence “Do you want v_1?”. When v_2 is selected as the slot value, the conversation generation unit 121 generates the response sentence “Do you want v_2?”.

When there are two selectable slot values, “v_1” and “v_2”, and the weight values of v_1 and v_2 are both 50%, the conversation generation unit 121 selects both v_1 and v_2 as slot values. When v_1 and v_2 are both selected, the conversation generation unit 121 generates the response sentence “How about v_1 or v_2?”.

When there are a plurality of selectable slot values “v_1”, “v_2”, ..., “v_x” and the weight value of any one slot value v_i is greater than 50%, the conversation generation unit 121 selects v_i as the slot value. When v_i is selected as the slot value, the conversation generation unit 121 generates the response sentence “Do you want v_i?”.

When there are a plurality of selectable slot values “v_1”, “v_2”, ..., “v_x”, the weight value of one slot value v_i is greater than 40%, and the weight value of another slot value v_j different from v_i is also greater than 40%, the conversation generation unit 121 selects both v_i and v_j as slot values. When v_i and v_j are selected, the conversation generation unit 121 generates the response sentence “How about v_i or v_j?”.

When there are a plurality of selectable slot values “v_1”, “v_2”, ..., “v_x” and the weight value of every slot value v_i is less than 40%, the conversation generation unit 121 selects no slot value. When no slot value is selected, the conversation generation unit 121 generates the response sentence “What XX (slot name) do you want?”.
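The determination conditions described above can be sketched in Python as follows. This is only an illustrative reading of the table in FIG. 3, not the implementation of the disclosure: the function name `decide`, the dictionary representation of the weights, the order in which the rules are checked, and the fallback to the open question for weight combinations the text does not cover are all assumptions.

```python
from typing import Dict, List, Tuple


def decide(slot_name: str, weights: Dict[str, float]) -> Tuple[List[str], str]:
    """Return (selected slot values, response sentence) for one slot.

    `weights` maps each selectable slot value to its weight (a probability
    between 0 and 1). The name and signature are hypothetical.
    """
    values = list(weights)
    if len(values) == 2:
        v1, v2 = values
        w1, w2 = weights[v1], weights[v2]
        if w1 > 0.5 and w2 < 0.5:
            return [v1], f"Do you want {v1}?"
        if w1 < 0.5 and w2 > 0.5:
            return [v2], f"Do you want {v2}?"
        if w1 == 0.5 and w2 == 0.5:
            return [v1, v2], f"How about {v1} or {v2}?"
    elif len(values) > 2:
        # Assumption: the "> 50%" rule is checked before the "> 40%" rule.
        over_50 = [v for v in values if weights[v] > 0.5]
        if over_50:
            return [over_50[0]], f"Do you want {over_50[0]}?"
        over_40 = [v for v in values if weights[v] > 0.4]
        if len(over_40) >= 2:
            vi, vj = over_40[:2]
            return [vi, vj], f"How about {vi} or {vj}?"
    # No condition satisfied (assumed fallback): ask an open question.
    return [], f"What {slot_name} do you want?"
```

With weights such as coke 60%, tea 20% and orange juice 5%, the sketch returns a confirmation question for coke; with three values at 30% each, it returns the open question.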

The natural language processor 102 also acquires text information indicating the user's answer to the response sentence. The natural language processor 102 determines whether the text information is a positive answer such as “Yes” or a negative answer such as “No”, and outputs answer information indicating whether the user's answer is positive or negative to the weight value update unit 122. The weight value update unit 122 updates the weight values according to whether the user's answer selects one of the plurality of second nodes. That is, when the answer information indicates a positive answer, the weight value update unit 122 recalculates and updates the probabilities associated with the selectable slot values. When the answer information indicates a negative answer, the conversation generation unit 121 generates a response sentence that asks the user to select one of the plurality of slot values.

音声合成部１０５は、会話管理部１０４によって生成された応答文を音声に変換する。音声合成部１０５によって変換された音声は、スピーカ（不図示）から出力される。 The voice synthesis unit 105 converts the response sentence generated by the conversation management unit 104 into voice. The voice converted by the voice synthesizer 105 is output from a speaker (not shown).

In the voice dialogue system shown in FIG. 2, a single device may include the speech recognition unit 101, the natural language processor 102, the memory 103, the conversation management unit 104, and the speech synthesis unit 105. Alternatively, these units may be distributed over a plurality of devices. For example, a terminal device may include the speech recognition unit 101 and the speech synthesis unit 105, and a server communicably connected to the terminal device via a network may include the natural language processor 102, the memory 103, and the conversation management unit 104.

続いて、本実施の形態における音声対話システムの音声対話処理について説明する。 Next, the voice dialogue process of the voice dialogue system in the present embodiment will be described.

図４は、本実施の形態における音声対話システムの音声対話処理について説明するためのフローチャートである。 FIG. 4 is a flowchart for explaining the voice dialogue processing of the voice dialogue system in the present embodiment.

まず、ステップＳ１において、自然言語プロセッサ１０２は、ユーザの発話内容を示す発話情報から、単語を取得する。 First, in step S1, the natural language processor 102 acquires a word from the utterance information indicating the user's utterance content.

Next, in step S2, the natural language processor 102 searches the semantic network stored in the semantic network storage unit 111 and, based on the relationship information associated between the nodes, extracts from the semantic network the nodes (slots and slot values) related to the task specified by the extracted word.

次に、ステップＳ３において、会話生成部１２１は、タスクを実行するために値を入力する必要があるスロットを決定する。 Next, in step S3, the conversation generation unit 121 determines a slot in which a value needs to be input in order to execute the task.

Next, in step S4, the conversation generation unit 121 acquires, from the weight value management table, the weight values of the plurality of slot values associated with the determined slot.

Next, in step S5, the conversation generation unit 121 refers to the determination condition table stored in the determination condition table storage unit 113 and determines whether there is a weight value that satisfies the determination condition. When it is determined that there is such a weight value (YES in step S5), in step S6 the conversation generation unit 121 determines the slot value to be the slot value corresponding to the weight value that satisfies the determination condition.

次に、ステップＳ７において、会話生成部１２１は、決定したスロット値を用いて確認応答文を生成する。確認応答文とは、決定したスロット値でよいかをユーザに確認する応答文である。 Next, in step S7, the conversation generator 121 generates a confirmation response sentence using the determined slot value. The confirmation response text is a response text for confirming to the user whether the determined slot value is acceptable.

一方、判断条件を満たす重み値がないと判断された場合（ステップＳ５でＮＯ）、ステップＳ８において、会話生成部１２１は、要求応答文を生成する。要求応答文とは、複数の選択可能なスロット値の中から所望のスロット値の選択をユーザに対して要求する応答文である。 On the other hand, when it is determined that there is no weight value satisfying the determination condition (NO in step S5), the conversation generation unit 121 generates a request response sentence in step S8. The request response text is a response text requesting the user to select a desired slot value from a plurality of selectable slot values.

続いて、本実施の形態における音声対話システムの重み値更新処理について説明する。 Subsequently, the weight value update processing of the voice interaction system in the present embodiment will be described.

図５は、本実施の形態における音声対話システムの重み値更新処理について説明するためのフローチャートである。 FIG. 5 is a flowchart for explaining the weight value update processing of the voice interaction system according to the present embodiment.

First, in step S11, the weight value update unit 122 checks the slot value included in the response sentence generated by the conversation generation unit 121.

Next, in step S12, the weight value update unit 122 acquires, from the natural language processor 102, answer information indicating whether the user's answer to the response sentence is positive.

Next, in step S13, the weight value update unit 122 determines whether the answer information indicates a positive answer. When it is determined that the answer information indicates a negative answer (NO in step S13), in step S14 the weight value update unit 122 acquires a new slot value. If there are two selectable slot values, the weight value update unit 122 acquires the slot value that was not presented to the user as the new slot value. If there are three or more selectable slot values, the weight value update unit 122 acquires the slot value selected by the user as the new slot value.

一方、回答情報が肯定的な回答であると判断された場合（ステップＳ１３でＹＥＳ）、ステップＳ１５において、重み値更新部１２２は、重み値を再計算する。 On the other hand, when it is determined that the answer information is a positive answer (YES in step S13), in step S15, the weight value update unit 122 recalculates the weight value.

Here, the method of calculating the weight values is described. Before the weight value update unit 122 calculates any weight value, the weight value management table stores initial weight values. Suppose that x slot values v_1, v_2, ..., v_x are selectable for a slot, that the number of users is n, and that the numbers of users who selected the respective slot values are N_1, N_2, ..., N_x. The weight values (probabilities) of the slot values are then N_1/n, N_2/n, ..., N_x/n. For the initial values, arbitrary numbers may be assigned to the number of users n and to the numbers of users N_1, N_2, ..., N_x who selected each slot value. For example, N_1, N_2, ..., N_x may be set based on past statistical data. Alternatively, the initial weight values may all be set to the same value; for example, when two slot values are selectable, the initial weight value (probability) of each slot value may be set to 50%.

When the weight value update unit 122 recalculates the weight values, it adds 1 to the number of users n, adds 1 to the number of users N_i who selected the chosen slot value v_i, and recalculates the weight values of all selectable slot values. For example, when the slot value v_2 is selected, the weight values (probabilities) of the slot values v_1, v_2, ..., v_x become N_1/(n+1), (N_2+1)/(n+1), ..., N_x/(n+1).

次に、ステップＳ１６において、重み値更新部１２２は、再計算した重み値を重み値管理テーブル記憶部１１２に記憶し、重み値管理テーブルの重み値を更新する。 Next, in step S16, the weight value update unit 122 stores the recalculated weight value in the weight value management table storage unit 112, and updates the weight value in the weight value management table.
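The recalculation described above can be sketched in Python as follows. The class name `SlotWeights` and its methods are hypothetical and stand in for the weight value management table; the sketch keeps n as a separate count because the text allows n and the per-value counts to be initialised with arbitrary numbers.

```python
from typing import Dict


class SlotWeights:
    """Hypothetical in-memory stand-in for one row group of the weight table."""

    def __init__(self, n: int, counts: Dict[str, int]) -> None:
        if n <= 0:
            raise ValueError("n must be positive")
        self.n = n                  # number of users
        self.counts = dict(counts)  # N_i: users who selected slot value v_i

    def weights(self) -> Dict[str, float]:
        # Weight (probability) of each slot value is N_i / n.
        return {value: count / self.n for value, count in self.counts.items()}

    def record_selection(self, value: str) -> Dict[str, float]:
        # Add 1 to n and to the count of the selected value, then
        # recompute the weights of all selectable slot values.
        if value not in self.counts:
            raise KeyError(value)
        self.n += 1
        self.counts[value] += 1
        return self.weights()


table = SlotWeights(n=4, counts={"v1": 2, "v2": 2})   # both start at 50%
updated = table.record_selection("v2")                # v1: 2/5, v2: 3/5
```

In this example the two initial weights are 50% each; after one user selects v2 the weights become 2/5 and 3/5.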

図６は、本実施の形態における音声対話システムの音声対話処理と、従来の音声対話システムの音声対話処理との差異を説明するための図である。図６は、ハンバーガー店においてユーザがハンバーガーセットを注文する際の音声対話処理の一例を示している。 FIG. 6 is a diagram for explaining the difference between the voice dialogue processing of the voice dialogue system in the present embodiment and the voice dialogue processing of the conventional voice dialogue system. FIG. 6 shows an example of a voice interaction process when a user orders a hamburger set at a hamburger store.

まず、ユーザは、音声対話システムに対し、“ハンバーガーセットを下さい。”と発話する。 First, the user utters “Please give a hamburger set” to the voice interaction system.

A conventional voice dialogue system extracts the word “hamburger set” from the user's utterance and specifies the task frame corresponding to the extracted word. In this case, the task frame for providing a hamburger set is specified. Next, the conventional voice dialogue system specifies a slot required to execute the specified task frame and asks the user which of the plurality of slot values corresponding to that slot to use. In the example shown in FIG. 6, the slot is the drink, and the slot values are coke, tea, orange juice, and so on. The conventional voice dialogue system creates the response sentence 405 “What would you like to drink?” and outputs it as voice. The user answers “Coke, please.” To confirm what the user said, the conventional voice dialogue system then creates the response sentence “Is coke all right for your drink?” and outputs it as voice. The user answers “Yes.” On obtaining the affirmative answer from the user, the conventional voice dialogue system sets the slot value of the task frame and executes the task frame. At this point, if values have been entered for all of the required slots in the task frame, the task corresponding to the task frame is executed. If values have not been entered for all of the required slots, the system asks the user a question or the like to prompt entry of the missing values.

In contrast, the voice dialogue system in the present embodiment extracts the word “hamburger set” from the user's utterance and extracts from the semantic network the nodes (domain, slot, and slot values) related to the task specified by the extracted word. In the example shown in FIG. 6, the domain 400 is “hamburger set”, the slot 401 is “drink”, and the slot values 402, 403, and 404 are “coke”, “tea”, “orange juice”, and so on.

次に、本実施の形態における音声対話システムは、タスクを実行するために値を入力する必要があるスロットを決定する。ここで、決定されるスロットは、ドリンクである。次に、本実施の形態における音声対話システムは、決定したスロットに対応付けられている複数のスロット値の重み値を、重み値管理テーブルから取得する。図６に示す例では、スロット値であるコークの重み値は６０％であり、スロット値であるお茶の重み値は２０％であり、スロット値であるオレンジジュースの重み値は５％である。 Next, the voice interaction system in the present embodiment determines a slot in which a value needs to be input in order to execute the task. Here, the determined slot is a drink. Next, the voice interaction system according to the present embodiment acquires weight values of a plurality of slot values associated with the determined slot from the weight value management table. In the example shown in FIG. 6, the weight value of coke as a slot value is 60%, the weight value of tea as a slot value is 20%, and the weight value of orange juice as a slot value is 5%.

Next, the voice dialogue system in the present embodiment determines whether there is a weight value that satisfies the determination condition. In this case, since the weight value of coke is 60%, the system determines that there is a weight value that satisfies the determination condition, and determines the slot value to be “coke”. The system then creates the response sentence 406 “Would you like coke?” and outputs it as voice. The user answers “Yes.” On obtaining the affirmative answer from the user, the voice dialogue system in the present embodiment executes the task of generating response sentences to the user's utterances for providing the hamburger set.

上記のように、従来のシステムでは、システムがユーザに対し、ドリンクを何にするかを質問し、ユーザの回答を音声認識により判断していた。ユーザがコークを選択した場合、従来のシステムは、コークでよいか否かを再度質問し、ユーザの回答を音声認識により判断し、ドリンクを決定していた。 As described above, in the conventional system, the system asks the user what to drink, and the user's answer is determined by voice recognition. When the user selects coke, the conventional system asks again whether coke is acceptable, determines the user's answer by voice recognition, and determines a drink.

In contrast, the system of the present disclosure does not ask the user which drink to choose. Instead, it assigns a weight value to each drink according to which drinks users selected in the past and, based on the weight values, asks whether the user wants coke. For example, when coke was selected with a probability of 60% in the past, the system of the present disclosure asks the user to confirm that coke is acceptable as the drink. The system then determines the user's answer by speech recognition and, if the user answers affirmatively, sets the drink to coke.

この場合、本開示のシステムは、従来のシステムに比べてドリンクを再度確認する必要がなく、システムとユーザとの対話時間を短縮することができるとともに、システムの処理時間を短縮することができる。 In this case, the system according to the present disclosure does not need to confirm the drink again as compared with the conventional system, can reduce the interaction time between the system and the user, and can reduce the processing time of the system.

続いて、本実施の形態における音声対話システムの変形例について説明する。 Next, a modified example of the voice interaction system in the present embodiment will be described.

図７は、本実施の形態の変形例における音声対話システムの意味ネットワークの一例を示す図である。図７に示す意味ネットワークは、レストランを検索する際に用いられる意味ネットワークの一例を示している。 FIG. 7 is a diagram illustrating an example of a semantic network of the voice interaction system according to the modification of the present embodiment. The semantic network shown in FIG. 7 shows an example of a semantic network used when searching for restaurants.

図７において、“レストラン”を示すノード２１は、“地域（ａｒｅａ）”を示すノード２２と、“種類（ｔｙｐｅ）”を示すノード２３とにリンクしている。“地域”を示すノード２２及び“種類（ｔｙｐｅ）”を示すノード２３は、必須のスロットである。“地域”を示すノード２２は、“北”を示すノード２４と“南”を示すノード２５とにリンクしている。“北”を示すノード２４及び“南”を示すノード２５は、“地域”を示すノード（スロット）２２のスロット値である。また、“種類”を示すノード２３は、“インド料理”を示すノード２６と“中華料理”を示すノード２７と“アメリカ料理”を示すノード２８とにリンクしている。“インド料理”を示すノード２６、“中華料理”を示すノード２７及び“アメリカ料理”を示すノード２８は、“種類”を示すノード（スロット）２３のスロット値である。 In FIG. 7, a node 21 indicating “restaurant” is linked to a node 22 indicating “area” and a node 23 indicating “type”. The node 22 indicating “region” and the node 23 indicating “type” are indispensable slots. The node 22 indicating “region” is linked to a node 24 indicating “north” and a node 25 indicating “south”. A node 24 indicating “north” and a node 25 indicating “south” are slot values of the node (slot) 22 indicating “region”. The node 23 indicating “kind” is linked to a node 26 indicating “Indian cuisine”, a node 27 indicating “Chinese cuisine”, and a node 28 indicating “American cuisine”. A node 26 indicating “Indian cuisine”, a node 27 indicating “Chinese cuisine”, and a node 28 indicating “American cuisine” are slot values of a node (slot) 23 indicating “type”.

Furthermore, in the modification shown in FIG. 7, slot values of different slots are linked to each other. The node (slot value) 24 indicating “north” is linked to the node (slot value) 26 indicating “Indian cuisine”, the node (slot value) 27 indicating “Chinese cuisine”, and the node (slot value) 28 indicating “American cuisine”. The connection between the node 24 indicating “north” and the node 26 indicating “Indian cuisine” is given a weight value of, for example, 30%. The connection between the node 24 indicating “north” and the node 27 indicating “Chinese cuisine” is given a weight value of, for example, 60%. The connection between the node 24 indicating “north” and the node 28 indicating “American cuisine” is given a weight value of, for example, 10%. In other words, in the past, the probability that Indian cuisine was selected after the northern area was selected is 30%, the probability that Chinese cuisine was selected after the northern area was selected is 60%, and the probability that American cuisine was selected after the northern area was selected is 10%.

In the system of the present disclosure, when the user says “I'm looking for a restaurant at the north part of town.”, the system asks the user to confirm whether a Chinese restaurant is acceptable, determines the user's answer by speech recognition, and decides on the restaurant.

重み値管理テーブル記憶部１１２は、複数の第１のノードのうちの１の第１のノードに関連付けられている複数の第２のノードのうちの１の第２のノードと、前記複数の第１のノードのうちの他の第１のノードに関連付けられている複数の第２のノードのそれぞれとの組合せに対して重み値を対応付けて記憶している。ここで、第１のノードは、スロットであり、第２のノードは、スロット値である。 The weight value management table storage unit 112 includes one second node of the plurality of second nodes associated with the first node of the plurality of first nodes, and the plurality of first nodes. A weight value is stored in association with each combination with a plurality of second nodes associated with other first nodes of one node. Here, the first node is a slot, and the second node is a slot value.

自然言語プロセッサ１０２は、１の第２のノードが特定されたか否かを判断する。会話生成部１２１は、１の第２のノードが特定された場合、１の第２のノードと、他の第１のノードに関連付けられている複数の第２のノードのそれぞれとの組合せに対して対応付けられた重み値に基づいて、他の第１のノードに関連付けられている複数の第２のノードの中から１の第２のノードを選択する。 The natural language processor 102 determines whether one second node has been identified. When one second node is specified, the conversation generation unit 121 determines the combination of one second node and each of a plurality of second nodes associated with other first nodes. Based on the weight values associated with each other, one second node is selected from the plurality of second nodes associated with the other first nodes.
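The selection based on weights attached to combinations of slot values can be sketched in Python as follows. The dictionary `PAIR_WEIGHTS`, the function names, and the 50% threshold are assumptions made for illustration; the weights are the example values given for FIG. 7, and the wording of the fallback question is invented.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical table: (already specified slot value, candidate slot value) -> weight.
PAIR_WEIGHTS: Dict[Tuple[str, str], float] = {
    ("north", "Indian"): 0.30,
    ("north", "Chinese"): 0.60,
    ("north", "American"): 0.10,
}


def select_linked_value(
    known_value: str,
    candidates: List[str],
    pair_weights: Dict[Tuple[str, str], float] = PAIR_WEIGHTS,
    threshold: float = 0.5,  # assumed determination condition
) -> Optional[str]:
    """Pick the candidate whose combination weight exceeds the threshold."""
    for candidate in candidates:
        if pair_weights.get((known_value, candidate), 0.0) > threshold:
            return candidate
    return None


def respond(known_value: str, candidates: List[str]) -> str:
    """Confirmation sentence if a candidate qualifies, otherwise a request."""
    choice = select_linked_value(known_value, candidates)
    if choice is not None:
        return f"How about a {choice} restaurant?"
    return "Which would you like: " + ", ".join(candidates) + "?"
```

For the specified slot value "north", the sketch selects "Chinese" because its combination weight of 60% is the only one above the assumed threshold.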

続いて、本実施の形態の変形例における音声対話システムの音声対話処理について説明する。 Next, the voice dialogue process of the voice dialogue system in the modification of the present embodiment will be described.

図８は、本実施の形態の変形例における音声対話システムの音声対話処理について説明するためのフローチャートである。 FIG. 8 is a flowchart for explaining the voice dialogue processing of the voice dialogue system in the modification of the present embodiment.

The processing in steps S21 and S22 is the same as the processing in steps S1 and S2 shown in FIG. 4, and its description is therefore omitted.

次に、ステップＳ２３において、自然言語プロセッサ１０２は、意味ネットワーク内の任意のスロットが特定されたか否かを判断する。例えば、図７に示す例では、ユーザの発話内容から“地域”を示すスロットが特定されることになる。ここで、任意のスロットが特定されたと判断された場合（ステップＳ２３でＹＥＳ）、ステップＳ２４において、自然言語プロセッサ１０２は、特定されたスロットを選択する。 Next, in step S23, the natural language processor 102 determines whether or not any slot in the semantic network has been specified. For example, in the example shown in FIG. 7, a slot indicating “region” is specified from the content of the user's utterance. If it is determined that an arbitrary slot has been specified (YES in step S23), the natural language processor 102 selects the specified slot in step S24.

次に、ステップＳ２５において、自然言語プロセッサ１０２は、特定されたスロットにリンクしている次のスロットを選択する。例えば、図７に示す例では、“地域”を示すスロットにリンクしている“種類”を示すスロットが選択されることになる。 Next, in step S25, the natural language processor 102 selects the next slot linked to the identified slot. For example, in the example shown in FIG. 7, a slot indicating “type” linked to a slot indicating “region” is selected.

Next, in step S26, the conversation generation unit 121 acquires, from the weight value management table, the weight values associated with the combinations of the slot value of the specified slot and each of the plurality of slot values of the selected next slot. The weight value management table stores a weight value for each combination of one slot value, among the plurality of slot values associated with one of the plurality of slots, and each of the plurality of slot values associated with another of the plurality of slots.

Next, in step S27, the conversation generation unit 121 refers to the determination condition table stored in the determination condition table storage unit 113 and determines whether there is a weight value that satisfies the determination condition. When it is determined that there is such a weight value (YES in step S27), in step S28 the conversation generation unit 121 determines the slot value to be the slot value corresponding to the weight value that satisfies the determination condition. In the example shown in FIG. 7, the slot value indicating “Chinese cuisine” is determined.

次に、ステップＳ２９において、会話生成部１２１は、決定したスロット値を用いて確認応答文を生成する。確認応答文とは、決定したスロット値でよいかをユーザに確認する応答文である。例えば、図７に示す例では、“中華料理店はどうですか？（Ｈｏｗａｂｏｕｔａｃｈｉｎｅｓｅｒｅｓｔａｕｒａｎｔ？）という確認応答文が生成されることになる。 Next, in step S29, the conversation generator 121 generates a confirmation response sentence using the determined slot value. The confirmation response text is a response text for confirming to the user whether the determined slot value is acceptable. For example, in the example shown in FIG. 7, a confirmation response sentence “How about a Chinese restaurant?” Is generated.

On the other hand, when it is determined that no slot has been specified (NO in step S23), or when it is determined that there is no weight value that satisfies the determination condition (NO in step S27), in step S30 the conversation generation unit 121 generates a request response sentence. A request response sentence is a response sentence that asks the user to select a desired slot value from among a plurality of selectable slot values. For example, in the example shown in FIG. 7, when the user says “I'm looking for a restaurant.”, the voice dialogue system needs to determine the slots indicating “area” and “type”. The conversation generation unit 121 therefore generates a request response sentence for selecting the slot value of either the “area” slot or the “type” slot. For example, the conversation generation unit 121 generates the request response sentence “North or south?” or the request response sentence “Would you like Indian, Chinese, or American cuisine?”.

Note that when it is determined in step S23 that no slot has been specified, the voice dialogue processing may instead be terminated.

In the present embodiment, the weight value represents the probability that each selectable slot value was selected by users in the past. However, the present disclosure is not limited to this, and a value other than a probability may be assigned to each selectable slot value. For example, when a slot value is selected by the user, the weight value update unit 122 may add 1 to the weight value of the selected slot value.

また、音声対話システムは、スロット値に対して任意の重み値を設定してもよい。例えば、販売店が特に販売したい商品のスロット値の重み値を他の商品のスロット値の重み値より高くすることにより、特に販売したい商品をユーザに勧めることができる。 The voice interactive system may set an arbitrary weight value for the slot value. For example, by setting the weight value of the slot value of a product that the dealer wants to sell particularly higher than the weight value of the slot value of other products, it is possible to recommend the product that the seller wants to sell to the user.

また、音声対話システムは、時期（季節）によって、任意の重み値を設定してもよい。参照される頻度が時期によって大きく変わるスロットについては、それまで更新処理によって更新されてきた重み値を、その時期が訪れる際に、時期の影響を考慮して任意の値に設定してもよい。また、時期毎に対応する重み値の情報を予め用意し、その時期が訪れた際に、全てのスロットの重み値を任意の値へ変更してもよい。このとき、変更した値は、その時期が過ぎるまで固定とせず、設定した後には重み値の更新処理を適用してもよい。 In addition, the voice interactive system may set an arbitrary weight value depending on time (season). For a slot whose frequency of reference varies greatly depending on the time, the weight value that has been updated by the update process up to that time may be set to an arbitrary value in consideration of the influence of the time when that time comes. Also, weight value information corresponding to each period may be prepared in advance, and the weight values of all slots may be changed to arbitrary values when the period comes. At this time, the changed value is not fixed until the time has passed, and the weight value update processing may be applied after setting.

In the present embodiment, text information converted from the user's voice is used. However, the present disclosure is not limited to this, and text information entered directly through an input device such as a keyboard or a touch panel may be used.

The voice dialogue system of the present embodiment may also include a speaker identification unit that identifies the user who has spoken. In this case, the weight value management table stores slot values and weight values in association with each other for each identified user. This makes it possible to generate response sentences tailored to individual users and to further shorten the dialogue time between the system and the user.
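A per-user weight value management table of this kind could be organized as in the sketch below. The class name `PerUserWeightTable`, its methods, and the user identifiers are hypothetical and are used only to illustrate keying the table by identified user.

```python
from collections import defaultdict


class PerUserWeightTable:
    """Selection counts keyed by user, then slot, then slot value (assumed layout)."""

    def __init__(self) -> None:
        self._table = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

    def record(self, user_id: str, slot: str, value: str) -> None:
        """Record that the identified user selected `value` for `slot`."""
        self._table[user_id][slot][value] += 1

    def probabilities(self, user_id: str, slot: str) -> dict:
        """Return this user's past-selection probabilities for one slot."""
        counts = self._table[user_id][slot]
        total = sum(counts.values())
        if total == 0:
            return {}
        return {value: count / total for value, count in counts.items()}


table = PerUserWeightTable()
table.record("user_a", "meal drink", "coke")
table.record("user_a", "meal drink", "coke")
table.record("user_a", "meal drink", "orange juice")
table.record("user_b", "meal drink", "orange juice")
```

With this layout, each user's history affects only the questions generated for that user.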

In the voice dialogue system of the present embodiment, a period or a number of times for updating the weight values may be set. In this case, the voice dialogue system may stop updating the weight values when a predetermined period has elapsed since updating started. The voice dialogue system may also stop updating the weight values when the number of updates reaches a predetermined number. As the number of updates increases, a weight value may converge to a certain value. Setting a period or number of times for updating the weight values can therefore reduce the processing load on the voice dialogue system.
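The two stopping conditions can be illustrated with the following sketch. The class `BoundedUpdater`, its field names, and the limits used in the example (3 updates, 30 days) are assumptions chosen for illustration, not values taken from the disclosure.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class BoundedUpdater:
    """Updates weights only until a set count or a set period is reached."""

    started_at: datetime
    max_updates: int
    max_period: timedelta
    update_count: int = 0
    counts: dict = field(default_factory=dict)

    def update(self, slot_value: str, now: datetime) -> bool:
        """Apply one update; return False once updating has stopped."""
        if self.update_count >= self.max_updates:
            return False  # predetermined number of updates reached
        if now - self.started_at >= self.max_period:
            return False  # predetermined period has elapsed
        self.counts[slot_value] = self.counts.get(slot_value, 0) + 1
        self.update_count += 1
        return True


start = datetime(2015, 12, 28)
updater = BoundedUpdater(started_at=start, max_updates=3, max_period=timedelta(days=30))
results = [updater.update("medium", start + timedelta(days=d)) for d in (1, 2, 3, 4)]
```

Here the fourth call is rejected because the update count limit has been reached; a call made after 30 days would be rejected by the period check even if the count limit had not been reached.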

Because some products are sold only in a predetermined season or for a predetermined period, the voice dialogue system of the present embodiment may set slot values that are selectable only in a predetermined season or for a predetermined period, and may update their weight values only in that season or period.

A simulation experiment was conducted to quantitatively confirm the effect of the present invention. FIG. 9A and FIG. 9B show examples of dialogues carried out under two different conditions. Both show an example of a dialogue between a store clerk and a customer in the same situation at a hamburger shop.

In the dialogue example shown in FIG. 9A, the clerk-side questions are generated by the questioning method (condition) used by a conventional voice dialogue system, and the customer responds to them. In the dialogue example shown in FIG. 9B, the clerk-side questions are generated by the questioning method (condition) used by the voice dialogue system of the present disclosure, and the customer responds to them. The two questioning methods (conditions) are compared in the description below.

In FIG. 9A and FIG. 9B, "Would you like side salad or French fries?" (conversation sentence 1001) corresponds to "Would you like French fries?" (conversation sentence 2001). When the content of conversation sentence 2001 is output as a question from the voice dialogue system, it has been determined that the probability of "french fries" being ordered as the "meal side" is equal to or greater than a threshold. The customer's replies are "French fries" (conversation sentence 1002) and "Yes" (conversation sentence 2002), respectively. Both the question and the answer are shorter in the example of FIG. 9B, which is more efficient. Similarly, conversation sentence 2005, which corresponds to "Would you like large, small or medium?" (conversation sentence 1005), changes the question to the more efficient "Would you like medium?" based on the statistic that "medium" has a high order probability. The customer's answer is "medium" (conversation sentence 1006) in the conventional example but "Yes" (conversation sentence 2006) in the example of the voice dialogue system of the present disclosure, so the response is shorter.
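The choice between a proposal-type question and a question that lists the options can be sketched as follows. The function name `generate_question`, the sentence templates, and the threshold of 0.6 are assumptions for illustration; the disclosure states only that the probability is compared with a threshold.

```python
def generate_question(probabilities: dict, threshold: float = 0.6) -> str:
    """Build a question for one slot from past-selection probabilities.

    `probabilities` must be non-empty. The threshold value is hypothetical.
    """
    best_value, best_probability = max(probabilities.items(), key=lambda kv: kv[1])
    options = list(probabilities)
    if best_probability >= threshold or len(options) == 1:
        # One option is likely enough: ask a Yes/No proposal-type question.
        return f"Would you like {best_value}?"
    # No option stands out: let the user choose among all options.
    listed = ", ".join(options[:-1]) + " or " + options[-1]
    return f"Would you like {listed}?"


proposal = generate_question({"side salad": 0.2, "French fries": 0.8})
listing = generate_question({"large": 0.3, "small": 0.3, "medium": 0.4})
```

With these hypothetical probabilities, the first call yields the proposal-type question of conversation sentence 2001, and the second yields a listing question of the same form as conversation sentence 1005.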

On the other hand, in response to "What kind of meal drink would you like?" (conversation sentence 1007) in the conventional example, the example of the voice dialogue system according to the present disclosure outputs "Would you like coke?" (conversation sentence 2007) as a proposal-type question, which is shorter than the conventional question. However, the customer's response is "Hi-orange lavaburst" (conversation sentence 1008) in the conventional example, whereas in the example of the present disclosure it is "No. Hi-orange lavaburst" (conversation sentence 2008), so this part of the conversation is longer in the example of the present disclosure. When none of the multiple options is expected to have a particularly high probability, the questions and answers do not differ significantly, as with conversation sentences 1003 and 2003 and conversation sentences 1004 and 2004. The orders accepted through the conversation examples in FIG. 9A and FIG. 9B are accepted order content 1010 and order content 2010, respectively, and are exactly the same.

Counting the total number of characters in the two dialogues gives 330 characters (1009) and 273 characters (2009), respectively. The conversation in which the clerk-side questions are generated by the questioning method (condition) used by the voice dialogue system of the present disclosure has fewer characters, that is, the conversation is shorter. The two are compared here by character count, but the time required for the conversation can be estimated by multiplying the character count by a predetermined coefficient. As described above, when the customer answers a proposal-type question from the store negatively, that part of the conversation takes longer. However, a Yes/No-type question is asked only when the customer's answer to the proposal-type question can be expected to be affirmative with high probability, so cases in which outputting a proposal-type question makes the conversation longer than the conventional question are rare (low probability).
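The comparison for this single example can be worked through as in the sketch below. The character counts 330 and 273 come from the text above; the coefficient of 0.07 seconds per character is a purely hypothetical value, since the disclosure does not specify one.

```python
def reduction_ratio(baseline_chars: int, proposed_chars: int) -> float:
    """Fraction by which the proposed dialogue is shorter than the baseline."""
    return (baseline_chars - proposed_chars) / baseline_chars


def estimate_seconds(char_count: int, seconds_per_char: float) -> float:
    """Estimate conversation time as character count times a fixed coefficient."""
    return char_count * seconds_per_char


SECONDS_PER_CHAR = 0.07  # hypothetical coefficient, not from the disclosure

ratio = reduction_ratio(330, 273)  # 57 / 330, roughly 17.3 %
conventional_time = estimate_seconds(330, SECONDS_PER_CHAR)
proposed_time = estimate_seconds(273, SECONDS_PER_CHAR)
```

Because the time estimate is a linear function of the character count, the relative reduction in estimated time equals the relative reduction in characters regardless of the coefficient chosen.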

In the same manner as this example, 100 orders were simulated. The total number of characters was 28746 with the conventional method and 26168 with the method according to the present disclosure. With the method according to the present disclosure, the total dialogue length was reduced by 11.4% compared with the conventional example, confirming that the dialogue method in the voice dialogue system of the present disclosure is highly effective.

The dialogue method, dialogue program, and dialogue system according to the present disclosure can shorten the dialogue time between the dialogue system and the user and can shorten the processing time of the dialogue system. They are useful as a dialogue method in a dialogue system that responds to a user's utterance, a dialogue program that responds to a user's utterance, and a dialogue system that responds to a user's utterance.

DESCRIPTION OF SYMBOLS
101 Speech recognition unit
102 Natural language processor
103 Memory
104 Conversation management unit
105 Speech synthesis unit
111 Semantic network storage unit
112 Weight value management table storage unit
113 Judgment condition table storage unit
121 Conversation generation unit
122 Weight value update unit
131 Syntax analysis unit
132 Memory access unit

Claims

A dialogue method used in a dialogue system that responds to a user's utterance,
A plurality of nodes necessary for executing a task for generating a response sentence in response to the user's utterance are stored in association with each other,
Obtaining utterance information indicating the utterance content of the user;
Identifying a first node corresponding to the utterance information from the plurality of nodes;
Selecting one second node from among a plurality of second nodes associated with the identified first node, based on a weight value associated with each of the plurality of second nodes;
Generating a response sentence corresponding to the selected one second node;
The weight value represents the probability that each of the plurality of second nodes was selected by the user in the past,
A second node having the probability larger than a predetermined value is selected from the plurality of second nodes;
Dialogue method.

A dialogue method used in a dialogue system that responds to a user's utterance,
A plurality of nodes necessary for executing a task for generating a response sentence in response to the user's utterance are stored in association with each other,
Obtaining utterance information indicating the utterance content of the user;
Identifying a first node corresponding to the utterance information from the plurality of nodes;
Selecting one second node from among a plurality of second nodes associated with the identified first node, based on a weight value associated with each of the plurality of second nodes;
Generating a response sentence corresponding to the selected one second node;
The weight value represents the probability that each of the plurality of second nodes was selected by the user in the past,
When none of the plurality of second nodes has the probability larger than a predetermined value, a response sentence for causing the user to select one of the plurality of second nodes is generated;
Dialogue method.

A dialogue method used in a dialogue system that responds to a user's utterance,
A plurality of nodes necessary for executing a task for generating a response sentence in response to the user's utterance are stored in association with each other,
Obtaining utterance information indicating the utterance content of the user;
Identifying a first node corresponding to the utterance information from the plurality of nodes;
Selecting one second node from among a plurality of second nodes associated with the identified first node, based on a weight value associated with each of the plurality of second nodes;
Generating a response sentence corresponding to the selected one second node;
Obtaining information indicating the user's answer to the response sentence;
Updating the weight value according to whether or not the user's answer is an answer for selecting one second node of the plurality of second nodes;
Dialogue method.

A dialogue method used in a dialogue system that responds to a user's utterance,
A plurality of nodes necessary for executing a task for generating a response sentence in response to the user's utterance are stored in association with each other,
Obtaining utterance information indicating the utterance content of the user;
Identifying a first node corresponding to the utterance information from the plurality of nodes;
Selecting one second node from among a plurality of second nodes associated with the identified first node, based on a weight value associated with each of the plurality of second nodes;
Generating a response sentence corresponding to the selected one second node;
The weight value is associated with a combination of one second node among the plurality of second nodes associated with one first node of a plurality of first nodes and each of a plurality of second nodes associated with another first node of the plurality of first nodes;
Determining whether the one second node has been identified;
When the one second node has been identified, selecting one second node from among the plurality of second nodes associated with the other first node, based on the weight values associated with the combinations of the identified one second node and each of the plurality of second nodes associated with the other first node;
Dialogue method.

A dialogue program that responds to a user's utterance, causing a computer to function as:
A storage unit for storing a plurality of nodes necessary for executing a task of generating a response sentence in response to the user's utterance;
An acquisition unit for acquiring utterance information indicating the utterance content of the user;
A specifying unit for specifying a first node corresponding to the utterance information from the plurality of nodes;
A selection unit for selecting one second node from among a plurality of second nodes associated with the first node specified by the specifying unit, based on a weight value associated with each of the plurality of second nodes; and
A generation unit that generates a response sentence corresponding to the one second node selected by the selection unit,
The weight value represents the probability that each of the plurality of second nodes was selected by the user in the past,
The selection unit selects a second node having the probability larger than a predetermined value among the plurality of second nodes.
Dialogue program.

A dialogue system that responds to a user's utterance, comprising:
A storage unit for storing a plurality of nodes necessary for executing a task of generating a response sentence in response to the user's utterance;
An acquisition unit for acquiring utterance information indicating the utterance content of the user;
A specifying unit for specifying a first node corresponding to the utterance information from the plurality of nodes;
A selection unit for selecting one second node from among a plurality of second nodes associated with the first node specified by the specifying unit, based on a weight value associated with each of the plurality of second nodes; and
A generation unit that generates a response sentence corresponding to the one second node selected by the selection unit,
The weight value represents the probability that each of the plurality of second nodes was selected by the user in the past,
The selection unit selects a second node having the probability larger than a predetermined value among the plurality of second nodes.
Dialogue system.