JP2018063272A

JP2018063272A - Voice dialogue apparatus, voice dialogue system, and control method of voice dialogue apparatus

Info

Publication number: JP2018063272A
Application number: JP2015039573A
Authority: JP
Inventors: Takahiro Kamai (釜井 孝浩); Akira Usami (宇佐見 陽); Masahiro Nakanishi (中西 雅浩)
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2015-02-27
Filing date: 2015-02-27
Publication date: 2018-04-19
Also published as: WO2016136208A1

Abstract

PROBLEM TO BE SOLVED: To provide a voice dialogue apparatus capable of correcting the content of a dialogue with a user by a simple method.
SOLUTION: A voice dialogue apparatus 20A that carries out a spoken dialogue with a user includes: a plurality of holding parts 103 for holding dialogue information representing the content of the dialogue, each holding part being associated with an attribute of a term and holding a term having that attribute; a storage part 104 that stores a history of the terms held by the plurality of holding parts; an acquisition part 101 that acquires utterance data and causes an uttered term included in the data to be held by the holding part associated with the attribute of the uttered term; and a change part 102 that, when the utterance data acquired by the acquisition part includes a control term for controlling the dialogue information, refers to the history stored in the storage part and changes the term held by each of the plurality of holding parts 103 to the term held by that holding part at a past point specified by the control term.
SELECTED DRAWING: Figure 8

Description

The present disclosure relates to a voice dialogue apparatus, a voice dialogue system, and a control method for a voice dialogue apparatus.

Patent Document 1 discloses a dialogue sequence recognition device that, based on information input by the user, visibly presents to the user the group of vocabulary expected to be input next. This prevents the inconvenience of the user being at a loss because of misrecognition of the dialogue.

Japanese Unexamined Patent Application Publication No. 2001-34292 (JP 2001-34292 A)

The present disclosure provides a voice dialogue apparatus that corrects the content of a dialogue with a user by a simple method.

The voice dialogue apparatus according to the present disclosure is a voice dialogue apparatus that conducts a spoken dialogue with a user, and includes: a plurality of holding units for holding dialogue information indicating the content of the dialogue, each of the holding units being associated with an attribute of a term and each holding a term having the attribute associated with that holding unit; a storage unit that stores a history of the terms held by the plurality of holding units; an acquisition unit that acquires utterance data indicating the content of the user's spoken utterance and causes an utterance term included in the acquired utterance data to be held by the holding unit, among the plurality of holding units, that is associated with the attribute of the utterance term; and a changing unit that, when the utterance data acquired by the acquisition unit includes a control term for controlling the dialogue information, refers to the history stored in the storage unit and changes the term held by each of the plurality of holding units to the term that the holding unit held at the past time point specified by the control term.

The voice dialogue apparatus according to the present disclosure can correct the content of a dialogue with a user by a simple method.

FIG. 1 is a block diagram showing the configuration of the voice dialogue apparatus and the voice dialogue system according to the embodiment.
FIG. 2 is an explanatory diagram of presentation by the voice dialogue system according to the embodiment.
FIG. 3 is a first explanatory diagram of a dialogue sequence and history information according to the embodiment.
FIG. 4 is a flowchart of the main processing by the voice dialogue apparatus according to the embodiment.
FIG. 5 is a flowchart of the restoration processing by the voice dialogue apparatus according to the embodiment.
FIG. 6 is a flowchart of the restoration point setting processing by the voice dialogue apparatus according to the embodiment.
FIG. 7 is a second explanatory diagram of a dialogue sequence and history information according to the embodiment.
FIG. 8 is a block diagram showing the configuration of a voice dialogue apparatus according to a modification of the embodiment.
FIG. 9 is a flowchart showing a control method of the voice dialogue apparatus according to the modification of the embodiment.

Hereinafter, embodiments will be described in detail with reference to the drawings as appropriate. However, unnecessarily detailed description may be omitted. For example, detailed descriptions of well-known matters and redundant descriptions of substantially identical configurations may be omitted. This is to keep the following description from becoming unnecessarily redundant and to facilitate understanding by those skilled in the art.

Note that the inventor(s) provide the accompanying drawings and the following description so that those skilled in the art can fully understand the present disclosure, and do not intend them to limit the subject matter described in the claims.

(Embodiment)
In the present embodiment, a voice dialogue apparatus that corrects the content of a dialogue with a user by a simple method will be described. The voice dialogue apparatus according to the present embodiment conducts a spoken dialogue with the user, generates and corrects dialogue information indicating the content of that dialogue, and outputs the dialogue information to an external processing device. The voice dialogue apparatus also acquires a processing result from the external processing device, presents it to the user, and then continues the dialogue with the user. In this way, the voice dialogue apparatus presents processing results to the user one after another while generating and correcting the dialogue information based on the dialogue with the user.

Note that the voice dialogue apparatus is useful when operations such as key input or touching a panel are impossible or difficult for the user. One possible application is a car navigation device that searches for information while sequentially receiving spoken instructions from the user while the user is driving. The approach is also useful for a voice dialogue apparatus that has no user interface such as keys or a panel.

[1-1. Configuration]
FIG. 1 is a block diagram showing the configuration of the voice dialogue apparatus 20 and the voice dialogue system 1 according to the present embodiment.

As shown in FIG. 1, the voice dialogue system 1 includes a display device 10, a speaker 11, a voice synthesis unit 12, a microphone 13, a voice recognition unit 14, a voice dialogue apparatus 20, and a task processing unit 40.

The display device 10 is a display device having a display screen. The display device 10 displays video on the display screen based on display data acquired from the voice dialogue apparatus 20. The display device 10 is realized by, for example, a car navigation device, a smartphone (a high-function mobile phone terminal), a mobile phone terminal, or a personal computer (PC). Although the display device 10 is shown as an example of a device that displays video based on the information presented by the voice dialogue apparatus 20, a speaker that outputs the information presented by the voice dialogue apparatus 20 as audio may be used instead of the display device 10. This speaker may be shared with the speaker 11 described later.

The speaker 11 is a speaker that outputs audio. The speaker 11 outputs audio based on the audio signal acquired from the voice synthesis unit 12. The audio output by the speaker 11 is heard by the user.

The voice synthesis unit 12 is a processing unit that converts a response sentence into an audio signal. The voice synthesis unit 12 acquires from the voice dialogue apparatus 20 a response sentence, which is information conveyed from the voice dialogue apparatus 20 to the user, and generates, based on the acquired response sentence, an audio signal to be output by the speaker.

Note that the speaker 11 and the voice synthesis unit 12 may be provided inside the voice dialogue apparatus 20 as one of its functions, or may be provided outside the voice dialogue apparatus 20. The voice synthesis unit 12 may also be realized as a so-called cloud server that can communicate with the voice dialogue apparatus 20 via the Internet. In that case, the connection between the voice synthesis unit 12 and the voice dialogue apparatus 20 and the connection between the voice synthesis unit 12 and the speaker 11 are made through a communication path via the Internet.

The microphone 13 is a microphone that picks up sound. The microphone 13 picks up the user's voice and outputs an audio signal based on it.

The voice recognition unit 14 is a processing unit that generates utterance data by performing speech recognition on the user's voice. The voice recognition unit 14 acquires the audio signal generated by the microphone 13 and applies speech recognition processing to it, thereby generating utterance data for the user's utterance. The utterance data is information conveyed from the user to the voice dialogue apparatus 20 and is expressed as characters (text), such as "I want to eat Chinese". Because speech recognition processing converts an audio signal into text information, it can also be called text conversion processing.

Note that, like the voice synthesis unit 12 and the speaker 11, the microphone 13 and the voice recognition unit 14 may be provided inside the voice dialogue apparatus 20 as one of its functions, or may be provided outside the voice dialogue apparatus 20. Like the voice synthesis unit 12, the voice recognition unit 14 may also be realized as a cloud server.

The task processing unit 40 is a processing unit that performs processing based on the content of the dialogue between the user and the voice dialogue apparatus 20 and outputs information indicating the processing result or information related to it. The processing by the task processing unit 40 may be any information processing based on the content of the dialogue. For example, the task processing unit 40 may execute search processing that searches Web pages on the Internet for restaurant Web pages matching the content of the dialogue and output the search result; this case is described below. A unit of execution of the processing by the task processing unit 40 is also referred to as a task. The task processing unit 40 corresponds to the processing unit.

As another example of its processing, the task processing unit 40 may execute processing that accumulates the content of the dialogue as data and output information indicating whether that processing succeeded. The task processing unit 40 may also identify, based on the content of the dialogue, the electric device to be controlled from among a plurality of electric devices, and output information specific to that device or information about its operation.

The voice dialogue apparatus 20 is a processing device that conducts a spoken dialogue with the user. The voice dialogue apparatus 20 generates and corrects dialogue information indicating the content of the dialogue with the user and outputs the dialogue information to the task processing unit 40. The voice dialogue apparatus 20 also acquires the processing result from the task processing unit 40, presents it to the user, and then continues the dialogue with the user.

The voice dialogue apparatus 20 includes a response sentence generation unit 21, an utterance data acquisition unit 22, a sequence control unit 23, a task control unit 24, an operation unit 25, an analysis unit 26, a memory 27, a task result analysis unit 28, and a presentation control unit 29.

The response sentence generation unit 21 is a processing unit that acquires a response instruction from the sequence control unit 23 and generates a response sentence based on the acquired response instruction. A response sentence is information conveyed from the voice dialogue apparatus 20 to the user. Specifically, it is a sentence that prompts the user to speak, such as "Please specify an area"; a back-channel acknowledgement of the user's utterance, such as "Understood"; or a sentence describing the operation of the voice dialogue apparatus 20, such as "Searching now". When and what kind of response instruction is issued will be described in detail later.

The utterance data acquisition unit 22 is a processing unit that acquires utterance data of the user's utterance from the voice recognition unit 14. When the user makes a spoken utterance, the microphone 13 and the voice recognition unit 14 generate utterance data indicating the content of that utterance, and the utterance data acquisition unit 22 acquires the generated utterance data. The utterance data acquired by the utterance data acquisition unit 22 may include a control term for changing the content of the dialogue to what it was at a past time point. Utterance data that includes a control term is also referred to as control utterance data. The utterance data acquisition unit 22 corresponds to one function of the acquisition unit.

The sequence control unit 23 is a processing unit that realizes the dialogue with the user by controlling the dialogue sequence of the dialogue between the voice dialogue apparatus 20 and the user. Here, a dialogue sequence is data in which the user's utterances and the responses of the voice dialogue apparatus 20 in the dialogue are arranged in time series. The sequence control unit 23 corresponds to one function of the acquisition unit.

Specifically, the sequence control unit 23 acquires the utterance data of the user's utterance from the utterance data acquisition unit 22. Then, based on the acquired utterance data, the dialogue sequence with the user so far, or the processing result acquired from the task result analysis unit 28, it generates an instruction to create the response sentence to be presented to the user next (hereinafter also referred to as a "response instruction") and sends it to the response sentence generation unit 21. In what cases the sequence control unit 23 generates what kind of response instruction will be described concretely later.

The sequence control unit 23 also extracts a term (also referred to as an utterance term) from the acquired utterance data and, via the operation unit 25, stores the extracted term in the slot 31 associated with the attribute of that term so that the slot holds it. Here, a term is a relatively short expression such as a single word; for example, one noun or one adjective corresponds to one term.
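
The extraction-and-storage step described above can be sketched as follows. This is only a minimal illustration: the source does not say how terms or their attributes are recognized, so the fixed lookup table `LEXICON`, the function names, and the English stand-in terms are all assumptions made for the example.

```python
from typing import Dict, Optional

# Hypothetical lexicon mapping a term to its attribute. How attributes are
# actually determined is not specified in the source; this table is an assumption.
LEXICON: Dict[str, str] = {
    "chinese": "cuisine",
    "italian": "cuisine",
    "akasaka": "area",
    "shibuya": "area",
}


def extract_terms(utterance: str) -> Dict[str, str]:
    """Return {attribute: term} for every known term found in the utterance."""
    found: Dict[str, str] = {}
    for raw in utterance.lower().replace(",", " ").split():
        word = raw.strip(".!?")
        attribute = LEXICON.get(word)
        if attribute is not None:
            found[attribute] = word
    return found


def fill_slots(slots: Dict[str, Optional[str]], utterance: str) -> None:
    """Store each extracted term in the slot associated with its attribute."""
    for attribute, term in extract_terms(utterance).items():
        if attribute in slots:
            # A slot holds a single term, so any earlier term is overwritten.
            slots[attribute] = term


slots: Dict[str, Optional[str]] = {"cuisine": None, "area": None, "budget": None}
fill_slots(slots, "I want to eat Chinese")
fill_slots(slots, "Somewhere in Akasaka, please.")
```

After the two calls, the `cuisine` and `area` slots hold a term each, while `budget` stays empty because no budget term was uttered.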

The task control unit 24 is a processing unit that outputs the content of the dialogue between the voice dialogue apparatus 20 and the user to the task processing unit 40 and causes the task processing unit 40 to execute processing based on that content. Specifically, the task control unit 24 outputs the terms held by the plurality of slots 31 to the task processing unit 40. The task control unit 24 may also determine whether a predetermined condition on the state of the plurality of slots 31 is satisfied and output the terms held by the slots 31 to the task processing unit 40 only when that condition is satisfied. The task control unit 24 corresponds to one function of the external processing control unit.

The operation unit 25 is a processing unit that adds, deletes, or changes the information indicating the content of the dialogue stored in the memory 27. Specifically, when the utterance data acquired by the utterance data acquisition unit 22 includes a control term for controlling the dialogue information, the operation unit 25 refers to the history table 32 and changes the term held by each of the slots 31 to the term that slot 31 held at the past time point specified by the control term. The operation unit 25 may also set a restoration point on a predetermined record in the history table 32 in response to an instruction from the task result analysis unit 28. The operation unit 25 corresponds to one function of the acquisition unit and one function of the changing unit.
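
The reversion of the slots to an earlier state can be sketched as below. This is a hedged illustration, not the apparatus's actual logic: the class `DialogueState`, the control phrase `"go back"`, the rule "revert to the latest earlier restoration point", and the choice to log the reverted state as a new record are all assumptions made for the example.

```python
from typing import Dict, List, Optional

CONTROL_TERMS = {"go back"}  # hypothetical control term wording


class DialogueState:
    """Slots plus a history of slot snapshots (one record per change)."""

    def __init__(self, attributes: List[str]) -> None:
        self.slots: Dict[str, Optional[str]] = {a: None for a in attributes}
        self.history: List[dict] = []

    def hold(self, attribute: str, term: str, restoration_point: bool = False) -> None:
        """Store a term in its slot and append a snapshot record to the history."""
        self.slots[attribute] = term
        self.history.append(
            {"slots": dict(self.slots), "restoration_point": restoration_point}
        )

    def restore(self) -> bool:
        """Revert every slot to the latest earlier record flagged as a restoration point."""
        for record in reversed(self.history[:-1]):
            if record["restoration_point"]:
                self.slots = dict(record["slots"])
                # Assumption: the reverted state is logged as a new record.
                self.history.append(
                    {"slots": dict(self.slots), "restoration_point": False}
                )
                return True
        return False  # no earlier restoration point exists


def handle_control(state: DialogueState, utterance: str) -> bool:
    """Run the restoration if the utterance is a (hypothetical) control term."""
    if utterance.strip().lower() in CONTROL_TERMS:
        return state.restore()
    return False


state = DialogueState(["cuisine", "area"])
state.hold("cuisine", "chinese")
state.hold("area", "akasaka", restoration_point=True)
state.hold("area", "shibuya")
state.hold("cuisine", "italian")
restored = handle_control(state, "Go back")
```

Every slot is reverted together, so the single control utterance undoes both later changes in this example.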

The analysis unit 26 is a processing unit that analyzes the slots 31 or the history table 32 in the memory 27 and notifies the sequence control unit 23 according to the analysis result. Specifically, the analysis unit 26 determines whether each slot in the essential slot group among the slots 31 holds a term and, if every one of them does, notifies the sequence control unit 23 of that fact. The analysis unit 26 corresponds to one function of the changing unit.

The analysis unit 26 also uses the operation unit 25 to perform restoration processing, which restores the content of the dialogue to a past time point. When performing the restoration processing, the analysis unit 26 determines whether more than one restoration point is set in the history table 32 and, if so, sends to the sequence control unit 23 a condition for selecting one of the restoration points. The specific content of the restoration processing will be described in detail later.

The memory 27 is a storage device that stores the content of the dialogue. Specifically, the memory 27 has the slots 31 and the history table 32.

A slot 31 is a storage area for holding dialogue information indicating the content of the dialogue, and the voice dialogue apparatus 20 has a plurality of them. Each of the slots 31 is associated with an attribute of a term and holds a term having that attribute. The terms stored in the slots 31, taken together, represent the dialogue information. Each slot 31 holds one term. When a slot 31 that already holds a term comes to hold a new term, the previously held term is erased from the slot 31.

Here, the attribute of a term is information indicating the nature, characteristic, or category of that term. For example, when the processing of the task processing unit 40 is a restaurant search, information such as the dish name, the area, the budget, whether a private room is available, whether parking is available, the walking time from the nearest station, whether the venue can be reserved for a private party, or whether a night view is visible can be used as attributes. That a slot 31 holds a term can also be expressed by saying that the term is stored in, or registered in, the slot 31. The area of the memory 27 occupied by the slots 31 corresponds to the holding units.

The slots 31 may also be divided into two types: essential slots and optional slots. An essential slot is a slot such that the task control unit 24 does not output terms to the task processing unit 40 unless that slot holds a term. An optional slot is a slot such that, even if it holds no term, the task control unit 24 outputs terms to the task processing unit 40 as long as all the essential slots hold terms. For example, when a search task is executed as the task processing, the task control unit 24 may output the terms held by all the slots to the task processing unit 40 only when every slot in the essential slot group holds a term. Whether a slot 31 is an essential slot or an optional slot is predetermined for each slot 31. When the two types are not provided and there is only a single type, all the slots 31 may be essential slots or all may be optional slots. Which to choose may be decided as appropriate based on the processing of the task processing unit 40 or the content of the dialogue.
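
The gating rule for essential and optional slots can be sketched as follows. This is a minimal illustration under stated assumptions: which attributes are essential (`ESSENTIAL`) and the function name `terms_for_task` are hypothetical, and the source leaves the exact output format to the task processing unit unspecified.

```python
from typing import Dict, Optional

# Hypothetical classification; the source only says it is predetermined per slot.
ESSENTIAL = {"cuisine", "area"}


def terms_for_task(slots: Dict[str, Optional[str]]) -> Optional[Dict[str, str]]:
    """Return the held terms if every essential slot holds a term, else None."""
    if any(slots.get(attribute) is None for attribute in ESSENTIAL):
        return None  # an essential slot is empty, so nothing is output
    # Optional slots are included only when they actually hold a term.
    return {a: t for a, t in slots.items() if t is not None}


incomplete = terms_for_task(
    {"cuisine": "chinese", "area": None, "budget": "5000 yen", "parking": None}
)
complete = terms_for_task(
    {"cuisine": "chinese", "area": "akasaka", "budget": "5000 yen", "parking": None}
)
```

In the first call the empty `area` slot blocks the output; in the second, the empty optional `parking` slot does not.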

The history table 32 is a table showing the history of the terms held by the plurality of slots 31. Specifically, the history table 32 contains, in time series, the terms that the slots 31 held in the past and the terms they currently hold. Even when a slot 31 comes to hold a new term and the term it held immediately before is erased from the slot 31, the erased term remains in the history table 32.

The history table 32 may store, together with the terms that the slots 31 held in the past, information indicating the time at that point (for example, a time stamp). Alternatively, on the premise that records are appended as time advances, the history table 32 may store only the terms that the slots 31 held in the past. The area of the memory 27 in which the history table 32 is stored corresponds to the storage unit.
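
The relationship between a slot, which keeps only its latest term, and the history table, which keeps every earlier state, can be sketched as below. The record layout (a dictionary with `timestamp` and `slots` keys) and the helper `append_record` are assumptions for illustration; the source only says that a time stamp may accompany the stored terms.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional


def append_record(history: List[dict], slots: Dict[str, Optional[str]]) -> None:
    """Append a snapshot of all slots, stamped with the current time."""
    history.append(
        {
            "timestamp": datetime.now(timezone.utc),  # optional per the text
            "slots": dict(slots),  # copy, so later overwrites leave the record intact
        }
    )


slots: Dict[str, Optional[str]] = {"cuisine": None, "area": None}
history: List[dict] = []

slots["cuisine"] = "chinese"
append_record(history, slots)

slots["cuisine"] = "italian"  # the slot now holds only the new term
append_record(history, slots)

# The overwritten term is gone from the slot but can still be read from the history.
previous_cuisine = history[0]["slots"]["cuisine"]
```

Because records are only ever appended, their order alone already encodes the time series, which is why the time stamp can be treated as optional.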

The task result analysis unit 28 is a processing unit that acquires the result of the processing by the task processing unit 40 and analyzes it. When the task result analysis unit 28 acquires a processing result from the task processing unit 40, it analyzes the result and passes the analysis result to the sequence control unit 23. This analysis result is used when the operation unit 25 determines whether to set a restoration point at the time point in the history table 32 that corresponds to the current time. The task result analysis unit 28 corresponds to one function of the external processing control unit.

For example, as the result of restaurant search processing by the task processing unit 40, the task result analysis unit 28 acquires the titles and URLs (Uniform Resource Locators) of the Web pages on which the retrieved information is posted. The task result analysis unit 28 also analyzes the result of the search processing and calculates the number of retrieved items. The task result analysis unit 28 may then set a restoration point only when the number of retrieved items is suitable for browsing by the user (for example, about 1 to 30). The task result analysis unit 28 may also prohibit the setting of a restoration point when the number of retrieved items is unsuitable for browsing by the user, such as 0 or 100 or more.
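
The count-based rule can be sketched as a single predicate. This is a minimal sketch under stated assumptions: the function name and the default bounds are illustrative, taken from the "about 1 to 30" example in the text. The text leaves counts between 31 and 99 unspecified, and this sketch simply treats anything outside the configured range as unsuitable.

```python
def should_set_restoration_point(
    hit_count: int, min_hits: int = 1, max_hits: int = 30
) -> bool:
    """Return True when the number of search hits is suitable for browsing.

    The default bounds follow the "about 1 to 30" example; they are
    illustrative values, not fixed thresholds from the source.
    """
    return min_hits <= hit_count <= max_hits


decisions = {count: should_set_restoration_point(count) for count in (0, 12, 30, 100)}
```

Keeping the bounds as parameters lets a different task, or a different display size, use its own notion of a browsable result count.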

The task result analysis unit 28 may also set a restoration point at the time when all the slots in the essential slot group have come to hold terms, or at the time when a slot 31 that holds a term changes to hold a different term.

The presentation control unit 29 is a processing unit that generates presentation data to be presented to the user by the display device 10 and outputs it to the display device 10. The presentation control unit 29 acquires the processing result from the task processing unit 40, arranges its position on the screen of the display device 10 so that the user can browse the result effectively, converts it into a data format suitable for output to the display device 10, and then outputs the presentation data to the display device 10.

Note that some or all of the functions of the voice dialogue apparatus 20, as well as the task processing unit 40, may be realized as a cloud server, like the voice synthesis unit 12 and the voice recognition unit 14.

FIG. 2 is an explanatory diagram of presentation by the voice dialogue system 1 according to the present embodiment. The diagram in FIG. 2 is an example of the image displayed on the display screen when the display device 10 presents the result of the processing by the task processing unit 40 to the user.

Character strings 201 to 205 indicating attributes are displayed on the left side of the display screen. The character strings 201 to 205 indicate the attributes of the respective slots 31.

On the right side of the display screen, terms 211 to 215 are displayed. The terms 211 to 215 are the terms held by the slots 31 associated with the attributes indicated by the character strings 201 to 205, respectively.

On the lower side of the display screen, a character string 206 and result information 216 are displayed. The character string 206 indicates that what is displayed below it is a search result. The result information 216 indicates the result of the restaurant search that the task processing unit 40 performed based on the terms 211 to 215.

In this way, the content of the dialogue and the result information, which is the result of processing by the task processing unit 40 based on that content, are displayed on the display device 10, so the user can see a processing result that reflects the content of the dialogue.

Note that the image displayed on the display screen is not limited to the one shown in FIG. 2; which information is displayed, how it is arranged, and where it is positioned may be changed arbitrarily.

FIG. 3 is a first explanatory diagram of a dialogue sequence and history information according to the present embodiment.

FIG. 3 shows a dialogue sequence 310, a history table 320, and a search result 330, aligned with the time series of the dialogue sequence. Each row in FIG. 3 corresponds to one point in time. Such a row is also referred to as a record.

The dialogue sequence 310 is data in which the user's utterances and the responses of the voice interaction device 20 in the dialogue are arranged in time series.

The time information 311 (time stamp) indicates the time at which the user made an utterance or the voice interaction device 20 gave a response.

The utterance 312 is utterance data indicating the user's utterance at that time. Specifically, the utterance 312 is utterance data, acquired by the utterance data acquisition unit 22 via the microphone 13 and the voice recognition unit 14, that indicates what the user said by voice.

The response 313 is a response sentence indicating the response of the voice interaction device 20 at that time. Specifically, the response 313 is generated by the response sentence generation unit 21 upon receiving a response instruction from the sequence control unit 23.

The history table 320 contains the following information: an essential slot group 321, an option slot group 322, an action 323, and a restoration point 324. The history table 320 shows the history of the slots 31 as stored in the history table 32, aligned with the time series of the time information 311 of the dialogue sequence 310. The history table 320 is an example of the history table 32.

The essential slot group 321 indicates the terms held, at that point in time, by the essential slots among the slots 31. The essential slot group 321 includes, for example, terms with the attributes "dish name", "region", and "budget".

The option slot group 322 indicates the terms held, at that point in time, by the option slots among the slots 31. The option slot group 322 includes, for example, terms with the attributes "private room availability" and "parking availability".

The action 323 is information indicating the processing executed by the voice interaction device 20 at that point in time; more than one piece of information may be stored. For example, when a slot 31 of a certain attribute is made to hold a new term, the name of that attribute and the character string "register" are set at that point in time to indicate this. When the task control unit 24 outputs terms to the task processing unit 40 to have it perform an information search, the character string "search" is set. When the operation unit 25 changes the terms held by the slots 31 to those of a past point in time, the character string "restore" is set.

The restoration point 324 is information indicating whether a restoration point is set at that point in time. "1" is set at each point in time where a restoration point is set. The task result analysis unit 28 determines whether a restoration point is to be set, and the operation unit 25 sets it in the history table 320.

The search result 330 is the number of results of the search processing performed by the task processing unit 40 at that point in time. The search result 330 is set by the task result analysis unit 28.
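As a rough illustration of the record layout described above, one row of the history table could be modeled as follows. This is only a sketch: the class name `HistoryRecord`, its field names, and all sample values are assumptions made for illustration and are not taken from FIG. 3.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HistoryRecord:
    """One row (record) of the history table; field names are hypothetical."""
    timestamp: str                                     # time information 311
    essential: Dict[str, Optional[str]]                # essential slot group 321
    option: Dict[str, Optional[str]]                   # option slot group 322
    actions: List[str] = field(default_factory=list)   # action 323 (may hold several)
    restoration_point: bool = False                    # restoration point 324 ("1" when set)
    result_count: Optional[int] = None                 # search result 330 (number of hits)


history: List[HistoryRecord] = []

# A record where a term is registered (sample values are made up).
history.append(HistoryRecord(
    timestamp="t1",
    essential={"dish name": "Chinese cuisine", "region": None, "budget": None},
    option={"private room": None, "parking": None},
    actions=["dish name: register"],
))

# A record where a search was executed and a restoration point was set.
history.append(HistoryRecord(
    timestamp="t2",
    essential={"dish name": "Chinese cuisine", "region": "Moriguchi", "budget": "5000 yen"},
    option={"private room": None, "parking": None},
    actions=["search"],
    restoration_point=True,
    result_count=12,
))
```

Keeping a full snapshot of every slot in each record makes restoring to a past point in time a simple copy, at the cost of some redundancy.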

The dialogue sequence shown in FIG. 3 illustrates a case in which, during a dialogue where the user performs restaurant searches one after another while changing the search conditions, the content of the dialogue is changed to that of a past point in time intended by the user.

At the points in time corresponding to records R1 to R7, the terms contained in the user's utterances are acquired in order by the utterance data acquisition unit 22 and the like, and each acquired term is stored in the slot 31 corresponding to its attribute.

At the point in time corresponding to record R8, the task processing unit 40 performs the first search processing based on the terms held by the slots 31. This search is triggered by the fact that, at the point in time corresponding to record R7, all of the slots 31 in the essential slot group came to hold terms.

At the points in time corresponding to records R9 to R16, search processing based on the terms held by the slots 31 is performed. Here, searches are performed one after another while the search terms are changed so that the search result desired by the user is obtained.

At the point in time corresponding to record R17, the user makes a control utterance for returning the content of the dialogue to a past point in time. The user does so because the search at the point in time corresponding to record R14 or R16 returned zero results, intending to return to the search conditions of a past point in time before the number of results became zero.

In records R18 to R20, the terms held by the respective slots 31 are restored to those of record R10.

In this way, the voice interaction device 20 can return the content of the dialogue to a past point in time based on the user's spoken utterance, and can continue with a new dialogue from that state. The voice interaction device can thus correct the content of the dialogue with the user by a simple method.

[1-2. Operation]
The operation of the voice interaction device 20 and the voice interaction system 1 configured as described above is described below.

FIG. 4 is a flowchart of the main processing performed by the voice interaction device 20 according to the present embodiment.

In step S101, the microphone 13 picks up the voice of the user's utterance and generates a voice signal based on it. Here, the voice of the user's utterance may contain terms for a restaurant search, such as "I want to eat Chinese", or may contain terms for changing the terms held by the slots 31 to those of a past point in time, such as "Go back to Moriguchi".

In step S102, the voice recognition unit 14 performs voice recognition processing on the voice signal generated by the microphone 13 in step S101, thereby generating utterance data for the user's utterance.

In step S103, the utterance data acquisition unit 22 acquires the utterance data generated by the voice recognition unit 14 in step S102.

In step S104, the sequence control unit 23 determines whether the utterance data acquired by the utterance data acquisition unit 22 in step S103 is empty.

If the sequence control unit 23 determines in step S104 that the utterance data is empty ("Y" in step S104), the process proceeds to step S121. If it determines that the utterance data is not empty ("N" in step S104), the process proceeds to step S105.

In step S105, the sequence control unit 23 uses the operation unit 25 to store the terms contained in the utterance data in the slots 31. Specifically, for each term contained in the utterance data, the sequence control unit 23 determines the attribute of the term and stores the term in the slot 31 whose attribute matches it. For example, the sequence control unit 23 determines that the term "Chinese" contained in the utterance data "I want to eat Chinese" is a term with the dish-name attribute, and stores the term "Chinese" in the slot 31 with the dish-name attribute. If the term to be stored in the slot 31 is an abbreviation, a colloquial name, or the like of a formal name, the sequence control unit 23 may convert it into the formal name before storing it. Specifically, the sequence control unit 23 may determine that the term "Chinese" is a shortened name (abbreviation) of "Chinese cuisine" and store "Chinese cuisine" in the slot 31.
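The term storage of step S105 might be sketched as below. The vocabulary table `TERM_ATTRIBUTES`, the abbreviation table `CANONICAL`, and the function `store_terms` are hypothetical names introduced only for this illustration; how the device actually determines a term's attribute is not specified here, so simple substring lookup is assumed.

```python
from typing import Dict, Optional

# Hypothetical vocabulary mapping each known term to its attribute.
TERM_ATTRIBUTES = {
    "Chinese": "dish name",
    "Chinese cuisine": "dish name",
    "Moriguchi": "region",
}

# Hypothetical table mapping abbreviations to formal names.
CANONICAL = {"Chinese": "Chinese cuisine"}


def store_terms(utterance: str, slots: Dict[str, Optional[str]]) -> None:
    """Store each known term found in the utterance in the slot of its attribute."""
    remaining = utterance
    # Check longer terms first so "Chinese cuisine" is not split into "Chinese".
    for term in sorted(TERM_ATTRIBUTES, key=len, reverse=True):
        if term in remaining:
            # Convert an abbreviation to its formal name before storing.
            slots[TERM_ATTRIBUTES[term]] = CANONICAL.get(term, term)
            remaining = remaining.replace(term, "")


slots: Dict[str, Optional[str]] = {"dish name": None, "region": None, "budget": None}
store_terms("I want to eat Chinese", slots)
```

After the call, the dish-name slot holds the formal name "Chinese cuisine" and the other slots remain empty.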

In step S106, the operation unit 25 and the presentation control unit 29 display the terms held by the slots 31 on the display device 10.

In step S107, the operation unit 25 and the like perform, when necessary, restoration processing that restores the content of the dialogue by changing it to that of a past point in time. The restoration processing is described in detail later.

In step S108, the analysis unit 26 determines whether terms are stored in all of the slots 31 in the essential slot group, that is, whether all of the slots 31 in the essential slot group hold terms.

If the analysis unit 26 determines in step S108 that all of the slots 31 in the essential slot group hold terms ("Y" in step S108), the process proceeds to step S109. If it determines that not all of them hold terms ("N" in step S108), that is, if at least one slot 31 in the essential slot group is empty, the process proceeds to step S122.

In step S109, the sequence control unit 23 instructs the task control unit 24 to have the task processing unit 40 execute task processing. At this time, the operation unit 25 records in the history table 32 that a search task was executed. Specifically, the operation unit 25 sets "search" in the action 323 for the current point in time in the history table 320.

In step S110, based on the execution instruction issued by the sequence control unit 23 in step S109, the task control unit 24 outputs the terms held by the slots 31 to the task processing unit 40 and has the task processing unit 40 execute search processing. The task processing unit 40 acquires the terms output by the task control unit 24, performs search processing using them as search terms, and outputs the search result.

In step S111, the presentation control unit 29 acquires the search result output by the task processing unit 40 in step S110, shapes it into a form suitable for presentation to the user on the display device 10 (for example, the display mode shown in FIG. 2), and outputs it to the display device 10. The display device 10 acquires the search result output by the presentation control unit 29 and displays it on the display screen.

In step S112, the task result analysis unit 28 acquires the search result output by the task processing unit 40 in step S110 and performs restoration point setting processing based on it. The restoration point setting processing is described in detail later.

In step S113, the sequence control unit 23 instructs the response sentence generation unit 21 to respond so as to prompt the user for the next utterance.

In step S114, the response sentence generation unit 21 generates a response sentence based on the response instruction. The response sentence generation unit 21 also outputs the generated response sentence to the voice synthesis unit 12 so that it is output as voice from the speaker 11 for the user to hear.

When step S114 is finished, the processing returns to step S101.

In step S121, the sequence control unit 23 instructs the response sentence generation unit 21 to respond so as to prompt the user to repeat the utterance (that is, to say the same thing as before). A determination in step S104 that the utterance data is empty means that, although the microphone 13 picked up some sound, the voice recognition unit 14 could not obtain utterance data from that sound. Asking the user to repeat the previous utterance can therefore be expected to yield utterance data.

In step S122, the sequence control unit 23 instructs the response sentence generation unit 21 to respond so as to prompt the user for the next utterance. For example, if any of the slots 31 in the essential slot group does not hold a term, the sequence control unit 23 issues a response instruction to generate a response sentence that gets the user to say the term that the empty slot 31 should hold.
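The two decision points of the main processing (steps S104 and S108) can be summarized in a small sketch. The function `next_step` is a hypothetical helper, and it assumes, in line with the description of step S121, that empty utterance data leads to the re-utterance prompt. It also assumes that the slots passed in already reflect steps S105 to S107.

```python
from typing import Dict, Optional, Sequence


def next_step(utterance_data: str,
              slots: Dict[str, Optional[str]],
              essential: Sequence[str]) -> str:
    """Return the label of the step that the main processing branches to."""
    # S104: empty utterance data means recognition failed, so ask again (S121).
    if not utterance_data.strip():
        return "S121"
    # S108: run the search (S109) only when every essential slot holds a term.
    if all(slots.get(name) is not None for name in essential):
        return "S109"
    # Otherwise prompt the user for a term for an empty essential slot (S122).
    return "S122"


ESSENTIAL = ("dish name", "region", "budget")
partial = {"dish name": "Chinese cuisine", "region": None, "budget": None}
complete = {"dish name": "Chinese cuisine", "region": "Moriguchi", "budget": "5000 yen"}
```

With these sample slots, `next_step("", partial, ESSENTIAL)` selects the re-utterance prompt, while a non-empty utterance leads to S122 for `partial` and to S109 for `complete`.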

FIG. 5 is a flowchart of the restoration processing performed by the voice interaction device according to the present embodiment. The flowchart in FIG. 5 shows step S107 of FIG. 4 in detail: processing that, when the utterance data contains a control term, changes the terms held by the slots 31 to those of a past point in time.

More specifically, the operation unit 25 determines whether the utterance data acquired by the utterance data acquisition unit 22 contains a first term and a second term, both described later. If it does, the operation unit 25 refers to the history table and changes the term held by each of the slots 31 to the term that the slot held at the point in time when the slot 31 associated with the attribute of the second term (corresponding to the corresponding holding unit) was holding the second term.

In step S201, the sequence control unit 23 determines whether the utterance data acquired from the utterance data acquisition unit 22 contains a restoration term (also referred to as a first term). Here, a restoration term is a predetermined term indicating that the dialogue information is to be changed to that of a past point in time, for example "go back (to ...)" or "not (...)".

If the sequence control unit 23 determines in step S201 that a restoration term is contained ("Y" in step S201), the process proceeds to step S202. If it determines that no restoration term is contained ("N" in step S201), the series of processes shown in FIG. 5 ends.

In step S202, the analysis unit 26 acquires the term contained in the part of the utterance data other than the restoration term (also referred to as a second term), and extracts a restoration point from the history table 32 based on the acquired term. Specifically, the analysis unit 26 determines the attribute of the acquired term and extracts, from among the restoration points contained in the history table 32, a restoration point at which the term held by the slot 31 corresponding to that attribute matches the acquired term. Utterance data containing the first term and the second term can also be called control utterance data. Note that a plurality of restoration points may be extracted.

In step S203, the analysis unit 26 determines whether exactly one restoration point was extracted in step S202.

If the analysis unit 26 determines in step S203 that exactly one restoration point was extracted ("Y" in step S203), the process proceeds to step S204. If it determines that the number of restoration points is not one ("N" in step S203), the process proceeds to step S211.

In step S204, the operation unit 25 refers to the history table 32 and changes the terms held by the slots 31 to the terms that the slots 31 held at the point in time of the single restoration point extracted in step S202. In other words, the operation unit 25 returns the terms held by the plurality of slots 31 to what they were at the restoration point. The operation unit 25 also sets "restore" as the action in the history table 320 at the point in time when the terms were changed to those of the restoration point. If a slot 31 held no term at the restoration point, the operation unit 25 puts that slot 31 into a state of holding no term.

In step S211, the sequence control unit 23 instructs the response sentence generation unit 21 to give a response prompting the user for an utterance that allows exactly one restoration point to be extracted. For example, when a control utterance such as "Go back to Moriguchi" is acquired from the user, there may be two restoration points in the history table 320 that this control utterance could identify. To prompt the user to say which of the two is intended, the sequence control unit 23 issues a response instruction for a response such as "Shall I go back to where you searched with parking?".

After step S211, if the user makes an utterance that identifies one of the two restoration points, then in step S202 executed in the next round of the main processing (FIG. 4), a single restoration point is extracted and step S204 is executed.
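The restoration flow of FIG. 5 (steps S202 to S204 and S211) can be sketched roughly as follows. The dictionary-based record layout, the function names, and the sample history are assumptions for illustration only; splitting the utterance into the restoration term and the second term (step S201) is assumed to have been done already.

```python
from typing import Dict, List, Optional

Slots = Dict[str, Optional[str]]


def find_restoration_points(history: List[dict], attribute: str, term: str) -> List[int]:
    """S202: indices of restoration points where the attribute's slot held the term."""
    return [
        i for i, record in enumerate(history)
        if record["restoration_point"] and record["slots"].get(attribute) == term
    ]


def restore(history: List[dict], slots: Slots, attribute: str, term: str) -> str:
    """S203/S204/S211: restore if exactly one point matches, otherwise ask the user."""
    points = find_restoration_points(history, attribute, term)
    if len(points) != 1:
        return "ask_user"                      # S211: prompt for a narrowing utterance
    slots.clear()
    slots.update(history[points[0]]["slots"])  # S204: slots empty then become empty again
    history.append({"slots": dict(slots), "action": "restore", "restoration_point": False})
    return "restored"


# Illustrative history (values are made up, not taken from FIG. 3).
history = [
    {"slots": {"dish name": "Chinese cuisine", "region": "Kadoma", "parking": None},
     "action": "search", "restoration_point": True},
    {"slots": {"dish name": "Chinese cuisine", "region": "Moriguchi", "parking": None},
     "action": "search", "restoration_point": True},
    {"slots": {"dish name": "Chinese cuisine", "region": "Moriguchi", "parking": "yes"},
     "action": "search", "restoration_point": True},
]
slots: Slots = dict(history[-1]["slots"])

ambiguous = restore(history, slots, "region", "Moriguchi")  # two candidate points
unique = restore(history, slots, "region", "Kadoma")        # one candidate point
```

In this sample, "Moriguchi" matches two restoration points, so the first call asks the user to narrow the choice, while "Kadoma" matches one and the slots are rolled back to that record.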

Note that, in the above, an attribute name, that is, the name of an attribute, may be used in place of the second term. In other words, the operation unit 25 may determine whether the utterance data acquired by the utterance data acquisition unit 22 contains the first term and an attribute name and, if it does, may refer to the history table and change the term held by each of the slots 31 to the term that the slot held immediately before the point in time at which the slot 31 associated with the attribute indicated by the attribute name (corresponding to the corresponding holding unit) came to hold the term it currently holds.
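A minimal sketch of this attribute-name variant is shown below, assuming that the history is a list of slot snapshots whose last entry reflects the current state. The function name `restore_before_current` and the sample values are hypothetical.

```python
from typing import Dict, List, Optional

Slots = Dict[str, Optional[str]]


def restore_before_current(history: List[Slots], slots: Slots, attribute: str) -> bool:
    """Roll all slots back to the snapshot just before `attribute` got its current term."""
    current = slots.get(attribute)
    # Walk back from the newest snapshot while the slot still holds its current term.
    i = len(history) - 1
    while i >= 0 and history[i].get(attribute) == current:
        i -= 1
    if i < 0:
        return False  # the slot has held this term from the start; nothing to restore
    slots.clear()
    slots.update(history[i])
    return True


# Illustrative snapshots (values are made up).
history: List[Slots] = [
    {"dish name": "Italian", "region": "Moriguchi"},
    {"dish name": "Chinese cuisine", "region": "Moriguchi"},
    {"dish name": "Chinese cuisine", "region": "Iriya"},
]
slots: Slots = dict(history[-1])
changed = restore_before_current(history, slots, "dish name")
```

Here the dish-name slot took its current term in the second snapshot, so every slot is returned to the first snapshot, including the region.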

FIG. 6 is a flowchart of the restoration point setting processing performed by the voice interaction device according to the present embodiment. The flowchart in FIG. 6 shows step S112 of FIG. 4 in detail.

In step S301, the operation unit 25 branches the processing according to the condition for setting a restoration point. If the condition is "a search was executed" (condition C) ("condition C" in step S301), the process proceeds to step S302. If the condition is "a search was executed" and "the search result is valid" (condition D) ("condition D" in step S301), the process proceeds to step S303. Although two conditions are shown here as an example, the same processing is possible with three or more conditions.

In step S302, the operation unit 25 sets a restoration point at the current point in time in the history table 320.

In step S303, the operation unit 25 acquires the search result, which is the analysis result of the task result analysis unit 28, and determines whether the number of items found is zero.

If the number of items found is zero ("Y" in step S303), the operation unit 25 ends the series of processes without setting a restoration point at this point in time. That is, even at a point in time when an information search result has been acquired, the operation unit 25 does not allow a restoration point to be set if the result contains zero items. If the number of items found is not zero ("N" in step S303), the process proceeds to step S302.

Note that a restoration point may likewise not be set at this point in time when the number of items found is unsuitable for browsing by the user (for example, 100 or more), just as when it is zero.
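The decision of FIG. 6 can be expressed as a single predicate. This is only a sketch: the function name `should_set_restoration_point`, the condition labels passed as strings, and the optional `max_useful` parameter (modeling the "100 or more" variation) are assumptions for illustration.

```python
from typing import Optional


def should_set_restoration_point(condition: str,
                                 result_count: int,
                                 max_useful: Optional[int] = None) -> bool:
    """Decide whether to set a restoration point after a search (S301 to S303)."""
    if condition == "C":
        # Condition C: every executed search becomes a restoration point (S302).
        return True
    if condition == "D":
        # Condition D: the search result must also be valid (S303).
        if result_count == 0:
            return False
        # Optional variation: too many hits to browse is treated like zero hits.
        if max_useful is not None and result_count >= max_useful:
            return False
        return True
    raise ValueError(f"unknown condition: {condition!r}")
```

For example, under condition D a search with 12 hits yields a restoration point, while searches with 0 hits, or with 150 hits when `max_useful=100`, do not.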

FIG. 7 is a second explanatory diagram of history information according to the present embodiment. The dialogue sequence shown in FIG. 7 illustrates a case in which, during a dialogue where the user performs restaurant searches one after another while changing the search conditions, the content of the dialogue has come to differ from the user's intention due to, for example, misrecognition of speech, and is changed to that of a past point in time.

Like FIG. 3, FIG. 7 shows the dialogue sequence 310 and the like.

At the points in time corresponding to records R1 to R5, the terms contained in the user's utterances are acquired in order by the utterance data acquisition unit 22 and the like, and each acquired term is stored in the slot 31 corresponding to its attribute.

At the point in time corresponding to record R6, the task processing unit 40 performs the first search processing based on the terms held by the slots 31. This search is triggered by the fact that, at the point in time corresponding to record R5, all of the slots 31 in the essential slot group came to hold terms.

At the points in time corresponding to records R7 to R14, search processing based on the terms stored in the slots 31 is performed. Here, searches are performed one after another while the search terms are changed so that the search result desired by the user is obtained.

In this dialogue, misrecognition by the voice recognition unit 14 causes the terms held by the slots 31 to be changed to something other than what the user intended. Specifically, at the point in time corresponding to record R11, the user says "a parking lot too" (Chushajomo), intending to add a parking lot to the search conditions, but the voice recognition unit 14 misrecognizes this as "Chinese cuisine" (Chukaryori), and in record R12 the term "Chinese cuisine" is stored in the slot 31 with the dish-name attribute. Then, at the point in time corresponding to record R13, the user says "not Chinese, Italian" (Chuka-janakute-itaria), intending to correct the search condition, but the voice recognition unit 14 misrecognizes this as "Iriya", that is, the place name Iriya, and at the point in time corresponding to record R15 the term "Iriya" is stored in the slot 31 with the region attribute.

At the point in time corresponding to record R15, the user makes an utterance for returning the content of the dialogue to a past point in time. The user does so because, at the point in time corresponding to record R12 or R14, the terms held by the slots 31 were changed contrary to the user's intention, and the user intends to return to the search conditions of a past point in time before those changes were made.

At the points in time corresponding to records R15 to R16, the terms held by the respective slots 31 are restored to those of record R10.

In this way, the voice interaction device can return the content of the dialogue to a past point in time based on the user's utterance, and can continue with a new dialogue from that state. The voice interaction device can thus correct the content of the dialogue with the user by a simple method.

[1-3. Modification]
FIG. 8 is a block diagram showing the configuration of a voice interaction device 20A according to a modification of the present embodiment.

As shown in FIG. 8, the voice interaction device 20A is a voice interaction device that conducts a spoken dialogue with a user and includes: a plurality of holding units 103 for holding dialogue information indicating the content of the dialogue, each holding unit 103 being associated with an attribute of terms and holding a term having the attribute associated with it; a storage unit 104 that stores a history of the terms held by the plurality of holding units 103; an acquisition unit 101 that acquires utterance data indicating the content of the user's spoken utterance and causes an utterance term contained in the acquired utterance data to be held by the holding unit 103, among the plurality of holding units 103, that is associated with the attribute of the utterance term; and a change unit 102 that, when the acquisition unit 101 acquires control utterance data containing a control term for controlling the dialogue information, refers to the storage unit 104 and changes the term held by each of the plurality of holding units 103 to the term that the holding unit 103 held at a past point in time identified by the control term.

The voice interaction device 20A may further include an external processing control unit 105 that outputs the terms held by the plurality of holding units 103, as the dialogue information, to a processing unit that performs processing based on the dialogue information, and that acquires information indicating the result of the processing as a response to the output.

The processing unit may execute an information search using the acquired terms as search terms, the external processing control unit 105 may acquire the result of the information search as the response, and the voice interaction device 20A may further include a presentation control unit 106 for presenting the information search result acquired by the external processing control unit 105 to the user.

FIG. 9 is a flowchart showing a control method of the voice interaction device 20A according to the modification of the present embodiment.

As shown in FIG. 9, the control method of the voice interaction device 20A, which performs a spoken dialogue with a user, includes: an acquisition step of acquiring utterance data indicating the content of the user's spoken utterance (step S401) and causing an utterance term included in the acquired utterance data to be held in the holding unit 103, among the plurality of holding units 103, that is associated with the attribute of the utterance term (step S402); and a changing step of, when the utterance data acquired in the acquisition step contains a control term for controlling the dialogue information, referring to the history stored in the storage unit 104 and changing the term held by each of the plurality of holding units 103 to the term that the holding unit 103 held at the past time point specified by the control term (step S403).
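The flow of steps S401 to S403 can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the vocabulary in `LEXICON`, the control word `"back"`, and the rule that this control word means "return to the previous state" are hypothetical, since the text above only requires that the control term specify some past time point.

```python
# Illustrative sketch only. LEXICON, CONTROL_WORD and the "one step back"
# rule are assumptions, not taken from the patent text.
LEXICON = {"Italian": "cuisine", "French": "cuisine",
           "Shibuya": "area", "Ginza": "area"}
CONTROL_WORD = "back"


class DialogueState:
    def __init__(self, attributes):
        # One holding unit (slot) per attribute.
        self.slots = {attr: None for attr in attributes}
        # Storage unit: a snapshot of all slots after every change.
        self.history = [dict(self.slots)]

    def acquire(self, words):
        """S401-S402: store each known term in the slot for its attribute."""
        changed = False
        for word in words:
            attr = LEXICON.get(word)
            if attr in self.slots:
                self.slots[attr] = word
                changed = True
        if changed:
            self.history.append(dict(self.slots))

    def change(self, index):
        """S403: restore every slot to the snapshot at history[index]."""
        self.slots = dict(self.history[index])
        self.history.append(dict(self.slots))

    def handle(self, words):
        """Dispatch one utterance, given as a list of recognized words."""
        if CONTROL_WORD in words:
            # Assumed rule: go back to the snapshot before the latest one.
            self.change(max(0, len(self.history) - 2))
        else:
            self.acquire(words)
```

Storing a full snapshot per change keeps the restore operation trivial; a real device could equally store per-slot deltas.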

The voice interaction device 20A according to this modification provides the same effects as the voice interaction device 20.

[1-4. Effects, etc.]
As described above, the voice interaction device 20 according to the present embodiment is a device that performs a spoken dialogue with a user and includes: a plurality of slots 31 for holding dialogue information indicating the content of the dialogue, each slot 31 being associated with an attribute of terms and serving to hold a term that has the associated attribute; a history table 32 that stores a history of the terms held by the plurality of slots 31; an utterance data acquisition unit 22 that acquires utterance data indicating the content of the user's spoken utterance and causes an utterance term included in the acquired utterance data to be held in the slot, among the plurality of slots 31, that is associated with the attribute of the utterance term; and an operation unit 25 that, when the utterance data acquired by the utterance data acquisition unit 22 contains a control term for controlling the dialogue information, refers to the history stored in the history table 32 and changes the term held by each of the plurality of slots 31 to the term that the slot 31 held at the past time point specified by the control term.

With this configuration, the voice interaction device 20 can change the dialogue information to that of a past time point based on the user's voice, that is, it can return the dialogue information to a past state. Here, the past time point is a time point determined by the user's voice. The user can therefore return the dialogue information, which is the content of the dialogue with the voice interaction device 20, to that of a past time point by making a spoken utterance that contains a control term specifying that time point. The voice interaction device 20 can thus correct the content of the dialogue with the user by a simple method.

In particular, the voice interaction device 20 is characterized in that it corrects the content of the dialogue with the user by a simple method through control based on the user's voice. In a spoken dialogue with a conventional voice interaction device, it is difficult for the user to keep track of the content of the dialogue in chronological order, and therefore difficult to return the content of the dialogue to the past time point the user wants. Because the voice interaction device 20 according to the present embodiment performs control based on the user's voice, it can return the content of the dialogue to the past time point desired by the user. The advantage of control based on the user's voice is expected to grow as the content of the dialogue becomes more complex, that is, as the number of terms increases.

The correction method described above is also highly advantageous when the dialogue information becomes more complex as technology evolves, for example, when there are several tens of holding units or more. With a voice interaction device that has fewer than ten holding units, as in the present embodiment, it is realistically possible to reset the terms held by the holding units and set them again from the beginning instead of returning the content of the dialogue to a past time point. When the voice interaction device has several tens of holding units or more, however, setting the terms again from the beginning is cumbersome and places a heavy burden on the user, so it can hardly be called realistic. In such a case, the voice interaction device 20 can return the content of the dialogue with the user to a past time point, which has the advantage that the user can redo the dialogue from the desired past time point without setting everything again from the beginning.

The control term may include a first term, which is a predetermined term indicating that the dialogue information is to be changed to that of a past time point, and a second term that differs from the predetermined character string. The operation unit 25 may determine whether the utterance data acquired by the utterance data acquisition unit 22 contains the first term and the second term and, when it determines that both are contained, may refer to the history and change the term held by each of the plurality of slots 31 to the term that the slot 31 held at the time point when the slot 31 associated with the attribute of the second term held the second term.

With this configuration, the voice interaction device 20 concretely identifies the past time point from the control term acquired by the acquisition unit, using the content of a term held by a holding unit. The voice interaction device 20 can thus correct the content of the dialogue with the user by a more specific method.
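As a hedged illustration of the first-term and second-term rule, the sketch below searches a history of slot snapshots for a time point at which the slot for the second term's attribute held that term. The names `FIRST_TERM`, `find_past_index` and `rollback` are hypothetical, and choosing the most recent matching snapshot when several match is an assumption; the text above does not say which one to pick.

```python
# Illustrative sketch only; all names are hypothetical.
FIRST_TERM = "back"  # assumed predetermined term meaning "return to the past"


def find_past_index(history, attr, second_term):
    """Return the latest history index where slot `attr` held `second_term`."""
    for i in range(len(history) - 1, -1, -1):
        if history[i].get(attr) == second_term:
            return i
    return None


def rollback(history, lexicon, words):
    """Return the slot state to restore, or None if no rollback applies.

    history: list of {attribute: term} snapshots, oldest first.
    lexicon: maps each known term to its attribute.
    words:   recognized words of one utterance.
    """
    if FIRST_TERM not in words:
        return None
    for word in words:
        attr = lexicon.get(word)
        if attr is None:
            continue  # not a known term, so it cannot be the second term
        index = find_past_index(history, attr, word)
        if index is not None:
            return dict(history[index])
    return None
```

With a history in which the area slot held "Shibuya" before changing to "Ginza", the utterance words `["back", "Shibuya"]` select the last snapshot in which the area was still "Shibuya", and every slot is restored from that snapshot.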

The operation unit 25 may also set a restoration point at a time point in the history stored in the history table 32 when the state of the plurality of slots 31 at that time point satisfies a predetermined condition. When changing the term held by each of the plurality of slots 31, the operation unit 25 may then change it to the term that the slot 31 held at a time point that has a restoration point set and at which the slot 31 associated with the attribute of the second term held the second term.

With this configuration, by setting restoration points at time points based on the user's voice, the voice interaction device 20 can use the predetermined condition to narrow down in advance the time points that are candidates for a later change of the terms held by the holding units. When changing the terms held by the holding units, the voice interaction device 20 can therefore return the dialogue state to a more appropriate past time point that has been narrowed down by the predetermined condition.

The voice interaction device 20 may further include a task control unit 24 that outputs the terms held by the plurality of slots 31, as the dialogue information, to a task processing unit 40 that performs processing based on the dialogue information, and that acquires information indicating the result of the processing as a response to the output.

With this configuration, the voice interaction device 20 presents to the user the result of having an external processing unit process the terms held by the plurality of holding units. The user can therefore obtain a processing result that reflects the content of the dialogue with the voice interaction device 20.

The task processing unit 40 may execute an information search using the acquired terms as search terms, the task control unit 24 may acquire the result of the information search as the response, and the voice interaction device 20 may further include a presentation control unit 29 for presenting to the user the information search result acquired by the external processing control unit.

With this configuration, the voice interaction device 20 can acquire, as the result of processing by the external processing unit, the result of a search based on the content of the dialogue, and present it to the user.

The operation unit 25 may set a restoration point at a time point in the history at which the task control unit 24 acquired an information search result.

With this configuration, the voice interaction device 20 can use the restoration points to return the terms held by the holding units to those at a time point when an information search was performed. The time point of an information search is also the time point at which its result is obtained, and is a time point that the user can easily identify within the dialogue. By setting restoration points in this way, the voice interaction device 20 can return the terms held by the holding units to those at a time point that the user can identify intuitively.

The operation unit 25 may prohibit setting a restoration point at a time point in the history at which the information search result contained zero items, even if the task control unit 24 acquired an information search result at that time point.

With this configuration, the voice interaction device 20 can exclude time points at which the information search returned zero results from the time points at which restoration points are set. When the user wants to return the dialogue state, returning to a time point at which the information search produced one or more results is considered useful. The voice interaction device 20 can therefore return the content of the dialogue with the user to that of a time point that is useful to the user.
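The restoration point rule described above (set a point when a search result is acquired, but not when the result has zero items) can be sketched as follows. The `HistoryEntry` record and the `record` helper are hypothetical names, and representing "no search was run" as `hit_count=None` is an assumption made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HistoryEntry:
    slots: dict                       # snapshot of {attribute: term}
    hit_count: Optional[int] = None   # None means no search ran at this point
    restore_point: bool = False


def record(history, slots, hit_count=None):
    """Append a snapshot and decide whether it becomes a restoration point."""
    entry = HistoryEntry(dict(slots), hit_count)
    # A restoration point is set only when a search ran and found something.
    entry.restore_point = hit_count is not None and hit_count > 0
    history.append(entry)
    return entry
```

A later rollback would then consider only entries whose `restore_point` flag is true, which filters out both intermediate states and searches that returned nothing.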

When changing the term held by each of the plurality of slots 31, if there are two or more restoration points in the history, the operation unit 25 may change the terms using the restoration point, among the two or more restoration points, that is specified by the user.

With this configuration, the voice interaction device 20 can return the content of the dialogue with the user to that of a past time point using the one restoration point that the user specifies among the multiple restoration points.

The voice interaction device 20 may further include a response sentence generation unit 21 that generates a response sentence for receiving from the user the one restoration point, among the two or more restoration points, to be used for changing the terms.

With this configuration, the voice interaction device 20 has the user specify one restoration point from among the multiple restoration points. The voice interaction device 20 can thereby receive a concrete designation of the restoration point from the user and return the content of the dialogue with the user to that of a past time point.
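One possible way to handle several matching restoration points is to return either a single resolved point or a response sentence that asks the user to choose. The sketch below is an assumption-laden illustration: the entry layout (`"slots"` and `"restore_point"` keys), the function names and the wording of the prompt are all hypothetical.

```python
def describe(slots):
    """Summarize the filled slots of one snapshot as 'attr=term' pairs."""
    return ", ".join(f"{k}={v}" for k, v in slots.items() if v is not None)


def resolve(history, attr, term):
    """Return (index, None) if one restoration point matches,
    otherwise (None, response_sentence)."""
    candidates = [
        i for i, entry in enumerate(history)
        if entry["restore_point"] and entry["slots"].get(attr) == term
    ]
    if not candidates:
        return None, "No matching point was found."
    if len(candidates) == 1:
        return candidates[0], None
    # Several restoration points match: ask the user which one to use.
    options = "; ".join(
        f"{n}: {describe(history[i]['slots'])}"
        for n, i in enumerate(candidates, 1)
    )
    return None, f"Which point do you mean? {options}"
```

The numbered options give the user something short to say in reply, which suits a voice interface better than reading out timestamps.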

The control term may include a first term, which is a predetermined term indicating that the dialogue information is to be changed to that of a past time point, and an attribute name, which is the name of an attribute. The operation unit 25 may determine whether the utterance data acquired by the utterance data acquisition unit 22 contains the first term and the attribute name and, when it determines that both are contained, may refer to the history and change the term held by each of the plurality of slots 31 to the term that the slot 31 held at the time point immediately before the slot 31 associated with the attribute indicated by the attribute name came to hold the term it currently holds.

With this configuration, the voice interaction device 20 concretely identifies the past time point from the control term acquired by the acquisition unit, using the name of the attribute with which a holding unit is associated. The voice interaction device 20 can thus correct the content of the dialogue with the user by a more specific method.
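The attribute-name rule can be sketched as a backward scan over the history: find where the named slot first took its current value, then step one snapshot further back. The function name `index_before_current` and the snapshot-list representation are assumptions made for illustration.

```python
def index_before_current(history, attr):
    """Return the index of the snapshot just before slot `attr` took its
    current term, or None if the slot never held anything different.

    history: list of {attribute: term} snapshots, oldest first; the last
    element is the current state.
    """
    current = history[-1][attr]
    i = len(history) - 1
    # Walk back to the first snapshot of the current run of values.
    while i > 0 and history[i - 1][attr] == current:
        i -= 1
    return i - 1 if i > 0 else None
```

All slots, not only the named one, would then be restored from the snapshot at the returned index.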

The voice interaction system 1 according to the present embodiment is a system that performs a spoken dialogue with a user and includes: a plurality of slots 31 for holding dialogue information indicating the content of the dialogue, each slot 31 being associated with an attribute of terms and serving to hold a term that has the associated attribute; a history table 32 that stores a history of the terms held by the plurality of slots 31; an utterance data acquisition unit 22 that acquires utterance data indicating the content of the user's spoken utterance and causes an utterance term included in the acquired utterance data to be held in the slot 31, among the plurality of slots 31, that is associated with the attribute of the utterance term; an operation unit 25 that, when the utterance data acquired by the utterance data acquisition unit 22 contains a control term for controlling the dialogue information, refers to the history stored in the history table 32 and changes the term held by each of the plurality of slots 31 to the term that the slot 31 held at the past time point specified by the control term; a microphone 13 that picks up the user's voice and generates a voice signal; a speech recognition unit 14 that generates the utterance data to be acquired by the utterance data acquisition unit 22 by performing speech recognition processing on the voice signal generated by the microphone 13; a task processing unit 40 that acquires the dialogue information held by the plurality of slots 31, performs predetermined processing on the acquired dialogue information, and outputs information indicating the result of the processing; a speech synthesis unit 12 that generates a response sentence to the user's spoken utterance and generates a voice signal by performing speech synthesis processing on the generated response sentence; a speaker that outputs the voice signal generated by the speech synthesis unit 12 as sound; and a display device 10 that displays the result of the processing output by the task processing unit 40.

This provides the same effects as the voice interaction device 20 described above.

The voice interaction method according to the present embodiment is a control method of the voice interaction device 20, which performs a spoken dialogue with a user. The voice interaction device 20 includes a plurality of slots 31 for holding dialogue information indicating the content of the dialogue, each slot 31 being associated with an attribute of terms and serving to hold a term that has the associated attribute, and a history table 32 that stores a history of the terms held by the plurality of slots 31. The control method includes: an acquisition step of acquiring utterance data indicating the content of the user's spoken utterance and causing an utterance term included in the acquired utterance data to be held in the slot 31, among the plurality of slots 31, that is associated with the attribute of the utterance term; and a changing step of, when the utterance data acquired in the acquisition step contains a control term for controlling the dialogue information, referring to the history stored in the history table 32 and changing the term held by each of the plurality of slots 31 to the term that the slot 31 held at the past time point specified by the control term.

As described above, the embodiment has been described as an illustration of the technology in the present disclosure. The accompanying drawings and the detailed description have been provided for that purpose.

Accordingly, the components described in the accompanying drawings and the detailed description may include not only components that are essential for solving the problem but also components that are not essential for solving the problem and are included to illustrate the above implementation. The fact that such non-essential components appear in the accompanying drawings or the detailed description should therefore not by itself lead to the conclusion that they are essential.

Furthermore, because the embodiment described above is intended to illustrate the technology in the present disclosure, various changes, replacements, additions, omissions, and the like can be made within the scope of the claims or their equivalents.

The present disclosure is useful as a voice interaction device that can correct the content of a dialogue with a user by a simple method. For example, the present disclosure can be applied to a car navigation device, a smartphone (high-function mobile phone terminal), a mobile phone terminal, or a PC (Personal Computer) application.

DESCRIPTION OF SYMBOLS
1 Voice interaction system
10 Display device
11 Speaker
12 Speech synthesis unit
13 Microphone
14 Speech recognition unit
20, 20A Voice interaction device
21 Response sentence generation unit
22 Utterance data acquisition unit
23 Sequence control unit
24 Task control unit
25 Operation unit
26 Analysis unit
27 Memory
28 Task result analysis unit
29, 106 Presentation control unit
31 Slot
32, 320 History table
40 Task processing unit
101 Acquisition unit
102 Changing unit
103 Holding unit
104 Storage unit
105 External processing control unit
310 Dialogue sequence
311 Time information
312 Utterance
313 Response
321 Required slot group
322 Optional slot group
323 Action
324 Restoration point
330 Search result

Claims

A voice interaction device that performs a spoken dialogue with a user, comprising:
a plurality of holding units for holding dialogue information indicating the content of the dialogue, each of the plurality of holding units being associated with an attribute of terms and holding a term that has the attribute associated with that holding unit;
a storage unit that stores a history of the terms held by the plurality of holding units;
an acquisition unit that acquires utterance data indicating the content of the user's spoken utterance and causes an utterance term included in the acquired utterance data to be held in the holding unit, among the plurality of holding units, that is associated with the attribute of the utterance term; and
a changing unit that, when the utterance data acquired by the acquisition unit contains a control term for controlling the dialogue information, refers to the history stored in the storage unit and changes the term held by each of the plurality of holding units to the term that the holding unit held at a past time point specified by the control term.

The voice interaction device according to claim 1, wherein the control term includes
a first term that is a predetermined term indicating that the dialogue information is to be changed to that of a past time point, and a second term that differs from the predetermined character string, and
the changing unit
determines whether the utterance data acquired by the acquisition unit contains the first term and the second term and, when it determines that the first term and the second term are contained, refers to the history and changes the term held by each of the plurality of holding units to the term that the holding unit held at the time point when the holding unit associated with the attribute of the second term, among the plurality of holding units, held the second term.

The voice interaction device according to claim 2, wherein the changing unit
sets a restoration point at a time point in the history stored in the storage unit when the state of the plurality of holding units at that time point satisfies a predetermined condition, and,
when changing the term held by each of the plurality of holding units, changes the term to the term that the holding unit held at a time point that has the restoration point set and at which the holding unit associated with the attribute of the second term, among the plurality of holding units, held the second term.

The voice interaction device according to claim 3, further comprising
an external processing control unit that outputs the terms held by the plurality of holding units, as the dialogue information, to a processing unit that performs processing based on the dialogue information, and acquires information indicating the result of the processing as a response to the output.

The voice interaction device according to claim 4, wherein the processing unit performs an information search using the acquired terms as search terms,
the external processing control unit acquires the result of the information search as the response, and
the voice interaction device further comprises
a presentation control unit for presenting the result of the information search acquired by the external processing control unit to the user.

The voice interaction device according to claim 5, wherein the changing unit sets the restoration point at a time point in the history at which the external processing control unit acquired the result of the information search.

The voice interaction device according to claim 6, wherein the changing unit prohibits setting the restoration point at a time point in the history at which the result of the information search contained zero items, even if the external processing control unit acquired the result of the information search at that time point.

The voice interaction device according to claim 3, wherein, when changing the term held by each of the plurality of holding units, if there are two or more restoration points in the history, the changing unit changes the term using the restoration point, among the two or more restoration points, that is specified by the user.

The voice interaction device according to claim 8, further comprising
a response sentence generation unit that generates a response sentence for receiving from the user the one restoration point, among the two or more restoration points, to be used for changing the term.

The voice interaction device according to claim 1, wherein the control term includes
a first term that is a predetermined term indicating that the dialogue information is to be changed to that of a past time point, and an attribute name that is the name of the attribute, and
the changing unit
determines whether the utterance data acquired by the acquisition unit contains the first term and the attribute name and, when it determines that the first term and the attribute name are contained, refers to the history and changes the term held by each of the plurality of holding units to the term that the holding unit held at the time point immediately before the holding unit associated with the attribute indicated by the attribute name, among the plurality of holding units, came to hold the term it currently holds.

A voice interaction system that performs a spoken dialogue with a user, comprising:
a plurality of holding units for holding dialogue information indicating the content of the dialogue, each of the plurality of holding units being associated with an attribute of terms and holding a term that has the attribute associated with that holding unit;
a storage unit that stores a history of the terms held by the plurality of holding units;
an acquisition unit that acquires utterance data indicating the content of the user's spoken utterance and causes an utterance term included in the acquired utterance data to be held in the holding unit, among the plurality of holding units, that is associated with the attribute of the utterance term;
a changing unit that, when the utterance data acquired by the acquisition unit contains a control term for controlling the dialogue information, refers to the history stored in the storage unit and changes the term held by each of the plurality of holding units to the term that the holding unit held at a past time point specified by the control term;
a microphone that picks up the user's voice and generates a voice signal;
a speech recognition unit that generates the utterance data to be acquired by the acquisition unit by performing speech recognition processing on the voice signal generated by the microphone;
a processing unit that acquires the dialogue information held by the plurality of holding units, performs predetermined processing on the acquired dialogue information, and outputs information indicating the result of the processing;
a speech synthesis unit that generates a response sentence to the user's spoken utterance and generates a voice signal by performing speech synthesis processing on the generated response sentence;
a speaker that outputs the voice signal generated by the speech synthesis unit as sound; and
a display device that displays the result of the processing output by the processing unit.

A control method of a voice interaction device that performs a spoken dialogue with a user,
the voice interaction device comprising:
a plurality of holding units for holding dialogue information indicating the content of the dialogue, each of the plurality of holding units being associated with an attribute of terms and holding a term that has the attribute associated with that holding unit; and
a storage unit that stores a history of the terms held by the plurality of holding units,
the control method comprising:
an acquisition step of acquiring utterance data indicating the content of the user's spoken utterance and causing an utterance term included in the acquired utterance data to be held in the holding unit, among the plurality of holding units, that is associated with the attribute of the utterance term; and
a changing step of, when the utterance data acquired in the acquisition step contains a control term for controlling the dialogue information, referring to the history stored in the storage unit and changing the term held by each of the plurality of holding units to the term that the holding unit held at a past time point specified by the control term.