JP6495014B2

JP6495014B2 - Spoken dialogue control device, control method of spoken dialogue control device, and spoken dialogue device

Info

Publication number: JP6495014B2
Application number: JP2015002568A
Authority: JP
Inventors: 暁本村
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2015-01-08
Filing date: 2015-01-08
Publication date: 2019-04-03
Anticipated expiration: 2035-01-08
Also published as: JP2016126293A

Description

The present invention relates to a spoken dialogue control device for controlling a spoken dialogue device that responds to a user's utterance.

Spoken dialogue devices (robots) that converse with a user by responding to the user's utterances with speech or motion have long been widely studied. In a dialogue between a user and a spoken dialogue device, some time elapses between the user's utterance and the device's response to the content of that utterance. If the device does nothing during this time, the user may feel stress in communicating with it. As one solution to this problem, Patent Document 1 below, for example, discloses a technique that calculates the waiting time from when a user input is received until communication with a server is restored, and presents predetermined information according to that waiting time.

Patent Document 1: JP 2014-174485 A (published September 22, 2014)
Patent Document 2: JP 2014-191030 A (published October 6, 2014)
Patent Document 3: JP 2003-330923 A (published November 21, 2003)

However, in the techniques described in Patent Documents 1 to 3, the content of the filler action (the action performed to fill the waiting time) is uniform, and the device cannot respond flexibly according to the length of the waiting time. For example, in the technique of Patent Document 1, if the calculated waiting time is equal to or longer than a predetermined time, the filler action executed is always the output of a message apologizing for the interruption of the dialogue, no matter how long the waiting time is.

The present invention has been made in view of the above problem, and its object is to provide a spoken dialogue control device and the like that improve the flexibility of communication between a user and a spoken dialogue device by executing a filler action suited to the length of the waiting time.

To solve the above problem, a spoken dialogue control device according to one aspect of the present invention includes: a waiting time prediction unit that predicts a waiting time from a predetermined point after a spoken dialogue device acquires speech uttered by a user until a response to that speech can be output; a filler action determination unit that selects, as a filler action, one or more of a plurality of action candidates indicating actions the spoken dialogue device can execute, based on the waiting time predicted by the waiting time prediction unit and the action time required to execute each of the action candidates; and a filler action execution unit that causes the spoken dialogue device to execute the filler action selected by the filler action determination unit.

Likewise, to solve the above problem, a control method of a spoken dialogue control device according to one aspect of the present invention is a control method of a spoken dialogue control device that causes a spoken dialogue device to execute actions, and includes: a waiting time prediction step of predicting a waiting time from a predetermined point after the spoken dialogue device acquires speech uttered by a user until a response to that speech can be output; a filler action determination step of selecting, as a filler action, one or more of a plurality of action candidates indicating actions the spoken dialogue device can execute, based on the waiting time predicted in the waiting time prediction step and the action time required to execute each of the action candidates; and a filler action execution step of causing the spoken dialogue device to execute the filler action determined in the filler action determination step.

One aspect of the present invention has the effect of improving the flexibility of communication between a user and a spoken dialogue device.

FIG. 1 is a block diagram showing the configuration of a spoken dialogue device according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing the data structure and example data of the filler action table stored in the storage unit of the spoken dialogue device shown in FIG. 1.
FIG. 3 is a diagram showing the data structure and example data of the filler order table stored in the storage unit of the spoken dialogue device shown in FIG. 1.
FIG. 4 is a flowchart showing an example of the flow of the response execution processing executed by the spoken dialogue control device shown in FIG. 1.
FIG. 5 is a flowchart showing an example of the flow of the filler action determination processing in the flowchart shown in FIG. 4.
FIG. 6 is a block diagram showing the configuration of a spoken dialogue device according to Embodiment 2 of the present invention.
FIG. 7 is a diagram showing the data structure and example data of the filler action table stored in the storage unit of the spoken dialogue device shown in FIG. 6.
FIG. 8 is a flowchart showing an example of the flow of the filler action determination processing executed by the spoken dialogue control device shown in FIG. 6.
FIG. 9 is a block diagram showing the configuration of a spoken dialogue device according to Embodiment 3 of the present invention.
FIG. 10 is a flowchart showing an example of the flow of the response execution processing executed by the spoken dialogue control device shown in FIG. 9.

[Embodiment 1]
An embodiment (Embodiment 1) of the present invention is described below with reference to FIGS. 1 to 5.

First, the spoken dialogue device 10 according to the present embodiment is described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the spoken dialogue device 10 according to the present embodiment.

The spoken dialogue device 10 is a device that converses with a user by responding to the user's utterances with speech or motion. A humanoid robot is one specific example of the spoken dialogue device 10, but the device is not limited to this. Other specific examples include a mobile terminal with a spoken dialogue function, such as a smartphone, and a car navigation system with a spoken dialogue function. As shown in FIG. 1, the spoken dialogue device 10 includes the spoken dialogue control device as a control unit 1. The spoken dialogue device 10 and the spoken dialogue control device may instead be separate bodies. In addition to the control unit 1 (spoken dialogue control device), the spoken dialogue device 10 includes a voice input unit 2, a communication unit 3, a voice output unit 4, a drive unit 5, and a storage unit 6.

The voice input unit 2 is a microphone that acquires speech uttered by the user. The voice input unit 2 converts the acquired speech into voice data and outputs the voice data to the voice recognition unit 13 described later. It also outputs at least one of the size (data amount) and the duration (speech time) of the voice data (hereinafter referred to as voice attribute information) to the waiting time prediction unit 11. The communication unit 3 communicates between the spoken dialogue device 10 and external equipment. Specifically, the communication unit 3 is controlled by the response generation unit 14 described later and receives from external equipment the data needed to generate a response. For example, the communication unit 3 acquires data on tomorrow's weather from a weather forecast server (not shown) that manages weather forecast data, and outputs it to the response generation unit 14. The voice output unit 4 is a speaker that outputs speech. Specifically, the voice output unit 4 outputs responses to speech uttered by the user and speech serving as a filler action, described later. The drive unit 5 drives movable parts of the spoken dialogue device 10 (humanoid robot) such as the head and legs, and is, for example, a servomotor. An actuator other than a servomotor may also be used. Specifically, by driving the movable parts, the drive unit 5 causes the spoken dialogue device 10 to perform motions that serve as a response to the user's speech or as a filler action. When the spoken dialogue device 10 is a device without movable parts, such as a smartphone, the drive unit 5 may be omitted. The storage unit 6 stores various data used by the spoken dialogue device 10. The storage unit 6 stores at least a filler action table 61 and a filler order table 62. These tables are described in detail later.

The control unit 1 performs overall control of the units of the spoken dialogue device 10. The control unit 1 includes a waiting time prediction unit 11, a filler action control unit 12, a voice recognition unit 13, a response generation unit 14, and a response execution unit 15.

The waiting time prediction unit 11 predicts the waiting time from when the spoken dialogue device 10 acquires speech uttered by the user until a response to that speech can be output. Specifically, on receiving the voice attribute information from the voice input unit 2, the waiting time prediction unit 11 predicts the waiting time using the size (data amount) of the voice data. More specifically, it calculates the waiting time using the formula "waiting time = α × data amount", where α is the waiting time required per unit data amount and is a predetermined value. The waiting time prediction unit 11 outputs the predicted (calculated) waiting time to the filler action determination unit 21 described later. The waiting time prediction unit 11 may instead predict the waiting time using the duration of the voice data (the user's speech time). Specifically, it may calculate the waiting time using the formula "waiting time = β × speech time", where β is the waiting time required per unit speech time and is a predetermined value. The waiting time may also be predicted (calculated) using both the data amount and the speech time of the voice data. If the waiting time calculated from the data amount differs from the waiting time calculated from the speech time, the longer (or the shorter) of the two may be adopted, or the average of the two may be calculated and output to the filler action determination unit 21.
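The two linear formulas and the three ways of combining them can be sketched as follows. This is only an illustration: the values of α and β, the units (kilobytes and seconds), and the function and parameter names are assumptions and do not come from the patent.

```python
from statistics import mean

ALPHA = 0.25  # hypothetical: seconds of waiting per kilobyte of voice data
BETA = 0.5    # hypothetical: seconds of waiting per second of speech


def predict_waiting_time(data_kb=None, speech_s=None, combine="max"):
    """Predict the waiting time in seconds from the voice attribute information."""
    estimates = []
    if data_kb is not None:
        estimates.append(ALPHA * data_kb)   # waiting time = alpha * data amount
    if speech_s is not None:
        estimates.append(BETA * speech_s)   # waiting time = beta * speech time
    if not estimates:
        raise ValueError("need data_kb, speech_s, or both")
    # When both estimates exist, adopt the longer, the shorter, or the average.
    if combine == "max":
        return max(estimates)
    if combine == "min":
        return min(estimates)
    if combine == "mean":
        return mean(estimates)
    raise ValueError(f"unknown combine mode: {combine}")
```

With these assumed constants, 8 KB of voice data alone gives 2.0 seconds, 3 seconds of speech alone gives 1.5 seconds, and the "mean" mode applied to both gives 1.75 seconds.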

The filler action control unit 12 determines and executes filler actions. The filler action control unit 12 includes a filler action determination unit 21 and a filler action execution unit 22.

The filler action determination unit 21 determines the filler action to be executed by the spoken dialogue device 10, based on the waiting time predicted by the waiting time prediction unit 11. Here, a filler action is an action that the spoken dialogue device 10 is made to execute during the waiting time, that is, during the time from acquisition of the speech uttered by the user until a response to that speech can be output. Specifically, using the filler action table 61 stored in the storage unit 6, the filler action determination unit 21 determines the filler action according to the waiting time predicted by the waiting time prediction unit 11 and the filler action time required for each filler action that the spoken dialogue device 10 could execute during the waiting time.

The filler action table 61 is now described in detail with reference to FIG. 2. FIG. 2 is a diagram showing the data structure and example data of the filler action table 61 stored in the storage unit 6. The filler action table 61 shown in FIG. 2 is only an example, and the data structure and data are not limited to those of FIG. 2. The filler action table 61 associates information indicating a filler action with the filler action time, which is the time required for that filler action. The "filler action" column stores information on a plurality of action candidates indicating actions the spoken dialogue device 10 can execute (hereinafter referred to as filler action information). The "type" column stores information indicating whether each filler action outputs speech (shown as "speech" in FIG. 2), moves a movable part of the spoken dialogue device 10 (shown as "gesture" in FIG. 2), or does both (shown as "speech + gesture" in FIG. 2). The "filler action time" column stores the filler action time.

More specifically, the filler action determination unit 21 subtracts each filler action time in the filler action table 61 from the received waiting time to calculate a subtraction value T_N (first subtraction value) for each piece of filler action information. Here, N is the number stored in the "No." column of the filler action table 61. The filler action determination unit 21 then determines, for each calculated subtraction value T_N, whether it is at least 0 and at most a first allowable time X (that is, whether any filler action information satisfies 0 ≤ T_N ≤ X). The first allowable time X is the time for which the spoken dialogue device 10 may acceptably remain idle between executing the filler action and completing generation of the response. X is a preset value; for example, X = 2 means that up to 2 seconds is acceptable between completion of the filler action and completion of response generation.

If there is filler action information satisfying 0 ≤ T_N ≤ X, the filler action determination unit 21 determines the filler action indicated by that information as the filler action to be executed by the spoken dialogue device 10, and outputs the filler action information to the filler action execution unit 22. For example, if the waiting time is 2 seconds and the first allowable time X = 1, the filler action information of No. 2 and No. 3 shown in FIG. 2 satisfies 0 ≤ T_N ≤ X. The filler action determination unit 21 therefore reads the filler action information of No. 2 or No. 3 and outputs it to the filler action execution unit 22.

When multiple pieces of filler action information satisfy 0 ≤ T_N ≤ X, it is preferable to select the one with the smaller subtraction value T_N, so as to shorten the time during which the spoken dialogue device 10 performs no action. In the above example, it is therefore preferable to select the filler action information of No. 3, for which the subtraction value T_N is 0. When multiple pieces of filler action information have the same subtraction value T_N, one of them may be selected at random.
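The first selection rule (0 ≤ T_N ≤ X, preferring the smallest T_N, with a random choice among ties) can be sketched as below. The table values are hypothetical, since the data of FIG. 2 is not reproduced in the text; the times for No. 2 (1 second) and No. 3 (2 seconds) are chosen only so that they agree with the worked example (waiting time 2 seconds, X = 1). The function name is likewise an assumption.

```python
import random

# Hypothetical stand-in for the action time column of FIG. 2:
# No. -> time in seconds needed to perform that action.
FILLER_TIMES = {1: 0.5, 2: 1.0, 3: 2.0}


def select_within_idle_allowance(waiting, x, table=FILLER_TIMES):
    """Return the No. with the smallest T_N in [0, x], or None if none qualifies."""
    # T_N = waiting time - action time of entry N
    t = {no: waiting - duration for no, duration in table.items()}
    ok = {no: value for no, value in t.items() if 0 <= value <= x}
    if not ok:
        return None
    best = min(ok.values())
    # Entries tied on the smallest T_N are chosen among at random.
    return random.choice([no for no, value in ok.items() if value == best])
```

With a waiting time of 2 seconds and X = 1, Nos. 2 and 3 both qualify (T_2 = 1, T_3 = 0), and No. 3 is returned because its T_N is smaller.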

If, on the other hand, no filler action information satisfies 0 ≤ T_N ≤ X, the filler action determination unit 21 determines, for each sign-inverted value −T_N (second subtraction value) obtained by inverting the sign of the subtraction value T_N, whether it is at least 0 and at most a second allowable time Y (that is, whether any filler action information satisfies 0 ≤ −T_N ≤ Y). The second allowable time Y is the acceptable time between completion of response generation and completion of the filler action of the spoken dialogue device 10. Y is a preset value; for example, Y = 2 means that up to 2 seconds is acceptable between completion of response generation and completion of the filler action. The filler action determination unit 21 may also calculate the sign-inverted value −T_N by subtracting the received waiting time from each filler action time.

If there is filler action information satisfying 0 ≤ −T_N ≤ Y, the filler action determination unit 21 determines the filler action indicated by that information as the filler action to be executed by the spoken dialogue device 10, and outputs the filler action information to the filler action execution unit 22. For example, if the waiting time is 1 second and the second allowable time Y = 1, the filler action information of No. 2 and No. 3 shown in FIG. 2 satisfies 0 ≤ −T_N ≤ Y. The filler action determination unit 21 therefore reads the filler action information of No. 2 or No. 3 and outputs it to the filler action execution unit 22.

When multiple pieces of filler action information satisfy 0 ≤ −T_N ≤ Y, it is preferable to select the one with the smaller sign-inverted value −T_N, so as to shorten the time during which the spoken dialogue device 10 performs no action. In the above example, it is therefore preferable to select the filler action information of No. 2, for which the sign-inverted value −T_N is 0. When multiple pieces of filler action information have the same sign-inverted value −T_N, one of them may be selected at random.
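The two checks can be chained: first look for an action that finishes at most X seconds before the response is ready, and only if none exists look for one that overruns by at most Y seconds. The sketch below follows that order. The table values and all names are assumptions made for illustration; the times for No. 2 and No. 3 are chosen only to agree with the worked examples in the text.

```python
import random

# Hypothetical stand-in for the action time column of FIG. 2 (seconds).
FILLER_TIMES = {1: 0.5, 2: 1.0, 3: 2.0}


def closest_within(scores, limit):
    """Return the No. whose score is smallest within [0, limit], else None."""
    ok = {no: s for no, s in scores.items() if 0 <= s <= limit}
    if not ok:
        return None
    best = min(ok.values())
    return random.choice([no for no, s in ok.items() if s == best])


def select_single(waiting, x, y, table=FILLER_TIMES):
    """Pick one action: idle gap of at most x first, overrun of at most y second."""
    early = {no: waiting - t for no, t in table.items()}   # T_N
    choice = closest_within(early, x)
    if choice is not None:
        return choice
    late = {no: t - waiting for no, t in table.items()}    # -T_N
    return closest_within(late, y)
```

For example, with a waiting time of 0.8 seconds, X = 0.1 and Y = 0.5, no action ends within 0.1 seconds before the response is ready, so the overrun check applies and No. 2 (overrun of about 0.2 seconds) is returned.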

For at least one of the first allowable time X and the second allowable time Y, the same value may be set for all filler action information, or a different value may be set for each piece of filler action information. At least one of the first allowable time X and the second allowable time Y may also be set according to at least one of the data amount and the speech time of the voice data. In that case, the filler action determination unit 21 determines at least one of the first allowable time X and the second allowable time Y based on the data amount or speech time of the voice data received from the waiting time prediction unit 11.

If, on the other hand, no filler action information satisfies 0 ≤ −T_N ≤ Y, the filler action determination unit 21 selects multiple pieces of filler action information. Specifically, from among the filler action information satisfying filler action time ≤ waiting time, the filler action determination unit 21 selects the one with the longest filler action time. It then calculates the value (remaining time) obtained by subtracting the filler action time of the selected filler action information from the waiting time, and further selects filler action information satisfying filler action time ≤ remaining time. The filler action determination unit 21 then calculates the total of the filler action times associated with the selected pieces of filler action information and determines whether 0 ≤ waiting time − total ≤ X or 0 ≤ −(waiting time − total) ≤ Y is satisfied. If either condition is satisfied, it outputs the multiple pieces of filler action information, each associated with its number in the "No." column, to the filler action execution unit 22.

If neither condition is satisfied, the filler action determination unit 21 calculates the value obtained by subtracting the total from the waiting time, and further selects filler action information whose filler action time is at most that value. It then again calculates the total of the filler action times associated with the selected pieces of filler action information and determines whether 0 ≤ waiting time − total ≤ X or 0 ≤ −(waiting time − total) ≤ Y is satisfied. The filler action determination unit 21 repeats this processing until either 0 ≤ waiting time − total ≤ X or 0 ≤ −(waiting time − total) ≤ Y is satisfied.
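The repeated selection of several actions can be sketched as a greedy loop. The text fixes only the first pick (the longest action that fits the waiting time). The sketch below therefore makes three assumptions, none of which come from the patent: each later pick is also the longest action that fits the remaining time, an action is not selected twice, and the loop gives up when nothing fits. The table values and names are hypothetical as well.

```python
# Hypothetical action times in seconds (No. -> time); not the data of FIG. 2.
FILLER_TIMES = {1: 0.5, 2: 1.0, 3: 2.0}


def select_multiple(waiting, x, y, table=FILLER_TIMES):
    """Greedily pick actions until the summed time is within x / y of the waiting time."""
    available = dict(table)
    chosen, total = [], 0.0
    while True:
        remaining = waiting - total
        fitting = [no for no, t in available.items() if t <= remaining]
        if not fitting:
            break  # assumption: stop when no unused action fits the remaining time
        no = max(fitting, key=lambda n: available[n])  # longest action that fits
        total += available.pop(no)
        chosen.append(no)
        gap = waiting - total
        # The acceptance test is applied once at least two actions are selected.
        if len(chosen) >= 2 and (0 <= gap <= x or 0 <= -gap <= y):
            break
    return chosen
```

With a waiting time of 3.5 seconds and X = Y = 0.2, the loop picks No. 3 (2 seconds), then No. 2 (total 3 seconds, gap 0.5 seconds, still too large), then No. 1 (total 3.5 seconds, gap 0) and stops. Because every pick fits the remaining time, the gap never becomes negative in this sketch, so the Y condition can only be met when the gap is exactly 0.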

The filler action execution unit 22 causes the spoken dialogue device 10 to execute the filler action indicated by the filler action information determined by the filler action determination unit 21. Specifically, on receiving filler action information from the filler action determination unit 21, the filler action execution unit 22 causes the spoken dialogue device 10 to execute the filler action that the information indicates. For example, it controls the voice output unit 4 to output speech, or controls the drive unit 5 to move a movable part of the spoken dialogue device 10. When execution of the filler action is complete, the filler action execution unit 22 notifies the response execution unit 15 of this. When the filler action execution unit 22 receives multiple pieces of filler action information from the filler action determination unit 21, it determines the order in which to execute the filler actions they indicate, using the filler order table 62 stored in the storage unit 6.

The filler order table 62 is now described in detail with reference to FIG. 3. FIG. 3 is a diagram showing the data structure and example data of the filler order table 62 stored in the storage unit 6. The filler order table 62 shown in FIG. 3 is only an example, and the data structure and data are not limited to those of FIG. 3. The filler order table 62 associates a combination of filler actions with the order of the filler actions in that combination. By referring to the filler order table 62, the filler action execution unit 22 can therefore determine the order in which to execute the filler actions indicated by the multiple pieces of filler action information received from the filler action determination unit 21. The "filler action No" column stores multiple numbers from the "No." column of the filler action table 61. In this column, the order of the numbers has no particular meaning. The "action order" column also stores multiple numbers from the "No." column of the filler action table 61, but here the order of the numbers indicates the order in which the filler actions are executed. For example, the action order of No. 1 shown in FIG. 3 is "3, 2", which indicates that the filler action indicated by the filler action information of No. 3 in FIG. 2 is executed first, followed by the filler action indicated by the filler action information of No. 2 in FIG. 2. The "filler action" column stores the content of the filler actions in the order given in the "action order" column. This column is shown only to make the action order easier to understand, and may be omitted from the filler order table 62. The filler order table 62 may also be editable by the user.

From the field connection order table 62, the field connection operation execution unit 22 finds, in the "field connection operation No" column, the combination of "No." numbers associated with the plural pieces of field connection operation information it has received, and thereby identifies the operation order of the field connection operations. It then causes the voice interaction device 10 to execute the field connection operations in the identified order. The operation order need not be determined with the field connection order table 62. For example, the field connection operation execution unit 22 may determine the order at random, or may execute the operations in ascending order of their "No." numbers. In these cases, the storage unit 6 does not have to store the field connection order table 62.
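The order lookup described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `decide_order`, the dictionary layout of `ORDER_TABLE`, and the second table entry are assumptions made for the example. Only the "3, 2" entry mirrors the FIG. 3 example in the text.

```python
import random

# Hypothetical stand-in for the field connection order table 62.
# Key: unordered combination of "No." values; value: execution order.
ORDER_TABLE = {
    frozenset({2, 3}): [3, 2],        # mirrors the "3, 2" example of FIG. 3
    frozenset({1, 2, 4}): [4, 1, 2],  # made-up entry for illustration
}


def decide_order(action_numbers, table=ORDER_TABLE, fallback="ascending", rng=None):
    """Return the execution order for a combination of operation numbers.

    The table is consulted first. If no table is stored (table=None) or the
    combination is not listed, the order falls back to ascending "No." order
    or to a random order, the two alternatives mentioned in the text.
    """
    key = frozenset(action_numbers)
    if table is not None and key in table:
        return list(table[key])
    if fallback == "random":
        shuffled = list(key)
        (rng or random).shuffle(shuffled)
        return shuffled
    return sorted(key)
```

Using a `frozenset` as the key reflects the statement that the order of the numbers in the "field connection operation No" column carries no meaning.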

The voice recognition unit 13 performs voice recognition processing on the voice data received from the voice input unit 2. Existing technology can be used for the voice recognition processing. The voice recognition unit 13 outputs the voice recognition result for the received voice data to the response generation unit 14.

The response generation unit 14 generates response information indicating a response to the voice uttered by the user. There are three types of response: voice output, movement of a movable part of the voice interaction device 10, and a combination of voice output and movement of a movable part. Existing technology can be used to generate the response information. For example, a table (not shown) that associates the content of recognized voice data with response content may be stored in the storage unit 6, and the response information may be generated by referring to that table. When generating the response information requires external data, such as tomorrow's weather, the response generation unit 14 controls the communication unit 3 to acquire the external data and uses it to generate the response information. The response generation unit 14 outputs the generated response information (voice data for voice output, action data for moving a movable part, and so on) to the response execution unit 15.

The response execution unit 15 executes the response indicated by the response information generated by the response generation unit 14. Specifically, once the response execution unit 15 has received the response information from the response generation unit 14 and has been notified by the field connection operation execution unit 22 that the field connection operation is complete, it causes the voice interaction device 10 to execute the operation indicated by the response information. For example, it controls the voice output unit 4 to output a voice, or controls the drive unit 5 to move a movable part of the voice interaction device 10.

Next, the flow of the response execution process executed by the control unit 1 is described with reference to FIG. 4. FIG. 4 is a flowchart showing an example of the flow of the response execution process executed by the control unit 1.

First, the voice input unit 2 waits for voice input (S1). On acquiring a voice uttered by the user (YES in S1), the voice input unit 2 converts the acquired voice into voice data, outputs the voice data to the voice recognition unit 13, and outputs the voice attached information of the voice data to the standby time prediction unit 11.

Next, the standby time prediction unit 11 predicts the standby time (S2, standby time prediction step) and outputs the predicted standby time to the field connection operation determination unit 21. The field connection operation determination unit 21 then performs the field connection operation determination process (S3), which is described in detail later. The field connection operation determination unit 21 outputs, to the field connection operation execution unit 22, field connection operation information indicating the field connection operation it has decided the voice interaction device 10 should execute. The field connection operation execution unit 22 then causes the voice interaction device 10 to execute the field connection operation according to the received field connection operation information (S4, field connection operation execution step). When execution of the field connection operation is complete, the field connection operation execution unit 22 notifies the response execution unit 15 to that effect.

Meanwhile, the voice recognition unit 13 performs voice recognition processing (S5). Specifically, on receiving the voice data, the voice recognition unit 13 performs voice recognition processing on it and outputs the voice recognition result to the response generation unit 14. The response generation unit 14 then generates response information (S6). Specifically, the response generation unit 14 generates response information corresponding to the received voice recognition result and outputs it to the response execution unit 15.

As shown in FIG. 4, steps S2, S3, and S4 are performed in parallel with steps S5 and S6. That is, if the response execution unit 15 has received only one of the response information and the notification that the field connection operation is complete, it waits until it receives the other. Once it has received both the notification and the response information, the response execution unit 15 causes the voice interaction device 10 to execute the response (S7). Specifically, according to the received response information, the response execution unit 15 controls the voice output unit 4 to output a voice or controls the drive unit 5 to move a movable part of the voice interaction device 10. This completes the response execution process.
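The parallel structure of FIG. 4 can be sketched with two worker threads. This is only an illustration under assumptions: the function `execute_response` and its three callable parameters are hypothetical names, and a thread pool is one of several ways to realize the "wait for both" behavior of the response execution unit 15.

```python
from concurrent.futures import ThreadPoolExecutor


def execute_response(run_filler_branch, run_response_branch, perform):
    """Run S2-S4 and S5-S6 in parallel, then execute the response (S7).

    run_filler_branch:   predicts the standby time, decides and executes the
                         field connection operation (S2, S3, S4).
    run_response_branch: recognizes the voice and returns response
                         information (S5, S6).
    perform:             executes the response indicated by that information.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        filler = pool.submit(run_filler_branch)
        response = pool.submit(run_response_branch)
        filler.result()           # wait for the completion notification
        info = response.result()  # wait for the response information
    return perform(info)          # S7 runs only after both have arrived
```

Whichever branch finishes first, `perform` is not called until both futures have resolved, which corresponds to the waiting behavior described above.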

Next, the flow of the field connection operation determination process executed by the field connection operation determination unit 21 is described with reference to FIG. 5. FIG. 5 is a flowchart showing an example of the flow of the field connection operation determination process in the flowchart of FIG. 4. For the flowchart of FIG. 5, it is assumed that each piece of field connection operation information in the field connection operation table 61 is associated with a field connection operation time roughly equal to a generally expected standby time.

First, on receiving the standby time from the standby time prediction unit 11, the field connection operation determination unit 21 reads the field connection operation table 61 and calculates, for each field connection operation, a subtraction value T_N obtained by subtracting that operation's field connection operation time from the predicted standby time (S11). Using the calculated subtraction values T_N and the first allowable time X, the field connection operation determination unit 21 then refers to the field connection operation table 61 and determines whether any field connection operation information satisfies 0 ≤ T_N ≤ X (S12).

If there is field connection operation information satisfying 0 ≤ T_N ≤ X (YES in S12), the field connection operation determination unit 21 decides that the field connection operation indicated by one of those pieces of information is the operation the voice interaction device 10 will execute (S13, field connection operation determination step). Specifically, among the pieces of field connection operation information satisfying 0 ≤ T_N ≤ X, it selects the one with the smaller T_N value. The field connection operation determination unit 21 then outputs the selected field connection operation information to the field connection operation execution unit 22.

If, on the other hand, no field connection operation information satisfies 0 ≤ T_N ≤ X (NO in S12), the field connection operation determination unit 21 calculates the sign-changed value -T_N from each subtraction value T_N. Using -T_N and the second allowable time Y, it refers to the field connection operation table 61 and determines whether any field connection operation information satisfies 0 ≤ -T_N ≤ Y (S14).

If there is field connection operation information satisfying 0 ≤ -T_N ≤ Y (YES in S14), the field connection operation determination unit 21 decides that the field connection operation indicated by one of those pieces of information is the operation the voice interaction device 10 will execute (S15, field connection operation determination step). Specifically, among the pieces of field connection operation information satisfying 0 ≤ -T_N ≤ Y, it selects the one with the smaller -T_N value.

If, on the other hand, no field connection operation information satisfies 0 ≤ -T_N ≤ Y (NO in S14), the field connection operation determination unit 21 combines a plurality of field connection operations and uses the combination as the field connection operation information indicating what the voice interaction device 10 will execute (S16, field connection operation determination step). In this case, in step S4 above, the field connection operation execution unit 22 refers to the field connection order table 62 using the received pieces of field connection operation information and their "No." numbers, identifies the execution order (operation order) of the field connection operations, and has them executed in that order. This completes the field connection operation determination process.
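Steps S11 to S16 can be sketched as a single function. The sketch below makes several assumptions that the text does not state: the function name `decide_operations`, the dictionary representation of the field connection operation table 61, and in particular the way combinations are searched in S16 (smallest combination whose total time lands within the two allowable times). The text only says that operations are combined, not how the combination is chosen.

```python
from itertools import combinations


def decide_operations(wait, durations, x, y):
    """Pick field connection operation numbers for a predicted standby time.

    wait:      predicted standby time in seconds
    durations: hypothetical table, "No." -> field connection operation time
    x, y:      first and second allowable times
    """
    # S11: T_N = standby time minus each operation time
    t = {no: wait - d for no, d in durations.items()}

    # S12/S13: operations that end at most X before the response is ready
    fits = [no for no in t if 0 <= t[no] <= x]
    if fits:
        return [min(fits, key=lambda no: t[no])]

    # S14/S15: operations that overrun the standby time by at most Y
    overruns = [no for no in t if 0 <= -t[no] <= y]
    if overruns:
        return [min(overruns, key=lambda no: -t[no])]

    # S16 (assumed search strategy): first combination whose total time
    # satisfies the same two tolerances
    for size in range(2, len(durations) + 1):
        for combo in combinations(sorted(durations), size):
            total = sum(durations[no] for no in combo)
            if -y <= wait - total <= x:
                return list(combo)
    return []  # nothing suitable; the text does not cover this case
```

For example, with operation times of 0.5, 1.0, and 2.0 seconds and a predicted standby time of 3.0 seconds, no single operation qualifies, so the sketch falls through to the combination step.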

In this embodiment, if generation of the response information completes earlier than the standby time predicted by the standby time prediction unit 11 and the field connection operation determination unit 21 has selected a plurality of field connection operations, the field connection operation execution unit 22 may cancel the field connection operations scheduled after that point. Specifically, when notified by the response execution unit 15 that the response can now be output, the field connection operation execution unit 22 cancels the field connection operations scheduled after that point and notifies the response execution unit 15 that execution of the field connection operation is complete. The field connection operation determination unit 21 may also change the field connection operation times of the operations it combines.
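The cancellation behavior can be sketched as a loop that checks a flag before starting each queued operation. The names `run_queued_operations`, `response_ready`, and `perform` are hypothetical, and modeling the notification from the response execution unit 15 as a `threading.Event` is an assumption of this example.

```python
import threading


def run_queued_operations(operations, response_ready, perform):
    """Execute queued field connection operations in order.

    Operations that have not started when `response_ready` is set are
    skipped; the operation in progress is allowed to finish. The returned
    list holds the operations that were actually executed.
    """
    executed = []
    for operation in operations:
        if response_ready.is_set():
            break  # cancel everything scheduled after this point
        perform(operation)
        executed.append(operation)
    return executed
```

After the loop returns, the caller would send the completion notification so that the response can be executed without waiting for the skipped operations.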

[Embodiment 2]
Another embodiment (Embodiment 2) of the present invention is described below with reference to FIGS. 6 to 8. For convenience of explanation, members having the same functions as those described in the preceding embodiment are given the same reference numerals, and their description is omitted.

In Embodiment 1 described above, when no field connection operation information satisfies 0 ≤ -T_N ≤ Y, a plurality of pieces of field connection operation information are combined to obtain field connection operation information that satisfies 0 ≤ -T_N ≤ Y. In this embodiment, by contrast, when no field connection operation information satisfies 0 ≤ -T_N ≤ Y, the field connection operation time associated with a piece of field connection operation information is changed so that the information satisfies 0 ≤ -T_N ≤ Y.

First, the voice interaction device 10a according to this embodiment is described with reference to FIG. 6. FIG. 6 is a block diagram showing the configuration of the voice interaction device 10a according to this embodiment. As shown in FIG. 6, compared with the voice interaction device 10 according to Embodiment 1, the voice interaction device 10a includes a control unit 1a in place of the control unit 1 and a storage unit 6a in place of the storage unit 6. Compared with the control unit 1 according to Embodiment 1, the control unit 1a includes a field connection operation control unit 12a in place of the field connection operation control unit 12. Compared with the storage unit 6 according to Embodiment 1, the storage unit 6a stores a field connection operation table 61a in place of the field connection operation table 61. Unlike the storage unit 6, the storage unit 6a does not store the field connection order table 62.

The field connection operation control unit 12a determines and executes the field connection operation. It includes a field connection operation determination unit 21a and a field connection operation execution unit 22a.

The field connection operation determination unit 21a determines the field connection operation to be executed by the voice interaction device 10a on the basis of the standby time predicted by the standby time prediction unit 11. Unlike the field connection operation determination unit 21 according to Embodiment 1, when no field connection operation information satisfies 0 ≤ -T_N ≤ Y, the field connection operation determination unit 21a refers to the field connection operation table 61a and changes the field connection operation time associated with a piece of field connection operation information so that the information satisfies 0 ≤ -T_N ≤ Y.

The field connection operation table 61a is now described in detail with reference to FIG. 7. FIG. 7 shows the data structure and example data of the field connection operation table 61a stored in the storage unit 6a. In the field connection operation table 61a, each pair of field connection operation information and field connection operation time is further associated with information indicating how far the field connection operation time may be changed. The "field connection operation time change width" column stores a numerical value indicating the range within which each field connection operation time can be changed (change permission information, hereinafter called the change width). For example, the No. 2 field connection operation has a field connection operation time of 1 second and a change width of 0.8 to 1.5 seconds. This means that the No. 2 field connection operation, namely the utterance "Um.", can be sped up or slowed down within the range of 0.8 to 1.5 seconds. In the example of FIG. 7 the change width is stored as a time range, but it is not limited to this form. For example, the change width may be stored as a percentage (such as "80 to 150%") that takes the field connection operation time as the reference time (100%). The change width may also be the same for all field connection operation information.

More specifically, when no field connection operation information satisfies 0 ≤ -T_N ≤ Y, the field connection operation determination unit 21a refers to the field connection operation table 61a, identifies field connection operation information whose change width contains a value that satisfies 0 ≤ T_N ≤ X or 0 ≤ -T_N ≤ Y, and takes that value as the new field connection operation time. It then associates the identified field connection operation information with the changed field connection operation time and outputs them to the field connection operation execution unit 22a. When the change width is a percentage, the field connection operation determination unit 21a obtains the new field connection operation time by multiplying the field connection operation time (reference time) by the percentage.

The field connection operation execution unit 22a causes the voice interaction device 10a to execute the field connection operation indicated by the field connection operation information determined by the field connection operation determination unit 21a. Unlike the field connection operation execution unit 22 according to Embodiment 1, when it receives field connection operation information together with a field connection operation time from the field connection operation determination unit 21a, the field connection operation execution unit 22a has the voice interaction device 10a execute the field connection operation so that it completes in that field connection operation time. For example, on receiving field connection operation information indicating the utterance "Um." and a field connection operation time of 1.5 seconds, the field connection operation execution unit 22a stretches "Um.", which is normally uttered (output from the voice output unit 4) in 1 second, to 1.5 seconds and has the voice output unit 4 output it.
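The text does not say how the utterance is stretched. One simple possibility, assumed here purely for illustration, is to change the playback rate by the ratio of the normal time to the requested time. The function name `playback_rate` is hypothetical.

```python
def playback_rate(base_time, target_time):
    """Return the speed factor that makes an operation of `base_time`
    seconds finish in `target_time` seconds (assumed mechanism).

    A value below 1 slows the operation down; a value above 1 speeds it up.
    """
    if base_time <= 0 or target_time <= 0:
        raise ValueError("times must be positive")
    return base_time / target_time
```

Under this assumption, stretching the 1-second "Um." to 1.5 seconds corresponds to playing it at two thirds of its normal speed.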

Next, the flow of the field connection operation determination process executed by the field connection operation determination unit 21a is described with reference to FIG. 8. FIG. 8 is a flowchart showing an example of the flow of the field connection operation determination process in this embodiment. Apart from the content of the field connection operation determination process, the response execution process in this embodiment is the same as the one described in Embodiment 1 (the response execution process shown in FIG. 4), so its description is omitted here. In the flowchart of FIG. 8, steps S21 to S25 are the same as steps S11 to S15 shown in FIG. 5, so their description is also omitted.

If no field connection operation information satisfies 0 ≤ -T_N ≤ Y (NO in S24), the field connection operation determination unit 21a changes a field connection operation time to obtain a field connection operation that satisfies 0 ≤ T_N ≤ X or 0 ≤ -T_N ≤ Y (S26). It then associates the identified field connection operation information with the changed field connection operation time and outputs them to the field connection operation execution unit 22a. In this case, in step S4 above, on receiving the field connection operation information and the field connection operation time from the field connection operation determination unit 21a, the field connection operation execution unit 22a has the voice interaction device 10a execute the field connection operation so that it completes in that field connection operation time. Finally, when execution of the field connection operation is complete, the field connection operation execution unit 22a notifies the response execution unit 15 to that effect. This completes the field connection operation determination process.
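Step S26 can be sketched as an interval check. An operation time d satisfies 0 ≤ T_N ≤ X or 0 ≤ -T_N ≤ Y exactly when it lies between the standby time minus X and the standby time plus Y, so it suffices to see whether that interval overlaps the change width. The function names, the tuple layout of the table, and the choice of the value closest to the standby time are assumptions of this example; only the No. 2 row (1 second, 0.8 to 1.5 seconds) comes from the text.

```python
def percent_to_range(base_time, low_pct, high_pct):
    """Convert a percentage change width into a time range in seconds."""
    return base_time * low_pct / 100, base_time * high_pct / 100


def adjust_operation_time(wait, table, x, y):
    """Return (No., new operation time) or None if no operation can be adjusted.

    table: hypothetical stand-in for table 61a,
           "No." -> (operation time, change width low, change width high)
    """
    for no, (_base, low, high) in sorted(table.items()):
        # Acceptable times form the interval [wait - x, wait + y];
        # intersect it with the change width.
        lower = max(low, wait - x)
        upper = min(high, wait + y)
        if lower <= upper:
            # Assumed tie-break: take the value closest to the standby time.
            return no, min(max(wait, lower), upper)
    return None
```

With the No. 2 row and a predicted standby time of 1.4 seconds, the sketch returns a new operation time of 1.4 seconds, which lies inside the 0.8 to 1.5 second change width.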

[Embodiment 3]
Still another embodiment (Embodiment 3) of the present invention is described below with reference to FIGS. 9 and 10. For convenience of explanation, members having the same functions as those described in the preceding embodiments are given the same reference numerals, and their description is omitted.

In Embodiments 1 and 2 described above, the standby time is predicted using at least one of the data amount of the voice data and the utterance time. This embodiment describes an example that additionally uses the voice recognition result.

First, the voice interaction device 10b according to this embodiment is described with reference to FIG. 9. FIG. 9 is a block diagram showing the configuration of the voice interaction device 10b according to this embodiment. As shown in FIG. 9, compared with the voice interaction device 10 according to Embodiment 1, the voice interaction device 10b includes a control unit 1b in place of the control unit 1. Compared with the control unit 1 according to Embodiment 1, the control unit 1b includes a standby time prediction unit 11b in place of the standby time prediction unit 11, a field connection operation control unit 12b in place of the field connection operation control unit 12, and a voice recognition unit 13b in place of the voice recognition unit 13.

Unlike the standby time prediction unit 11 according to Embodiment 1, the standby time prediction unit 11b first predicts a voice recognition standby time, which runs from acquisition of the voice uttered by the user until the voice recognition unit 13b completes voice recognition. Specifically, on receiving the voice attached information from the voice input unit 2, the standby time prediction unit 11b predicts the voice recognition standby time using the size (data amount) of the voice data. The example using the data amount has already been described in Embodiment 1, so its description is omitted here. In addition, on receiving the voice recognition result from the voice recognition unit 13b, the standby time prediction unit 11b predicts a response generation standby time, which runs from completion of the voice recognition process until the response can be output. Specifically, the standby time prediction unit 11b predicts the response generation standby time using the size (data amount) of the voice data; this too follows the example described in Embodiment 1. The standby time prediction unit 11b may instead predict the voice recognition standby time and the response generation standby time using the duration of the voice data (the user's utterance time), or may predict (calculate) them using both the data amount of the voice data and the utterance time. The example using the utterance time and the example using both the data amount and the utterance time have already been described in Embodiment 1, so their description is omitted here.

The standby time prediction unit 11b further corrects the predicted (calculated) response generation standby time according to the received voice recognition result. Specifically, when the voice recognition result indicates that the response generation unit 14 needs to perform some kind of search via the communication unit 3, generating the response information is expected to take longer. In this case, the standby time prediction unit 11b lengthens the response generation standby time according to the number of searches required. For example, it lengthens the response generation standby time when the voice recognition result indicates that the voice data contains wording, such as "tomorrow's weather" or "precipitation probability", that requires information to be acquired from an external server that manages weather forecasts. Likewise, when the voice recognition result indicates that the voice interaction device 10b needs to recognize the user (for example, to call the user by name or turn toward the user as a response), the device must identify the user or locate the user, so generating the response information is again expected to take longer. In this case too, the standby time prediction unit 11b lengthens the response generation standby time. The processes of identifying the user and locating the user can be performed by acquiring a moving image or a still image from a camera (not shown) provided in the voice interaction device 10b. The standby time prediction unit 11b then outputs the response generation standby time to the field connection operation determination unit 21b. The standby time prediction unit 11b may also predict the response generation standby time from the voice recognition result alone, for example by predicting the standby time from the data size of the voice recognition result.
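The correction described above can be sketched as a keyword check on the recognized text. Everything numeric here is an assumption: the per-search penalty, the user-recognition penalty, and the `USER_KEYWORDS` list are invented for illustration. Only the two weather phrases come from the text, and a real system would likely use something more robust than substring matching.

```python
# Phrases that imply a lookup on an external server (from the text's example).
SEARCH_KEYWORDS = ("tomorrow's weather", "precipitation probability")
# Hypothetical phrases that imply the device must identify or locate the user.
USER_KEYWORDS = ("my name", "look at me")


def corrected_wait(base_wait, recognized_text, per_search=1.0, user_recognition=0.5):
    """Lengthen the predicted response generation standby time.

    base_wait:        time predicted from data amount and/or utterance time
    per_search:       assumed extra seconds per required external search
    user_recognition: assumed extra seconds when the user must be recognized
    """
    text = recognized_text.lower()
    searches = sum(1 for keyword in SEARCH_KEYWORDS if keyword in text)
    wait = base_wait + per_search * searches
    if any(keyword in text for keyword in USER_KEYWORDS):
        wait += user_recognition
    return wait
```

The corrected value is what would be passed on to the field connection operation determination unit 21b, so that a longer field connection operation is chosen for requests that need external data.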

Compared with the filler operation control unit 12 of Embodiment 1, the filler operation control unit 12b includes a filler operation determination unit 21b in place of the filler operation determination unit 21, and a filler operation execution unit 22b in place of the filler operation execution unit 22.

Unlike the filler operation determination unit 21, the filler operation determination unit 21b determines the filler operation to be executed by the voice dialogue device 10b upon receiving either the voice recognition standby time or the response generation standby time from the standby time prediction unit 11b. The details of the determination are the same as for the filler operation determination unit 21 of Embodiment 1, so their description is omitted here.

Unlike the filler operation execution unit 22, the filler operation execution unit 22b causes the voice dialogue device 10b to execute the filler operation for the response generation standby time once two conditions hold: execution of the filler operation for the voice recognition standby time has completed, and the unit has received from the filler operation determination unit 21b the filler operation information for the filler operation to be executed during the response generation standby time.

Unlike the voice recognition unit 13 of Embodiment 1, the voice recognition unit 13b outputs the voice recognition result to both the response generation unit 14 and the standby time prediction unit 11b.

Next, the flow of the response execution processing executed by the control unit 1b is described with reference to FIG. 10. FIG. 10 is a flowchart showing an example of that flow.

First, the voice input unit 2 waits for voice input (S31). On acquiring voice uttered by the user (YES in S31), the voice input unit 2 converts the acquired voice into voice data, outputs the voice data to the voice recognition unit 13b, and outputs the voice attached information of the voice data to the standby time prediction unit 11b.

Next, the standby time prediction unit 11b predicts the voice recognition standby time (S32) and outputs the predicted voice recognition standby time to the filler operation determination unit 21b. The filler operation determination unit 21b then performs the filler operation determination processing (S33). The details of this processing are the same as in Embodiment 1, so their description is omitted here. The filler operation determination unit 21b outputs, to the filler operation execution unit 22b, filler operation information indicating the filler operation that it has decided the voice dialogue device 10b should execute. The filler operation execution unit 22b then causes the voice dialogue device 10b to execute the filler operation according to the received filler operation information (S34).

Meanwhile, the voice recognition unit 13b performs voice recognition processing (S35) and outputs the voice recognition result to the standby time prediction unit 11b and the response generation unit 14.

On receiving the voice recognition result, the standby time prediction unit 11b predicts the response generation standby time based on the voice attached information and the voice recognition result (S36), and outputs the predicted response generation standby time to the filler operation determination unit 21b. The filler operation determination unit 21b then performs the filler operation determination processing (S37) and outputs, to the filler operation execution unit 22b, filler operation information indicating the filler operation that it has decided the voice dialogue device 10b should execute.

Meanwhile, the response generation unit 14 generates response information (S39). Specifically, the response generation unit 14 generates response information corresponding to the received voice recognition result and outputs it to the response execution unit 15.

As shown in FIG. 10, when execution of the filler operation in step S34 has finished and the filler operation determination processing in step S37 has also finished, the filler operation execution unit 22b causes the voice dialogue device 10b to execute a filler operation according to the filler operation information received from the filler operation determination unit 21b (S38). If either step S34 or step S37 has not yet finished, the filler operation execution unit 22b waits until it does.

Also as shown in FIG. 10, the processing of step S38 and the processing of step S39 are performed in parallel. That is, when the response execution unit 15 has received only one of the response information and the notification that execution of the filler operation has completed, it waits until it receives the other. Once it has received both the notification and the response information, the response execution unit 15 causes the voice dialogue device 10b to execute the response (S40). This completes the response execution processing.
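The ordering constraints of steps S32 to S40 can be sketched with a thread pool. This is a minimal sketch, not the patented implementation: every callable passed in (`recognize`, `generate`, `decide_filler`, and so on) is a hypothetical stand-in for the corresponding unit, and a thread pool is merely one assumed way to express the parallelism of FIG. 10.

```python
from concurrent.futures import ThreadPoolExecutor


def response_execution(audio, predict_recognition_wait, predict_generation_wait,
                       decide_filler, run_filler, recognize, generate, respond):
    """Run one utterance through steps S32 to S40 with the stated ordering."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        recognition = pool.submit(recognize, audio)                    # S35
        first = decide_filler(predict_recognition_wait(audio))         # S32, S33
        first_done = pool.submit(run_filler, first)                    # S34
        text = recognition.result()
        generation = pool.submit(generate, text)                       # S39
        second = decide_filler(predict_generation_wait(audio, text))   # S36, S37
        # S38 may start only after both S34 and S37 have finished.
        first_done.result()
        run_filler(second)                                             # S38
        # S40 waits for whichever of S38 and S39 finishes last.
        response = generation.result()
        respond(response)                                              # S40
```

Calling it with simple stubs shows that the two filler operations never overlap and that the response is always executed last, even when response generation finishes early.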

In this embodiment the voice dialogue device 10b executes a filler operation during the voice recognition processing as well, but it need not do so. In that case, on acquiring voice, the voice input unit 2 outputs the voice data only to the voice recognition unit 13b, and steps S32 to S34 in FIG. 10 are omitted. In other words, the only standby time that the standby time prediction unit 11b predicts is the response generation standby time.

[Modifications]
In Embodiment 1 described above, both the voice recognition of the voice uttered by the user and the generation of response information for that voice are performed by the voice dialogue control device (control unit 1), but these processes may instead be performed by an external device (an external server, not shown) that can communicate with the voice dialogue device 10. That is, on acquiring voice, the voice dialogue device 10 converts it into voice data and transmits the voice data to the external device via the communication unit 3. The external device performs voice recognition, generates response information, and transmits the response information to the voice dialogue device 10. The determination may also be made using only one of the first allowable time X and the second allowable time Y. In addition, when multiple pieces of filler operation information satisfy the conditions, the filler operation determination unit 21 selects the one with the smaller subtraction value T_N (or sign-changed value −T_N), but the selection is not limited to this example. For example, the filler operation table 61 may have a column storing history information indicating the date and time at which each filler operation was last executed, and when multiple pieces of filler operation information satisfy the conditions, the one whose history information indicates the older date and time may be selected. Furthermore, when generation of the response information completes earlier than the standby time predicted by the standby time prediction unit 11, the filler operation execution unit 22 may speed up the filler operation in progress so that it completes sooner. Specifically, when notified by the response execution unit 15 that the response can now be output, the filler operation execution unit 22 executes the filler operation in progress at an increased speed. These modifications are also applicable to Embodiments 2 and 3.
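The history-based tie-break among qualifying filler operations might look like the sketch below. The dictionary layout and the `last_executed` key are assumptions standing in for the history column of the filler operation table 61, and treating a never-executed operation as the oldest is also an assumption.

```python
from datetime import datetime


def pick_least_recently_used(candidates):
    """Return the candidate whose last execution is oldest.

    Each candidate is a dict with a hypothetical "last_executed" entry holding
    a datetime, or None if the operation has never been executed (assumed to
    count as the oldest).
    """
    return min(candidates, key=lambda c: c["last_executed"] or datetime.min)


candidates = [
    {"name": "say 'um'", "last_executed": datetime(2015, 1, 8, 9, 0)},
    {"name": "tilt head", "last_executed": datetime(2015, 1, 7, 18, 30)},
]
chosen = pick_least_recently_used(candidates)
```

Here `chosen` is the "tilt head" entry, because its recorded date and time is the older of the two.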

In Embodiment 1 described above, when generating the response information takes longer than the standby time predicted by the standby time prediction unit 11, the standby time prediction unit 11 may predict the standby time again using the voice recognition result produced by the voice recognition unit 13. If the new standby time is longer than the previous one, the filler operation determination unit 21 may determine a filler operation again. This modification is also applicable to Embodiment 2.

In Embodiment 1 described above, when the type of the filler operation determined by the filler operation determination unit 21 is "voice", the unit may additionally select filler operation information whose filler operation time is shorter than that of the determined filler operation and whose type is "gesture", and output the two pieces of filler operation information in combination to the filler operation execution unit 22. Similarly, when the type of the determined filler operation is "gesture", the unit may additionally select filler operation information whose filler operation time is shorter than that of the determined filler operation and whose type is "voice", and output the two pieces of filler operation information in combination to the filler operation execution unit 22. For example, when the filler operation indicated by filler operation information No. 7 in FIG. 2 (type: gesture; perform a "get up" motion) is determined as the filler operation to be executed by the voice dialogue device 10, the filler operation determination unit 21 further determines, for example, the filler operation indicated by filler operation information No. 4 in FIG. 2 (type: voice; utter "Wait a moment") as a filler operation to be executed by the voice dialogue device 10, and outputs both pieces of filler operation information to the filler operation execution unit 22. On receiving this information, the filler operation execution unit 22 causes the voice dialogue device 10 to get up while uttering "Wait a moment". This increases the variety of filler operations and helps keep the user from getting bored. This modification is also applicable to Embodiments 2 and 3.
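A possible way to pair a voice and a gesture filler operation is sketched below. The durations given for No. 4 and No. 7 are invented for illustration (FIG. 2 is not reproduced in this section), and choosing the longest of the shorter candidates is an assumption; the description only requires a shorter operation of the other type.

```python
def with_companion(chosen, table):
    """Pair the chosen filler operation with a shorter one of the other type."""
    other_type = "voice" if chosen["type"] == "gesture" else "gesture"
    shorter = [c for c in table
               if c["type"] == other_type
               and c["duration_s"] < chosen["duration_s"]]
    if not shorter:
        return [chosen]  # nothing suitable: execute the chosen operation alone
    # Assumed policy: take the longest companion that still fits.
    companion = max(shorter, key=lambda c: c["duration_s"])
    return [chosen, companion]


# Hypothetical excerpt of the filler operation table; durations are made up.
table = [
    {"no": 4, "type": "voice", "action": "say 'Wait a moment'", "duration_s": 1.5},
    {"no": 7, "type": "gesture", "action": "get up", "duration_s": 3.0},
]
combined = with_companion(table[1], table)
```

With these made-up durations, choosing No. 7 yields the pair (No. 7, No. 4), while choosing No. 4 yields No. 4 alone because the only gesture is longer.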

[Example of Implementation by Software]
The voice dialogue control device, that is, the control unit 1, the control unit 1a, and the control unit 1b, may be implemented by a logic circuit (hardware) formed in an integrated circuit (IC chip) or the like, or may be implemented by software using a CPU (Central Processing Unit).

In the latter case, the voice dialogue device 10 includes a CPU that executes the instructions of a program, which is software implementing each function; a ROM (Read Only Memory) or storage device (referred to as a "recording medium") on which the program and various data are recorded so as to be readable by a computer (or the CPU); a RAM (Random Access Memory) into which the program is loaded; and the like. The object of the present invention is achieved when the computer (or CPU) reads the program from the recording medium and executes it. A "non-transitory tangible medium", such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit, can be used as the recording medium. The program may also be supplied to the computer via any transmission medium capable of transmitting it (a communication network, broadcast waves, or the like). The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.

[Summary]
A voice dialogue control device (control unit 1) according to Aspect 1 of the present invention includes: a standby time prediction unit (11) that predicts a standby time from a predetermined point after a voice dialogue device (10) acquires voice uttered by a user until a response to the voice can be output; a filler operation determination unit (21) that selects one or more of a plurality of operation candidates, each indicating an operation executable by the voice dialogue device, as a filler operation, based on the standby time predicted by the standby time prediction unit and the operation time required to execute each of the operation candidates; and a filler operation execution unit (22) that causes the voice dialogue device to execute the filler operation selected by the filler operation determination unit.

With this configuration, the filler operation is determined from the plurality of operation candidates based on the standby time and the operation time required to execute each candidate, so the voice dialogue device can be made to execute a filler operation suited to the length of the standby time. For example, when the standby time is short, the device executes a filler operation that completes quickly, such as uttering "Um." When the standby time is long, it executes a filler operation that takes a long time to complete, such as periodically repeating a motion of nodding with its arms folded. This improves the flexibility of communication between the user and the voice dialogue device.

The phrase "a predetermined point after acquiring voice uttered by the user" denotes the start point of the standby time. That start point may be the point at which the voice uttered by the user is acquired, a point a predetermined time after the voice is acquired, or the point at which voice recognition processing completes and generation of the response begins. In other words, the start point of the standby time can be set anywhere between acquisition of the voice uttered by the user and the point at which a response to that voice can be output.

In the voice dialogue control device according to Aspect 2 of the present invention, in Aspect 1 above, the filler operation determination unit may determine, as the filler operation, an operation candidate among the plurality of operation candidates for which a first subtraction value, obtained by subtracting the operation time from the standby time, is 0 or more and is no more than a first allowable time. The first allowable time indicates how long the voice dialogue device may acceptably remain idle between completion of the filler operation and the point at which the response can be output.

With this configuration, an operation whose operation time is shorter than the standby time, and for which the difference between the standby time and the operation time is no more than the first allowable time, is determined as the filler operation. The time during which the voice dialogue device is idle between the end of the filler operation and execution of the response can therefore be kept to a minimum. This prevents the user from feeling stressed when communicating with the voice dialogue device.
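The condition of Aspect 2 reduces to a simple range check, sketched below. The function name, the tuple layout of the candidates, and the example durations are assumptions made for illustration.

```python
def candidates_within_first_allowance(wait_s, candidates, x_s):
    """Return the names of candidates that end at most x_s before the response.

    candidates is a list of (name, duration_s) pairs; x_s is the first
    allowable time X. All times are in seconds.
    """
    selected = []
    for name, duration_s in candidates:
        gap_s = wait_s - duration_s  # first subtraction value
        if 0 <= gap_s <= x_s:
            selected.append(name)
    return selected


candidates = [("um", 0.5), ("wait a moment", 2.5), ("fold arms", 4.0)]
fitting = candidates_within_first_allowance(3.0, candidates, x_s=1.0)
```

For a predicted standby time of 3.0 s, only "wait a moment" qualifies: "um" would leave 2.5 s of silence, and "fold arms" would overrun the standby time.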

In the voice dialogue control device according to Aspect 3 of the present invention, in Aspect 2 above, the first allowable time may be set for each operation candidate, or according to at least one of the duration of the voice and the size of the voice data generated from the voice.

The length of idle time that a user will tolerate between the end of the filler operation and execution of the response is likely to differ from one operation to another. With this configuration, the first allowable time is set for each operation candidate, so the allowable time can be matched to the operation. For example, the first allowable time can be shortened for an operation with a short operation time and lengthened for an operation with a long operation time.

In addition, when the voice is long or the voice data generated from it is large, the actual standby time is more susceptible to external influences, so its deviation from the predicted standby time is expected to grow. With this configuration, the first allowable time is set according to at least one of the duration of the voice and the size of the voice data generated from the voice, so a filler operation can be selected with this deviation taken into account. For example, if the first allowable time is lengthened when the voice data is large, the time during which the voice dialogue device is idle can be kept short even if the actual standby time exceeds the predicted standby time.

In the voice dialogue control device according to Aspect 4 of the present invention, in any of Aspects 1 to 3 above, the filler operation determination unit may determine, as the filler operation, an operation candidate among the plurality of operation candidates for which a second subtraction value, obtained by subtracting the standby time from the operation time, is 0 or more and is no more than a second allowable time. The second allowable time indicates how long it is acceptable for the filler operation to continue after the response has become ready for output.

With this configuration, an operation whose operation time is longer than the standby time, and for which the difference between the standby time and the operation time is no more than the second allowable time, is determined as the filler operation. The time between completion of response generation and completion of the filler operation can therefore be kept to a minimum, which in turn minimizes the time between completion of response generation and execution of the response.
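Using both allowable times together, a selection routine might look like the following sketch. The preference for the smallest absolute difference follows the smaller-value rule mentioned for Embodiment 1; the function name, data layout, and example durations are assumptions.

```python
def select_filler(wait_s, candidates, x_s, y_s):
    """Pick the candidate whose duration is closest to the standby time.

    A candidate qualifies if it ends at most x_s (first allowable time)
    before the response is ready, or at most y_s (second allowable time)
    after it. Returns None when no candidate qualifies.
    """
    best = None
    for name, duration_s in candidates:
        diff_s = wait_s - duration_s          # first subtraction value
        ends_early = 0 <= diff_s <= x_s
        ends_late = 0 <= -diff_s <= y_s       # second subtraction value
        if ends_early or ends_late:
            if best is None or abs(diff_s) < best[0]:
                best = (abs(diff_s), name)
    return best[1] if best else None


candidates = [("um", 0.5), ("wait a moment", 2.5), ("fold arms and nod", 3.25)]
chosen = select_filler(3.0, candidates, x_s=1.0, y_s=0.5)
```

With a 3.0 s standby time, "fold arms and nod" overruns by only 0.25 s, which is within the second allowable time and closer than the 0.5 s gap left by "wait a moment", so it is chosen.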

In the voice dialogue control device according to Aspect 5 of the present invention, in Aspect 4 above, the second allowable time may be set for each operation candidate, or according to at least one of the duration of the voice and the size of the voice data generated from the voice.

The length of time for which a user will tolerate a filler operation being executed is likely to differ from one operation to another. With this configuration, the second allowable time is set for each operation candidate, so the allowable time can be matched to the operation. For example, shortening the second allowable time for an operation with a long operation time means that the voice dialogue device returns its response promptly after a long filler operation. This prevents the user from feeling stressed when communicating with the voice dialogue device.

In addition, when the voice is long or the voice data generated from it is large, the actual standby time is more susceptible to external influences, so its deviation from the predicted standby time is expected to grow. With this configuration, the second allowable time is set according to at least one of the duration of the voice and the size of the voice data generated from the voice, so a filler operation can be selected with this deviation taken into account. For example, if the second allowable time is lengthened when the voice data is large, the voice dialogue device can return its response promptly once response generation completes, even if the actual standby time exceeds the predicted standby time.

In the voice dialogue control device according to Aspect 6 of the present invention, in any of Aspects 1 to 5 above, the standby time prediction unit may predict the standby time using at least one of the duration of the voice and the size of the voice data generated from the voice.

When the voice is long or the voice data is large, generating the response is expected to take longer. With this configuration, the standby time is predicted using at least one of the duration of the voice and the size of the voice data, which allows the standby time to be predicted with high accuracy.
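One simple way to turn voice duration and data size into a standby time is a linear estimate, sketched below. The linear form and every coefficient are assumptions made purely for illustration; the description does not specify the prediction formula.

```python
def predict_wait(duration_s=None, size_bytes=None,
                 base_s=0.5, per_second_s=0.25, per_kib_s=0.125):
    """Estimate the standby time in seconds from voice duration and/or size.

    At least one of duration_s and size_bytes must be given. The base value
    and both slopes are hypothetical tuning parameters.
    """
    if duration_s is None and size_bytes is None:
        raise ValueError("need the voice duration, the data size, or both")
    wait_s = base_s
    if duration_s is not None:
        wait_s += per_second_s * duration_s
    if size_bytes is not None:
        wait_s += per_kib_s * (size_bytes / 1024)
    return wait_s
```

In practice the coefficients would have to be fitted to measured recognition and response generation times on the target device.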

In the voice dialogue control device according to Aspect 7 of the present invention, in Aspect 6 above, the standby time prediction unit may additionally use the recognition result of the voice to predict the standby time.

With this configuration, the standby time is predicted using the voice recognition result in addition to the duration of the voice and the size of the voice data, so the prediction can reflect the content of the voice and becomes more accurate. For example, when voice recognition shows that a search of information managed on an external server or the like must be executed, generating the response is expected to take time. Predicting a longer standby time in such cases further reduces the gap between the predicted standby time and the time actually required to generate the response.

In the voice dialogue control device according to Aspect 8 of the present invention, in any of Aspects 1 to 7 above, a changeable range of the operation time may be set in advance for each operation candidate together with the operation time, and the filler operation determination unit may select the operation candidate based on an operation time changed within the changeable range.

With this configuration, the time from the start to the end of execution can be changed, so filler operations that could not be selected with a fixed operation time become selectable. This increases the variety of filler operations.
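The effect of a changeable range can be sketched by clamping the predicted standby time into that range before applying the allowable-time check. The field names `min_s` and `max_s` and the example values are assumptions, and only the first allowable time is checked here for brevity.

```python
def fitted_duration(wait_s, min_s, max_s):
    """Return the duration within [min_s, max_s] closest to the standby time."""
    return min(max(wait_s, min_s), max_s)


def selectable(wait_s, candidate, x_s=0.0):
    """True if the candidate, stretched or shortened, ends within x_s of wait_s."""
    duration_s = fitted_duration(wait_s, candidate["min_s"], candidate["max_s"])
    return 0 <= wait_s - duration_s <= x_s


# Hypothetical candidate: nominally 2.0 s, adjustable between 1.5 s and 2.5 s.
nod = {"name": "nod", "nominal_s": 2.0, "min_s": 1.5, "max_s": 2.5}
```

With a fixed 2.0 s duration and no tolerance, this candidate would be rejected for a 2.4 s standby time; with the range it can be stretched to 2.4 s and selected.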

In the voice dialogue control device according to Aspect 9 of the present invention, in any of Aspects 1 to 8 above, the filler operation determination unit may select two or more operation candidates in combination from the plurality of operation candidates.

With this configuration, two or more operation candidates can be selected in combination, so either a single filler operation or two or more filler operations can be executed during the standby time. This increases the variety of filler operations.

In the voice dialogue control device according to Aspect 10 of the present invention, in Aspect 9 above, when a response to the voice becomes ready for output and there is a filler operation that has been selected by the filler operation determination unit but has not yet started, the filler operation execution unit may cancel execution of that filler operation.

With this configuration, any filler operation that has not yet started is cancelled at the point when the response becomes ready for output, so the time between the response becoming ready and its execution can be kept to a minimum.
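The cancellation behaviour of Aspect 10 can be sketched with a queue of pending filler operations. The class and method names are hypothetical, and the operation already in progress is assumed to run to completion.

```python
from collections import deque


class FillerQueue:
    """Holds the selected filler operations in execution order."""

    def __init__(self, operations):
        self.pending = deque(operations)
        self.executed = []

    def run_next(self):
        """Start the next pending operation, if any."""
        if self.pending:
            self.executed.append(self.pending.popleft())

    def on_response_ready(self):
        """Cancel every operation that has not started; return what was cancelled."""
        cancelled = list(self.pending)
        self.pending.clear()
        return cancelled


queue = FillerQueue(["um", "fold arms", "nod"])
queue.run_next()                       # "um" starts
cancelled = queue.on_response_ready()  # the response became ready early
```

After the response becomes ready, "fold arms" and "nod" are dropped and later calls to `run_next` do nothing, so the response can follow the first filler operation directly.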

A control method according to Aspect 11 of the present invention is a control method of a voice dialogue control device that controls a voice dialogue device, and includes: a standby time prediction step (S2) of predicting a standby time from a predetermined point after the voice dialogue device acquires voice uttered by a user until a response to the voice can be output; a filler operation determination step (S13, S15, S16) of selecting one or more of a plurality of operation candidates, each indicating an operation executable by the voice dialogue device, as a filler operation, based on the standby time predicted in the standby time prediction step and the operation time required to execute each of the operation candidates; and a filler operation execution step (S4) of causing the voice dialogue device to execute the filler operation selected in the filler operation determination step. This control method provides the same effects as the voice dialogue control device according to Aspect 1.

The voice dialogue device according to aspect 12 of the present invention may include the voice dialogue control device according to any one of aspects 1 to 10 above. With this configuration, the voice dialogue device can improve the flexibility of its communication with the user.

The voice dialogue control device according to each aspect of the present invention may be realized by a computer. In that case, a control program for the voice dialogue control device that realizes the device on a computer by causing the computer to operate as each unit (software element) of the device, and a computer-readable recording medium on which that program is recorded, also fall within the scope of the present invention.

The present invention is not limited to the embodiments described above. Various modifications are possible within the scope of the claims, and embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention. Furthermore, new technical features can be formed by combining the technical means disclosed in the respective embodiments.

The present invention can be used for a voice dialogue control device for controlling a voice dialogue device that responds to a user's utterance.

1, 1a, 1b: Control unit (voice dialogue control device)
10, 10a, 10b: Voice dialogue device
11, 11b: Standby time prediction unit
21, 21a, 21b: Field connection operation determination unit
22, 22a, 22b: Field connection operation execution unit
S2: Standby time prediction step
S13, S15, S16: Field connection operation determination step
S4: Field connection operation execution step

Claims

1. A voice dialogue control device comprising:
a standby time prediction unit that predicts a standby time from when a voice dialogue device acquires a voice uttered by a user until a response to the voice becomes outputtable;
a field connection operation determination unit that selects one or more of a plurality of operation candidates, each indicating an operation executable by the voice dialogue device, as a field connection operation, based on the standby time predicted by the standby time prediction unit and the operation time required to execute each of the operation candidates; and
a field connection operation execution unit that causes the voice dialogue device to execute the field connection operation selected by the field connection operation determination unit,
wherein the field connection operation determination unit determines, as the field connection operation, an operation candidate among the plurality of operation candidates for which a second subtraction value, obtained by subtracting the standby time from the operation time, is 0 or more and is equal to or less than a second allowable time indicating a time that is acceptable as the time from when the response becomes outputtable until execution of the field connection operation is completed.

2. The voice dialogue control device according to claim 1, wherein the field connection operation determination unit determines, as the field connection operation, an operation candidate among the plurality of operation candidates for which a first subtraction value, obtained by subtracting the operation time from the standby time, is 0 or more and is equal to or less than a first allowable time indicating a time that is acceptable as the time during which the voice dialogue device does not operate from when execution of the field connection operation is completed until the response becomes outputtable.
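The two selection conditions in the claims above can be sketched as simple predicates. All function names, parameter names, and numeric values are hypothetical, and accepting a candidate that satisfies either condition is an assumption about how the two are combined.

```python
def ends_before_response(operation_time, standby_time, first_allowable_time):
    """First condition: the operation ends first and the idle gap is acceptable."""
    first_subtraction_value = standby_time - operation_time
    return 0 <= first_subtraction_value <= first_allowable_time


def ends_after_response(operation_time, standby_time, second_allowable_time):
    """Second condition: the operation overruns the standby time by an acceptable amount."""
    second_subtraction_value = operation_time - standby_time
    return 0 <= second_subtraction_value <= second_allowable_time


def determine(candidates, standby_time, first_allowable_time, second_allowable_time):
    """Return the names of candidates that satisfy either condition (assumed combination)."""
    return [
        name
        for name, operation_time in candidates.items()
        if ends_before_response(operation_time, standby_time, first_allowable_time)
        or ends_after_response(operation_time, standby_time, second_allowable_time)
    ]


candidates = {"nod": 1.0, "filler phrase": 2.5, "look around": 4.0, "dance": 6.0}
chosen = determine(candidates, standby_time=3.0,
                   first_allowable_time=0.5, second_allowable_time=1.0)
```

With these made-up values, "filler phrase" leaves an idle gap of 0.5 s and "look around" overruns by 1.0 s, so both are accepted; "nod" leaves too long a gap and "dance" overruns too far.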

3. The voice dialogue control device according to claim 1, wherein the standby time prediction unit predicts the standby time using at least one of the duration of the voice and the size of voice data generated from the voice.
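One way such a prediction could look is sketched below. The claim only states that the voice duration and/or the voice data size is used; the linear form, the function name `predict_standby_time`, and every coefficient here are assumptions made for illustration.

```python
def predict_standby_time(voice_seconds=None, data_bytes=None,
                         base=0.5, per_second=0.3, per_kilobyte=0.01):
    """Estimate the standby time in seconds from voice duration and/or data size.

    The linear model and its coefficients are illustrative assumptions only.
    """
    if voice_seconds is None and data_bytes is None:
        raise ValueError("need the voice duration, the data size, or both")
    estimate = base
    if voice_seconds is not None:
        estimate += per_second * voice_seconds
    if data_bytes is not None:
        estimate += per_kilobyte * (data_bytes / 1024)
    return estimate


from_duration = predict_standby_time(voice_seconds=2.0)
from_size = predict_standby_time(data_bytes=2048)
```

In practice the coefficients would have to be fitted to measured response delays; the sketch only shows that either input alone is enough to produce an estimate.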

4. The voice dialogue control device according to any one of claims 1 to 3, wherein, for each operation candidate, a changeable range of the operation time is set in advance together with the operation time, and
the field connection operation determination unit selects the operation candidate based on an operation time obtained by changing the operation time within the changeable range.
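A minimal sketch of selection with an adjustable operation time follows. The `OperationCandidate` fields, the single `allowable_gap` threshold, and the rule of choosing the candidate whose adjusted time is closest to the standby time are all assumptions for illustration, not details taken from the claim.

```python
from dataclasses import dataclass


@dataclass
class OperationCandidate:
    name: str
    operation_time: float  # nominal operation time in seconds
    min_time: float        # lower end of the changeable range
    max_time: float        # upper end of the changeable range


def adjusted_time(candidate, standby_time):
    """Operation time within the changeable range that is closest to the standby time."""
    return min(max(standby_time, candidate.min_time), candidate.max_time)


def select_with_adjustment(candidates, standby_time, allowable_gap):
    """Return (name, adjusted time) of the best-fitting candidate, or None."""
    best = None
    for candidate in candidates:
        time_used = adjusted_time(candidate, standby_time)
        gap = abs(standby_time - time_used)
        if gap <= allowable_gap and (best is None or gap < best[2]):
            best = (candidate.name, time_used, gap)
    return None if best is None else best[:2]


candidates = [
    OperationCandidate("nod", 1.0, 0.5, 1.5),
    OperationCandidate("filler phrase", 2.5, 2.0, 3.5),
]
choice = select_with_adjustment(candidates, standby_time=3.0, allowable_gap=0.5)
```

Here the nominal 2.5 s filler phrase can be stretched to 3.0 s, which matches the standby time exactly, whereas the nod can be stretched only to 1.5 s and is rejected.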

5. A control method for a voice dialogue control device, executed by a voice dialogue device, the method comprising:
a standby time prediction step of predicting a standby time from when the voice dialogue device acquires a voice uttered by a user until a response to the voice becomes outputtable;
a field connection operation determination step of selecting one or more of a plurality of operation candidates, each indicating an operation executable by the voice dialogue device, as a field connection operation, based on the standby time predicted in the standby time prediction step and the operation time required to execute each of the operation candidates; and
a field connection operation execution step of causing the voice dialogue device to execute the field connection operation selected in the field connection operation determination step,
wherein, in the field connection operation determination step, an operation candidate among the plurality of operation candidates for which a second subtraction value, obtained by subtracting the standby time from the operation time, is 0 or more and is equal to or less than a second allowable time indicating a time that is acceptable as the time from when the response becomes outputtable until execution of the field connection operation is completed, is determined as the field connection operation.

6. A voice dialogue device comprising the voice dialogue control device according to any one of claims 1 to 4.