JP2015148648A

JP2015148648A - Dialogue system, speech controller, dialog unit, speech control method, control program of speech controller and control program of dialog unit

Info

Publication number: JP2015148648A
Application number: JP2014019742A
Authority: JP
Inventors: Akitomo Onishi; Tadashi Hirose; Masahiro Chiba; Kayo Morinaga; Tomohiro Aiso; Kazunori Shibata
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2014-02-04
Filing date: 2014-02-04
Publication date: 2015-08-20

Abstract

PROBLEM TO BE SOLVED: To prevent erroneous recognition or erroneous operation by detecting circumstances under which erroneous recognition of a voice, or erroneous operation caused by it, may occur. SOLUTION: A dialogue system 401 controls a dialogue apparatus 2 that outputs a response voice in reply to a voice uttered by a user. An utterance control server 1 includes: a condition determination section 56 that determines that a voice misrecognition condition is satisfied when the dialogue apparatus 2 may erroneously detect a voice other than a recognition target as the recognition target; and an utterance control section 57 that, when the voice misrecognition condition is satisfied, performs control so that no response voice is output. Whether the condition is satisfied is determined based on operating information of a TV 3.

Description

The present invention relates to a dialogue system that recognizes voice, and more particularly to a dialogue system that recognizes a voice uttered by a user and outputs a response voice in reply.

Conventionally, there are dialogue apparatuses that process a voice uttered by a user according to the result of voice recognition.

For example, Patent Document 1 discloses a portable wireless communication device that switches between a normal conversation mode and a wireless-communication-device use mode (a mode in which the device is operated hands-free by voice input) in response to voice input (for example, the specific keyword "mode switch"). This avoids the problem of the device malfunctioning because it recognizes as an instruction something the user said in ordinary conversation and did not intend as an instruction to the device.

An information processing system that lets a user converse while taking the safety of the driving user into account has also been disclosed. Specifically, Patent Document 2 discloses a communication control device that grasps the driving situation based on vehicle sensor information and car navigation information, and adds sound data describing that situation to the driver's input voice data. This informs the party outside the vehicle who is talking with the driver of the driver's driving situation, so that the conversation can take that situation into account.

Furthermore, Patent Document 3 discloses a voice dialogue apparatus capable of smooth dialogue that matches the speaker's sensibilities. Specifically, in the normal state the voice dialogue apparatus outputs a response voice at a speed corresponding to the user's speaking speed, and when a predetermined event occurs while the response voice is being output, it speeds up or slows down the output of that response voice. In addition, depending on the traveling state of the vehicle, the voice dialogue apparatus enters a standby state in which it withholds the response voice.

Patent Document 1: JP 2002-33794 A (published January 31, 2002)
Patent Document 2: JP 2006-352356 A (published December 28, 2006)
Patent Document 3: JP 2008-26463 A (published February 7, 2008)

In the conventional techniques described above, depending on the environment in which the user or the dialogue apparatus is placed, the voice recognition function of the dialogue apparatus may not work accurately. The apparatus may misrecognize the input voice and, as a result, malfunction (for example, output an incorrect response voice). Environments in which such misrecognition or malfunction can occur include, for example, one in which another voice output device outputs a voice that the dialogue apparatus detects, and one in which the user is speaking not to the dialogue apparatus but to another target (a person or a device).

However, the conventional techniques described above cannot detect such environments, and therefore cannot execute appropriate processing to avoid misrecognition or malfunction. Specifically, with the technique of Patent Document 1, switching the dialogue apparatus to a mode in which it does not recognize voice requires a deliberate operation (voice input) by the user to instruct the switch. This adds to the user's effort, and misrecognition cannot be avoided if the user fails to perform the operation. The techniques of Patent Documents 2 and 3 detect the traveling state of a vehicle and cannot detect an environment in which misrecognition or malfunction can occur.

The present invention has been made in view of the above problems. Its object is to realize a dialogue system that can detect that it is in an environment in which voice misrecognition, or a malfunction resulting from it, can occur, and can thereby avoid the misrecognition or malfunction.

To solve the above problem, a dialogue system according to one aspect of the present invention is a dialogue system that controls a dialogue apparatus that outputs a response voice to a voice uttered by a user, the dialogue system including: condition determination means for determining that a voice misrecognition condition of the dialogue apparatus is satisfied when a non-target voice, which is not a recognition-target voice uttered by the user to the dialogue apparatus, may be erroneously detected by the dialogue apparatus as a recognition-target voice; and utterance control means for performing control so that the dialogue apparatus does not output the response voice when the condition determination means determines that the voice misrecognition condition is satisfied, wherein the condition determination means determines whether the voice misrecognition condition is satisfied based on operating information indicating the operating status of a non-target voice source device that directly or indirectly generates the non-target voice.

To solve the above problem, an utterance control apparatus according to one aspect of the present invention is an utterance control apparatus that controls a dialogue apparatus that outputs a response voice to a voice uttered by a user, the utterance control apparatus including: condition determination means for determining that a voice misrecognition condition of the dialogue apparatus is satisfied when a non-target voice, which is not a recognition-target voice uttered by the user to the dialogue apparatus, may be erroneously detected by the dialogue apparatus as a recognition-target voice; and utterance control means for performing control so that the dialogue apparatus does not output the response voice when the condition determination means determines that the voice misrecognition condition is satisfied, wherein the condition determination means determines whether the voice misrecognition condition is satisfied based on operating information indicating the operating status of a non-target voice source device that directly or indirectly generates the non-target voice.

To solve the above problem, a dialogue apparatus according to one aspect of the present invention is a dialogue apparatus that outputs a response voice to a voice uttered by a user, the dialogue apparatus including: condition determination means for determining that a voice misrecognition condition of the dialogue apparatus is satisfied when a non-target voice, which is not a recognition-target voice uttered by the user to the dialogue apparatus, may be erroneously detected by the dialogue apparatus as a recognition-target voice; and dialogue control means for suppressing output of the response voice when the condition determination means determines that the voice misrecognition condition is satisfied, wherein the condition determination means determines whether the voice misrecognition condition is satisfied based on operating information indicating the operating status of a non-target voice source device that directly or indirectly generates the non-target voice.

To solve the above problem, an utterance control method according to one aspect of the present invention is an utterance control method for controlling a dialogue apparatus that outputs a response voice to a voice uttered by a user, the method including: a condition determination step of determining that a voice misrecognition condition of the dialogue apparatus is satisfied when a non-target voice, which is not a recognition-target voice uttered by the user to the dialogue apparatus, may be erroneously detected by the dialogue apparatus as a recognition-target voice; and an utterance control step of performing control so that the dialogue apparatus does not output the response voice when the voice misrecognition condition is determined to be satisfied in the condition determination step, wherein in the condition determination step, whether the voice misrecognition condition is satisfied is determined based on operating information indicating the operating status of a non-target voice source device that directly or indirectly generates the non-target voice.

According to one aspect of the present invention, voice output can be suppressed appropriately according to the environment in which the dialogue apparatus is placed, and as a result voice misrecognition, or a malfunction resulting from it, can be avoided.

FIG. 1 is a functional block diagram showing the main configuration of each apparatus in the dialogue system according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing an outline of the dialogue system.
FIG. 3: (a) shows a specific example of the information transmitted from the TV to the information collection server, (b) shows a specific example of the device state management table, (c) shows a specific example of the information transmitted from the information collection server to the utterance control server, (d) shows a specific example of the device-apparatus correspondence table, and (e) shows a specific example of the rule table.
FIG. 4 is a flowchart showing the flow of the operation information transmission processing executed by the TV and the flow of the operating information transmission processing executed by the information collection server in the dialogue system.
FIG. 5 is a flowchart showing the flow of the utterance control processing executed by the utterance control server and the flow of the dialogue mode control processing executed by the dialogue apparatus in the dialogue system.
FIG. 6 is a diagram showing an outline of the dialogue system according to Embodiment 2 (or Embodiment 4) of the present invention.
FIG. 7 is a functional block diagram showing the main configuration of each apparatus in the dialogue system.
FIG. 8: (a) shows a specific example of the apparatus ID stored in the storage unit of the dialogue apparatus, (b) shows a specific example of the operation information generated by the dialogue apparatus, (c) shows a specific example of the device state management table, and (d) shows a specific example of the information transmitted from the dialogue apparatus to the utterance control server.
FIG. 9 is a flowchart showing the flow of the operating information transmission processing executed by the dialogue apparatus and the flow of the utterance control processing executed by the utterance control server in the dialogue system.
FIG. 10 is a diagram showing an outline of the dialogue system according to Embodiment 3 of the present invention.
FIG. 11 is a functional block diagram showing the main configuration of each apparatus in the dialogue system.
FIG. 12: (a) shows a specific example of the apparatus ID stored in the storage unit of the dialogue apparatus, (b) shows a specific example of the ringtone table, (c) shows a specific example of the device state management table, (d) shows a specific example of the information transmitted from the dialogue apparatus to the utterance control server, and (e) shows a specific example of the rule table.
FIG. 13 is a flowchart showing the flow of the operating information transmission processing executed by the dialogue apparatus and the flow of the utterance control processing executed by the utterance control server in the dialogue system.
FIG. 14: (a) shows a specific example of the operation information generated by the dialogue apparatus, (b) shows a specific example of the device state management table, (c) shows a specific example of the information transmitted from the dialogue apparatus to the utterance control server, and (d) shows a specific example of the rule table.
FIG. 15 is a flowchart showing the flow of processing of each apparatus of the dialogue system in Modification 2.
FIG. 16 is a flowchart showing the flow of processing of each apparatus of the dialogue system in Modification 2.
FIG. 17 is a block diagram illustrating the configuration of a computer that can be used as each apparatus (device, server) included in the dialogue system of the present invention.

≪Embodiment 1≫
Embodiment 1 of the present invention is described below with reference to FIGS. 1 to 5. In Embodiment 1, as an example, the dialogue system of the present invention is applied to the dialogue system shown in FIG. 2. The dialogue system shown in FIG. 2 is merely an example and does not limit the dialogue system of the present invention.

[Outline of the dialogue system]
FIG. 2 is a diagram showing an outline of the dialogue system according to Embodiment 1 of the present invention. The dialogue system 401 shown in FIG. 2 includes an utterance control server 1, a dialogue apparatus 2, a digital television (hereinafter, TV) 3, and an information collection server 4.

The dialogue apparatus 2 interacts with the user by recognizing the user's utterance as voice and outputting a response voice. The dialogue apparatus 2 is, for example, a self-propelled cleaner, but it may be any information processing apparatus that has this dialogue function and a function for communicating with the utterance control server 1, such as a humanoid robot, a personal computer, a tablet terminal, or a smartphone.

The utterance control server (utterance control apparatus) 1 controls the voice output of the dialogue apparatus 2 by transmitting utterance-related instructions to the dialogue apparatus 2.

The utterance-related instructions include, for example, an "output suppression instruction" that instructs the dialogue apparatus 2 to switch to a dialogue mode in which it does not output a response voice, and a "suppression release instruction" that instructs the dialogue apparatus 2 to return to the dialogue mode in which it outputs a response voice.

Based on the operating information of the TV 3 transmitted from the information collection server 4, the utterance control server 1 determines the environment in which the dialogue apparatus 2 is placed and, according to that environment, transmits an output suppression instruction or a suppression release instruction to the dialogue apparatus 2.

The functions for deciding what response voice to output for what user utterance, and for deciding when to output the response voice, may be any known ones. They are provided in at least one of the utterance control server 1 and the dialogue apparatus 2.

Furthermore, in Embodiment 1, the dialogue apparatus 2 may, in accordance with an instruction from the utterance control server 1, output a voice spontaneously, triggered by a predetermined event other than a user utterance. A voice output in this way is called a spontaneous voice when it needs to be distinguished from a response voice output in reply to a user utterance. That is, as an utterance-related instruction, the utterance control server 1 may transmit to the dialogue apparatus 2 a "spontaneous voice output instruction" that instructs output of a spontaneous voice.

For example, when the dialogue apparatus 2 is a self-propelled cleaner, the utterance control server 1 instructs the dialogue apparatus 2 to output a spontaneous voice when a predetermined event, such as completion of cleaning or an abnormality in the cleaner, is to be reported to the user. Alternatively, the utterance control server 1 cooperates with other devices in the building where the dialogue apparatus 2 is installed, and controls the dialogue apparatus 2 so that it reports an abnormality in those devices when one is detected. Alternatively, the utterance control server 1 cooperates with an external information providing server (not shown), and controls the dialogue apparatus 2 so that it reports a disaster warning, such as one for heavy rain or an earthquake, when one is received.

The TV (non-target voice source device / voice output device) 3 is given as an example of a voice output device that creates, for the dialogue apparatus 2, an environment in which misrecognition can occur. Specifically, the TV 3 outputs voice that is not uttered for the purpose of dialogue with the dialogue apparatus 2 (non-target voice).

The information collection server (information collection apparatus) 4 collects information from the TV 3, manages the operating status of the TV 3, and notifies the utterance control server 1 of that status.

Although FIG. 2 shows only one pair consisting of one TV 3 and one dialogue apparatus 2, the utterance control server 1 and the information collection server 4 manage a plurality of such pairs of a TV 3 (or another voice output device) and a dialogue apparatus 2, one for each user of the service of the dialogue system 401. Accordingly, the utterance control server 1 and the information collection server 4 can uniquely identify a TV 3 by its identification information (hereinafter, device ID) and can uniquely identify a dialogue apparatus 2 by its identification information (hereinafter, apparatus ID). A plurality of voice output devices (other TVs 3, recording/playback devices, music playback devices, and so on) may be managed in association with one dialogue apparatus 2.

In Embodiment 1, when the user operates the TV 3 with a remote control, the TV 3 that received the remote control's control signal transmits operation information indicating the content of the control signal, together with its own device ID, to the information collection server 4. Based on the received operation information, the information collection server 4 determines the operating status of the TV 3 and transmits operating information indicating that status, together with the device ID of the TV 3, to the utterance control server 1.
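The step from operation information to operating information can be sketched as follows. This is only an illustrative sketch: the operation names (`power_on`, `mute_on`, and so on), the status strings, and the class `InformationCollectionServer` are assumptions made for the example, since the actual contents of the operation information and of the device state management table are given in FIG. 3, which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    """State tracked per TV (hypothetical fields, not taken from FIG. 3)."""
    power_on: bool = False
    muted: bool = False


class InformationCollectionServer:
    """Keeps the latest state of each TV, keyed by device ID."""

    def __init__(self):
        self._states = {}  # device ID -> DeviceState

    def receive_operation(self, device_id, operation):
        """Apply one reported operation and return (device_id, operating_status)."""
        state = self._states.setdefault(device_id, DeviceState())
        if operation == "power_on":
            state.power_on = True
        elif operation == "power_off":
            state.power_on = False
        elif operation == "mute_on":
            state.muted = True
        elif operation == "mute_off":
            state.muted = False
        # Any other operation leaves the tracked state unchanged.
        if state.power_on and not state.muted:
            status = "outputting_audio"
        else:
            status = "silent"
        return device_id, status
```

The returned pair stands in for the operating information and device ID that the information collection server 4 forwards to the utterance control server 1.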

As described above, based on the operating information, the utterance control server 1 determines the environment in which the dialogue apparatus 2 paired with the TV 3 is placed, and transmits to the dialogue apparatus 2 an utterance-related instruction suited to that environment. By following the instructions of the utterance control server 1, the dialogue apparatus 2 can suppress voice output appropriately according to the environment, and as a result voice misrecognition, or a malfunction resulting from it, can be avoided.

The configuration of each apparatus of the dialogue system 401 for avoiding misrecognition or malfunction is described in detail below.

[Functional configuration of each apparatus of the dialogue system]
FIG. 1 is a functional block diagram showing the main configuration of each apparatus in the dialogue system according to Embodiment 1 of the present invention.

(Main configuration of the utterance control server 1)
As shown in FIG. 1, the utterance control server 1 includes a control unit 10 that performs overall control of the utterance control server 1, and a storage unit 11 that stores various data used by the control unit 10. The utterance control server 1 also includes blocks such as a communication unit for controlling the dialogue apparatus 2 and accessing information on the Internet, and an input unit for inputting data to the utterance control server 1, but these blocks are not illustrated.

The control unit 10 includes, as functional blocks, an operating information receiving unit 54, a control target specifying unit 55, a condition determination unit 56, and an utterance control unit 57.

The storage unit 11 stores a device-apparatus correspondence table 71 and a rule table 74.

The operating information receiving unit 54 receives, via the communication unit, the operating information transmitted from the information collection server 4.

The operating information is information indicating the operating status of a voice output device (the TV 3 in this embodiment) managed by the information collection server 4. It is transmitted from the information collection server 4 to the utterance control server 1 when the operating status of the TV 3 changes, or periodically. The dialogue system 401 manages a plurality of TVs 3, so the operating information is transmitted to the operating information receiving unit 54 in association with a device ID that uniquely identifies the TV 3. The operating information receiving unit 54 supplies the received device ID to the control target specifying unit 55 and the received operating information to the condition determination unit 56. A specific example of the operating information is described in detail later with reference to (c) of FIG. 3.

The control target specifying unit 55 specifies the dialogue apparatus 2 to be controlled, based on the device ID supplied from the operating information receiving unit 54. Specifically, the device ID identifies which TV 3 the supplied operating information belongs to, so the control target specifying unit 55 can specify the dialogue apparatus 2 associated with that TV 3 by looking up the device ID in the device-apparatus correspondence table 71. The dialogue apparatus 2 specified in this way is recognized by the utterance control unit 57 as the target whose utterances are to be controlled according to the operating information of the TV 3. A specific example of the device-apparatus correspondence table 71 is described in detail later with reference to (d) of FIG. 3.
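The lookup described above amounts to a mapping from device ID to apparatus ID. A minimal sketch follows, assuming the table can be represented as a dictionary; the IDs and the function name `specify_control_target` are invented for illustration and are not taken from FIG. 3 (d).

```python
from typing import Optional

# Hypothetical contents: each voice output device (by device ID) is associated
# with one dialogue apparatus (by apparatus ID). Several devices may point to
# the same apparatus.
DEVICE_APPARATUS_TABLE = {
    "tv-001": "cleaner-001",
    "recorder-001": "cleaner-001",
    "tv-002": "cleaner-002",
}


def specify_control_target(device_id: str,
                           table: dict = DEVICE_APPARATUS_TABLE) -> Optional[str]:
    """Return the apparatus ID paired with device_id, or None if it is unknown."""
    return table.get(device_id)
```

Returning `None` for an unregistered device ID is a design choice of this sketch; the source does not say how an unknown device is handled.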

Based on the operating information of the TV 3 supplied from the operating information receiving unit 54, the condition determination unit 56 determines whether a condition under which voice misrecognition can occur (hereinafter, voice misrecognition condition) is satisfied in the environment in which the dialogue apparatus 2 paired with the TV 3 is placed. In Embodiment 1, the condition determination unit 56 determines that the voice misrecognition condition is satisfied when the operating information indicates that the voice output device is in an operating state in which it outputs (or is predicted to output) voice that may be misrecognized as a user utterance. For example, the condition determination unit 56 may refer to the rule table 74, in which each content that the operating information can indicate is associated with whether the voice misrecognition condition is satisfied. A specific example of the rule table 74 is described in detail later with reference to (e) of FIG. 3.
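A rule-table lookup of this kind can be sketched as below. The status strings and their associated results are assumptions made for the example; the actual entries of the rule table 74 appear in FIG. 3 (e), which is not reproduced here.

```python
# Hypothetical rule table: operating status -> whether the voice
# misrecognition condition is satisfied.
RULE_TABLE = {
    "outputting_audio": True,            # TV sound may be mistaken for the user
    "scheduled_to_output_audio": True,   # output is predicted to start
    "muted": False,
    "powered_off": False,
}


def misrecognition_condition_holds(operating_status: str,
                                   rules: dict = RULE_TABLE) -> bool:
    """Look up the operating status; statuses not in the table count as not satisfied."""
    return rules.get(operating_status, False)
```

Treating an unlisted status as "not satisfied" keeps the dialogue apparatus responsive by default. The opposite default would be equally easy to implement, and the source does not specify either.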

The utterance control unit 57 transmits an utterance-related instruction to the dialogue apparatus 2 specified by the control target specifying unit 55, according to whether the condition determination unit 56 determined that the voice misrecognition condition is satisfied. Specifically, when the voice misrecognition condition is determined to be satisfied, the utterance control unit 57 transmits to the dialogue apparatus 2 an output suppression instruction, which controls the dialogue apparatus 2 so that it does not output a response voice. When the voice misrecognition condition is determined not to be satisfied, the utterance control unit 57 transmits to the dialogue apparatus 2 a suppression release instruction, which controls the dialogue apparatus 2 so that it outputs a response voice.
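The choice between the two instructions can be sketched as follows. The enum `Instruction`, the function names, and the `send` callback (which stands in for the unillustrated communication unit) are hypothetical names introduced for this example.

```python
from enum import Enum


class Instruction(Enum):
    OUTPUT_SUPPRESSION = "output_suppression"
    SUPPRESSION_RELEASE = "suppression_release"


def select_instruction(condition_holds: bool) -> Instruction:
    """Map the result of the voice misrecognition condition to an instruction."""
    if condition_holds:
        return Instruction.OUTPUT_SUPPRESSION
    return Instruction.SUPPRESSION_RELEASE


def control_utterance(apparatus_id: str, condition_holds: bool, send) -> Instruction:
    """Send the selected instruction to the specified dialogue apparatus.

    `send` is any callable taking (apparatus_id, instruction).
    """
    instruction = select_instruction(condition_holds)
    send(apparatus_id, instruction)
    return instruction
```

Passing `send` as a parameter keeps the sketch independent of any particular transport between the server and the dialogue apparatus.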

When operating-status information is received periodically, the result of the voice misrecognition determination based on the latest operating-status information may be unchanged from the result based on the operating-status information received immediately before. In that case the utterance control unit 57 would transmit the same instruction twice in a row, so it may be configured to skip the transmission.
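The choice between the two instructions, together with the optional skipping of repeated instructions, could look like the following Python sketch. The class name, the instruction strings, and the `send` callback are hypothetical; the text does not specify how instructions are encoded or transported.

```python
from typing import Callable, Dict

SUPPRESS = "output_suppression"   # hypothetical encoding of the output suppression instruction
RELEASE = "suppression_release"   # hypothetical encoding of the suppression release instruction


class UtteranceController:
    """Sketch of the utterance control unit 57 with duplicate suppression."""

    def __init__(self, send: Callable[[str, str], None]) -> None:
        self._send = send                      # send(apparatus_id, instruction)
        self._last_sent: Dict[str, str] = {}   # last instruction sent per dialogue device

    def update(self, apparatus_id: str, condition_holds: bool) -> bool:
        """Send an instruction if it differs from the previous one; return True if sent."""
        instruction = SUPPRESS if condition_holds else RELEASE
        if self._last_sent.get(apparatus_id) == instruction:
            return False  # same instruction as last time, so the transmission is skipped
        self._send(apparatus_id, instruction)
        self._last_sent[apparatus_id] = instruction
        return True
```

Remembering the last instruction per dialogue device is one simple way to realize the skipping described above; a real system might also resend after a timeout in case an instruction was lost.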

Alternatively, the utterance control unit 57 may be configured to control the units within its own device (that is, without sending an utterance-related instruction to the dialogue device 2) so that the dialogue device 2 ultimately does not output a response voice. This configuration of the utterance control unit 57 is described in detail in Modification 3.

(Main configuration of the dialogue device 2)
As shown in FIG. 1, the dialogue device 2 includes a control unit 20 that performs overall control of the dialogue device 2, a storage unit 21 that stores various data used by the control unit 20, a voice input unit 22 that accepts voice input, and a voice output unit 23 that outputs voice. The dialogue device 2 also includes a communication unit for receiving utterance-related instructions from the utterance control server 1, a processing unit that processes speech recognition results to realize dialogue, and units that realize its functions as a self-propelled vacuum cleaner; these blocks are omitted from the figure.

The control unit 20 includes, as functional blocks, a dialogue mode control unit (dialogue control means) 58, a speech recognition unit (speech recognition means) 59, and a voice output control unit (voice output control means) 60.

The speech recognition unit 59 analyzes the digital signal of the voice input via the voice input unit 22 and converts the words contained in the voice into text. The resulting text data is handled by downstream processing units (not shown) that generate a suitable response voice. Any known speech recognition technique may be used for the speech recognition unit 59 as appropriate.

The voice output control unit 60 supplies voice data (response voice or spontaneous voice) generated in the utterance control server 1 or the dialogue device 2 to the voice output unit 23, and controls the voice output unit 23 so that the data is output as sound audible to the user.

The dialogue mode control unit 58 controls the operation of the speech recognition unit 59 and the voice output control unit 60 in accordance with utterance-related instructions from the utterance control server 1. Specifically, on receiving an output suppression instruction from the utterance control server 1, the dialogue mode control unit 58 suppresses the operation of at least one of the speech recognition unit 59 and the voice output control unit 60 so that the dialogue device 2 enters a dialogue mode in which no response voice is output. Conversely, on receiving a suppression release instruction from the utterance control server 1, it releases the suppression of the speech recognition unit 59 and the voice output control unit 60 so that the dialogue device 2 returns to the dialogue mode in which a response voice is output. Even in the dialogue mode in which no response voice is output, the dialogue mode control unit 58 may control the voice output control unit 60 to output a spontaneous voice when a spontaneous voice output instruction is received from the utterance control server 1.

In the following, the configuration adopted for switching to the dialogue mode in which no response voice is output is one in which the dialogue mode control unit 58 disables the speech recognition function of the speech recognition unit 59. "Disabling the speech recognition function" means that even if the voice input unit 22 detects some sound and inputs the voice data to the control unit 20, the speech recognition unit 59 is not made to process that voice data and generate text data. As a result, even if some sound occurs around the dialogue device 2, the dialogue device 2 does not output a response voice to it. The configuration of the dialogue mode control unit 58 is not limited to this, however. For example, the dialogue mode control unit 58 may suppress the operation of the voice output control unit 60 so that the response voice is not output to the voice output unit 23. Then, even if some sound is input, recognized by the speech recognition unit 59, and a response voice is generated from the recognition result, the dialogue device 2 does not output that response voice. Alternatively, the dialogue mode control unit 58 may suppress the operation of the processing unit (not shown) so that it does not process the text data of the speech recognition result. Then, even if some sound is input, recognized by the speech recognition unit 59, and text data is generated, no corresponding response voice is generated, so the dialogue device 2 does not output a response voice to that sound.
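A minimal Python sketch of the adopted configuration, in which suppression is realized by not running speech recognition at all, is given below. The class, the instruction strings, and the `recognize` and `respond` callbacks are hypothetical placeholders for the speech recognition unit 59 and the downstream response generation.

```python
from typing import Callable, Optional


class DialogueDevice:
    """Sketch of dialogue mode control by disabling speech recognition."""

    def __init__(self,
                 recognize: Callable[[bytes], str],
                 respond: Callable[[str], str]) -> None:
        self._recognize = recognize   # stands in for the speech recognition unit 59
        self._respond = respond       # stands in for downstream response generation
        self.recognition_enabled = True

    def on_instruction(self, instruction: str) -> None:
        """Switch dialogue mode according to an instruction from the server."""
        if instruction == "output_suppression":
            self.recognition_enabled = False
        elif instruction == "suppression_release":
            self.recognition_enabled = True

    def on_audio(self, audio: bytes) -> Optional[str]:
        """Return a response for the audio, or None while recognition is disabled."""
        if not self.recognition_enabled:
            return None  # audio is dropped before any text data is generated
        return self._respond(self._recognize(audio))
```

The two alternative configurations described above would move the check later in `on_audio`: after recognition but before response generation, or after response generation but before output.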

(Main configuration of the audio output device)
As shown in FIG. 1, the TV (audio output device) 3 includes a control unit 30 that performs overall control of the TV 3, a storage unit 31 that stores various data used by the control unit 30, and an operation unit 32 that functions as a remote control with which the user operates the TV 3. The TV 3 also includes a communication unit for transmitting various information to the information collection server 4 and units that realize its functions as a digital television; these blocks are omitted from the figure.

The control unit 30 includes, as a functional block, an operation information transmission unit (operation information transmission means) 50.

The operation information transmission unit 50 accepts a control signal sent from the operation unit 32 and transmits operation information indicating the content of the control signal to the information collection server 4. Specifically, the operation unit 32, as a remote control, is provided with a power on/off button, number buttons, directional (up, down, left, right) buttons, an enter button, a back button, a data broadcast display button, four color (blue, red, green, yellow) buttons, and so on. The operation information transmission unit 50 transmits information on the button pressed by the user to the information collection server 4 as operation information. In doing so, it associates the device ID of its own device, stored in the storage unit 31, with the operation information. A specific example of the operation information is described in detail later with reference to FIG. 3(a).

Alternatively, the operation information transmission unit 50 may be configured to transmit the operation information each time the state of the TV 3 changes, rather than each time a button is operated. More precisely, only when the TV 3 performs some operation in response to a button press on the operation unit 32 and its state changes as a result does the operation information transmission unit 50 transmit information on the pressed button to the information collection server 4 as operation information. For example, it transmits the operation information of the button when the press switches the TV 3 between power on and power off, changes the viewing channel, or switches to an external input. In this way, transmission of operation information can be omitted when a button press does not change the state of the TV 3.
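The state-change-only variant could be sketched in Python as follows. The class name, the state strings, and the transition mapping passed to the constructor are assumptions for illustration; the text does not describe how the TV 3 represents its own state.

```python
from typing import Callable, Dict, Tuple


class TvReporter:
    """Sketch: report a button press only when it changes the TV state."""

    def __init__(self,
                 device_id: str,
                 transitions: Dict[Tuple[str, str], str],
                 send: Callable[[str, str], None],
                 state: str = "power_off") -> None:
        self.device_id = device_id
        self._transitions = transitions   # (state, button) -> new state (hypothetical)
        self._send = send                 # send(device_id, button)
        self.state = state

    def press(self, button: str) -> bool:
        """Apply a button press; transmit it only if the state changed."""
        new_state = self._transitions.get((self.state, button), self.state)
        changed = new_state != self.state
        self.state = new_state
        if changed:
            self._send(self.device_id, button)
        return changed
```

A press that has no entry in the mapping leaves the state as it is, so nothing is transmitted for it.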

(Main configuration of the information collection server 4)
As shown in FIG. 1, the information collection server 4 includes a control unit 40 that performs overall control of the information collection server 4 and a storage unit 41 that stores various data used by the control unit 40. The information collection server 4 also includes blocks such as a communication unit for receiving information transmitted from the TV 3 and transmitting information to the utterance control server 1, and an input unit for entering data into the information collection server 4; these blocks are omitted from the figure.

The control unit 40 includes, as functional blocks, an operation information receiving unit (operation information receiving means) 51, an operating-status information generation unit (operating-status information setting means) 52, and an operating-status information transmission unit (operating-status information transmission means) 53.

The storage unit 41 stores a device state management table 70.

The operation information receiving unit 51 receives operation information from the TV 3 together with the device ID of the TV 3. It then updates the device state stored in the device state management table 70 in association with the received device ID, based on the received operation information. For this purpose the operation information receiving unit 51 refers, for example, to state transition information (not shown) stored in the storage unit 41. The state transition information associates the immediately preceding state of the TV 3, an event (button press), and the state after the transition. From the received operation information, the operation information receiving unit 51 can thus determine which state the TV 3 transitions to when a given button is pressed in a given state.
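The state transition information can be pictured as a mapping from (previous state, event) to next state. In the Python sketch below, only the first row reflects the worked example in the text (up button pressed while tuned to channel 2); the remaining rows, the state strings, and the function name are assumptions.

```python
from typing import Dict, Tuple

# (state before, event) -> state after. Only the first row comes from the text;
# the other rows are assumed for illustration.
STATE_TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("power_on/ch2", "up"): "power_on/ch1",
    ("power_on/ch1", "down"): "power_on/ch2",
    ("power_off", "power"): "power_on/ch1",
    ("power_on/ch1", "power"): "power_off",
    ("power_on/ch2", "power"): "power_off",
}


def apply_operation(state_table: Dict[str, str], device_id: str, event: str) -> str:
    """Update the stored state of one TV from a received operation and return it.

    Unknown devices start as "power_off", and events with no matching transition
    leave the state unchanged (both are assumptions of this sketch).
    """
    current = state_table.get(device_id, "power_off")
    new_state = STATE_TRANSITIONS.get((current, event), current)
    state_table[device_id] = new_state
    return new_state
```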

The operation information receiving unit 51 may instead be configured to receive the latest device state from the TV 3 in place of operation information. In that case, the function described above of determining the latest device state from the operation information is provided in the TV 3 rather than in the operation information receiving unit 51. The operation information receiving unit 51 then only needs to overwrite the device state stored in the device state management table 70 with the latest state received, which simplifies its configuration.

The operating-status information generation unit 52 generates and updates operating-status information based on the state of each TV 3 managed in the device state management table 70. When the operation information receiving unit 51 updates the state of a TV 3 stored in the device state management table 70, the operating-status information generation unit 52 updates the operating-status information of that TV 3 based on the latest state. Operating-status information indicates whether the TV 3 is operating and, if so, what operation it is performing. The utterance control server 1 uses it to determine whether an audio output device such as the TV 3 is creating an environment that satisfies the voice misrecognition condition. A specific example of the device state management table 70 is described in detail later with reference to FIG. 3(b).

The operating-status information transmission unit 53 transmits the operating-status information generated by the operating-status information generation unit 52. Specifically, it transmits the operating-status information to the utterance control server 1 when the operating-status information is updated in the device state management table 70, or periodically. It preferably transmits the device ID in association with the operating-status information so that the utterance control server 1 can identify which TV 3 the information belongs to.

When operation information is received periodically, the operating-status information updated from the latest operation information may be unchanged from that updated from the immediately preceding operation information. In that case the operating-status information transmission unit 53 would transmit the same operating-status information twice in a row, so it may be configured to skip the transmission.

[Information and tables]
FIG. 3(a) to (e) show the information and tables handled by each device in the dialogue system 401 of Embodiment 1. FIG. 3(a) shows a specific example of the information transmitted from the TV 3 to the information collection server 4. FIG. 3(b) shows a specific example of the device state management table 70. FIG. 3(c) shows a specific example of the information transmitted from the information collection server 4 to the utterance control server 1. FIG. 3(d) shows a specific example of the device-apparatus correspondence table 71. FIG. 3(e) shows a specific example of the rule table 74.

FIG. 3 shows one specific example of the various information to aid understanding and does not limit the configuration of each device. Showing the data structures of the various information in table form in FIG. 3 is likewise only an example, and there is no intention to limit the data structures to table form. The same applies to the other figures used below to explain data structures.

To give a specific example: suppose that, in the dialogue system 401, a user watching channel 2 on the TV 3 presses the up button on the operation unit 32 (remote control) to switch to the previous channel. The TV 3 tunes to channel 1 in response. At this time, the operation information transmission unit 50 of the TV 3 associates operation information indicating the content of the input control signal, "up button pressed", with its own device ID and transmits them to the information collection server 4. FIG. 3(a) shows a specific example of the device ID and operation information transmitted at this time.

On receiving the device ID and operation information shown in FIG. 3(a), the operation information receiving unit 51 of the information collection server 4 uses them to update the information stored in the device state management table 70. Specifically, in the device state management table 70 shown in FIG. 3(b), it updates the state "power on - tuned to 2CH" associated with the device ID "TV0001" to "power on - tuned to 1CH". As described above, the operation information receiving unit 51 can determine from the state transition information (not shown) that when the "up button pressed" event occurs in the state "power on - tuned to 2CH", the state of the TV 3 transitions to "power on - tuned to 1CH". Likewise, when a number button is pressed while the TV 3 is tuned to any channel, the operation information receiving unit 51 can determine that the TV 3 is tuned to the channel of that number. Or, when the directional buttons and the enter button are pressed while an OSD (On-Screen Display) image such as an EPG (Electronic Program Guide) is displayed, it can determine that the TV 3 is tuned to the channel selected in that state.

When the state of a device is updated, the operating-status information generation unit 52 of the information collection server 4 updates the operating-status information based on the updated state. In Embodiment 1, when the state of the TV 3 indicates "power off", the operating-status information generation unit 52 sets (generates or updates) the associated operating-status information to also indicate "power off". When the state of the TV 3 indicates "power on" and "tuned (to any channel)", it generates or updates operating-status information indicating "viewing" (the user is watching some channel). When the state of the TV 3 indicates "power on" and any activity other than being tuned to a channel, it generates or updates operating-status information indicating "non-viewing use".
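The three cases above amount to a small decision function. The Python sketch below assumes the device state has already been reduced to two booleans; that reduction and the status strings are choices made for this sketch, not part of the description.

```python
def operating_status(power_on: bool, tuned_to_channel: bool) -> str:
    """Derive the operating status of a TV from its device state.

    power off                      -> "power_off"
    power on and tuned to channel  -> "viewing"
    power on, any other activity   -> "non_viewing_use"
    """
    if not power_on:
        return "power_off"
    return "viewing" if tuned_to_channel else "non_viewing_use"
```

Changing channel from 2 to 1 keeps both inputs true, so the status is recomputed but stays "viewing".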

In the example above, the state of the device ID "TV0001" was updated from "power on - tuned to 2CH" to "power on - tuned to 1CH". The operating-status information is therefore updated but remains "viewing". The operating-status information generation unit 52 may store the date and time at which it generated or updated the operating-status information in the last-updated column, as shown in FIG. 3(b).

In another example, when the operation information receiving unit 51 updates the state of the device ID "TV0002" from "power off" to "power on - tuned to 1CH", the operating-status information generation unit 52 updates the operating-status information of the device ID "TV0002" from "power off" to "viewing".

When operating-status information is generated or updated by the operating-status information generation unit 52, the operating-status information transmission unit 53 transmits the latest operating-status information, together with the associated device ID, to the utterance control server 1. FIG. 3(c) shows a specific example of the device ID and operating-status information that the operating-status information transmission unit 53 transmits when, for example, the operating-status information "viewing" of the device ID "TV0001" is updated.

On receiving the device ID and operating-status information shown in FIG. 3(c), the operating-status information receiving unit (operating-status information receiving means) 54 of the utterance control server 1 supplies the device ID to the control target specifying unit 55 and the operating-status information to the condition determination unit 56.

First, the control target specifying unit (control target specifying means) 55 uses the supplied device ID to identify the dialogue device 2 that is paired with the TV 3 indicated by the device ID and whose utterances are to be controlled. In the device-apparatus correspondence table 71 shown in FIG. 3(d), the apparatus ID of the dialogue device 2 to be controlled is stored in association with each device ID. The control target specifying unit 55 can therefore identify the dialogue device 2 to be controlled by referring to the device-apparatus correspondence table 71. In the example above, the received device ID is "TV0001", so the control target specifying unit 55 identifies the dialogue device 2 with the apparatus ID "DE0001" as the control target.
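The pairing lookup can be sketched in Python as a dictionary from the ID of a TV 3 to the ID of its paired dialogue device 2. The pair "TV0001" and "DE0001" is taken from the example in the text; the function name and the handling of unknown IDs are assumptions.

```python
from typing import Dict, Optional

# Stand-in for correspondence table 71: ID of a TV -> ID of the paired dialogue device.
CORRESPONDENCE_TABLE: Dict[str, str] = {
    "TV0001": "DE0001",  # pair given in the text
}


def specify_control_target(tv_id: str) -> Optional[str]:
    """Return the ID of the dialogue device paired with the TV.

    Returns None when no pairing is registered (an assumption of this sketch).
    """
    return CORRESPONDENCE_TABLE.get(tv_id)
```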

Next, the condition determination unit (condition determination means) 56 uses the supplied operating-status information to determine whether the voice misrecognition condition holds in the environment where the dialogue device 2 paired with the TV 3 is placed. In the rule table 74 shown in FIG. 3(e), whether the voice misrecognition condition holds is associated with each content of the operating-status information. The condition determination unit 56 can therefore determine, by referring to the rule table 74, whether the voice misrecognition condition holds for given operating-status information. In the example above, the received operating-status information indicates "viewing", so the condition determination unit 56 determines that the voice misrecognition condition holds in the environment where the dialogue device 2 is currently placed. The dialogue system 401 thus recognizes that, because the TV 3 "TV0001" is being watched by the user, the dialogue device 2 "DE0001" is likely to misrecognize voice.

In this example, therefore, the utterance control unit (utterance control means) 57 transmits an output suppression instruction, as the utterance-related instruction, to the identified dialogue device 2 "DE0001". On receiving the output suppression instruction, the dialogue mode control unit 58 of the dialogue device 2 controls the speech recognition unit 59 or the voice output control unit 60 as described above. As a result, even if some sound occurs around the dialogue device 2, the voice output unit 23 of the dialogue device 2 outputs no response voice to it. In a situation where the dialogue device 2 is likely to misrecognize voice, this prevents the dialogue device 2 from misrecognizing voice or malfunctioning. In particular, in the example above, it prevents the dialogue device 2 from outputting an erroneous response voice in reaction to sound output from the TV 3 while the user is watching the TV 3. As a result, the user's viewing of the TV 3 is not disturbed by a malfunction.

The device-apparatus correspondence table 71 may instead be stored in the storage unit 41 of the information collection server 4. In that case, the control target specifying unit 55 is provided in the control unit 40, and the operating-status information transmission unit 53 supplies the apparatus ID identified by the control target specifying unit 55, in association with the operating-status information, to the utterance control server 1.

[Processing flow]
FIG. 4 is a flowchart showing the flow of the operation information transmission process executed by the TV 3 and the flow of the operating-status information transmission process executed by the information collection server 4 in the dialogue system 401. FIG. 5 is a flowchart showing the flow of the utterance control process executed by the utterance control server 1 and the flow of the dialogue mode control process executed by the dialogue device 2 in the dialogue system 401.

(Operation information transmission flow)
Referring to FIG. 4, when the TV 3 accepts a control signal sent from the operation unit 32 by the user's remote control operation (YES in S101), the operation information transmission unit 50 transmits its own device ID and operation information indicating the content of the control signal to the information collection server 4 (S102) (for example, FIG. 3(a)). A fixed time or more may also elapse after the TV 3 last accepted a control signal without a further control signal being accepted (YES in S101). In that case, the operation information transmission unit 50 transmits to the information collection server 4 operation information indicating that no button has been pressed, instead of indicating a pressed button (S102).

(Operating-status information transmission flow)
When the operation information receiving unit 51 of the information collection server 4 receives a device ID and operation information (YES in S103), it updates the state of the TV 3 identified by the device ID in the device state management table 70 (FIG. 3(b)) based on the operation information (S104).

When the state of the TV 3 indicates power on and tuned to a channel (YES in S105 and YES in S106), the operating-status information generation unit 52 generates the operating-status information "viewing" for the TV 3, or updates the operating-status information to "viewing" (S107). The operating-status information transmission unit 53 then transmits the device ID of the TV 3 and the operating-status information "viewing" to the utterance control server 1 (S108). When the state of the TV 3 indicates power off, or power on in a state other than tuned to a channel (NO in S105, or NO in S106), the operating-status information generation unit 52 generates or updates the operating-status information of the TV 3 to indicate "power off" or "non-viewing use" (S109). The operating-status information transmission unit 53 then transmits the device ID of the TV 3 and the operating-status information "power off" or "non-viewing use" to the utterance control server 1 (S110).

Note that if the operation information receiving unit 51 has not received operation information for a given TV 3 for a certain time or more since the previous operation information was received (NO in S103, YES in S111), it may update the state of that TV 3 to “power off”. In this case, S109 and S110 are executed.

Alternatively, the operation information receiving unit 51 may periodically check, for all TVs 3, the last update date and time in the device state management table 70 shown in (b) of FIG. 3. The operation information receiving unit 51 then extracts every TV 3 for which a certain time or more has elapsed since the last update date and time (or since operation information was last received) and updates the state of each of these TVs 3 to “power off”. In this case, the operating status information of all the extracted TVs 3 is updated to “power off” collectively by the operating status information generation unit 52 and transmitted to the utterance control server 1 by the operating status information transmission unit 53.
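The periodic check described above could be sketched as follows. This is only an illustration: the two dictionaries stand in for columns of the device state management table 70, and the function name, the timeout value, and the second device ID are assumptions, not part of the specification.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def sweep_stale(last_updated: Dict[str, datetime],
                states: Dict[str, str],
                now: datetime,
                timeout: timedelta) -> List[str]:
    """Mark every device whose record is at least `timeout` old as powered off.

    Returns the sorted IDs of the devices that were marked, so that their
    operating status can be reported to the server in one batch.
    """
    stale = [dev for dev, ts in last_updated.items() if now - ts >= timeout]
    for dev in stale:
        states[dev] = "power off"
    return sorted(stale)
```

Returning the list of affected IDs mirrors the statement that the extracted TVs are handled together when the updated status is generated and transmitted.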

Note that when the information collection server 4 holds the device-apparatus correspondence table 71, the control target specifying unit 55 provided in the information collection server 4 executes, at some step after S103 and before S108 or S110, a step of identifying the apparatus ID (the identifier of the dialogue device 2) from the device ID (the identifier of the TV 3). In this case, in S108 or S110, the apparatus ID is transmitted together with the operating status information in place of the device ID.

(Utterance control flow)
Referring to FIG. 5, when the operating status information receiving unit 54 of the utterance control server 1 receives the device ID and the operating status information (for example, (c) of FIG. 3) (YES in S112), the control target specifying unit 55 refers to the device-apparatus correspondence table 71 ((d) of FIG. 3) and identifies the dialogue device 2 that is paired with the TV 3 and is the target of utterance control (S113). Specifically, it identifies the apparatus ID (the identifier of the dialogue device 2) that corresponds to the received device ID. If the apparatus ID is received together with the operating status information in place of the device ID, S113 can be omitted.

Meanwhile, the condition determination unit 56 refers to the rule table 74 ((e) of FIG. 3) and determines, based on the received operating status information, whether the voice misrecognition condition is satisfied in the environment of the dialogue device 2 (S114). In the example shown in (e) of FIG. 3, the condition determination unit 56 determines that the voice misrecognition condition is satisfied when the operating status information indicates “viewing”, and that it is not satisfied when the operating status information indicates “power off” or “in non-viewing use”.

When the voice misrecognition condition is determined to be satisfied (YES in S114), the utterance control unit 57 transmits to the dialogue device 2 identified in S113 an instruction to suppress output of the response voice, that is, an output suppression instruction (S115). When the condition is determined not to be satisfied (NO in S114), the utterance control unit 57 transmits to the dialogue device 2 identified in S113 an instruction to cancel the suppression of response voice output, that is, a suppression release instruction (S116).
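Steps S113 to S116 amount to two table lookups followed by a choice between two instructions. The sketch below assumes in-memory dictionaries for the correspondence table 71 and the rule table 74; the apparatus ID `RB0001` and the instruction names are hypothetical placeholders, since the specification does not give their concrete form.

```python
from typing import Dict, Tuple

# Stand-in for the device-apparatus correspondence table 71 (TV -> dialogue device).
# "RB0001" is a hypothetical apparatus ID used only for illustration.
PAIRING: Dict[str, str] = {"TV0001": "RB0001"}

# Stand-in for the rule table 74: does this status satisfy the
# voice misrecognition condition?
RULE_TABLE: Dict[str, bool] = {
    "viewing": True,
    "power off": False,
    "in non-viewing use": False,
}


def decide_instruction(device_id: str, status: str) -> Tuple[str, str]:
    """Return (target apparatus ID, instruction) for one status report."""
    apparatus_id = PAIRING[device_id]              # S113
    if RULE_TABLE[status]:                         # S114
        return apparatus_id, "suppress_output"     # S115
    return apparatus_id, "release_suppression"     # S116
```

Keeping the rule as data rather than as branching code reflects the role of the rule table: other statuses can be covered by adding rows without changing the decision logic.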

(Dialogue mode control flow)
When the dialogue mode control unit 58 of the dialogue device 2 receives an instruction related to utterance (YES in S117), it analyzes the content of the instruction. When the received instruction is an output suppression instruction (1 in S118), the dialogue mode control unit 58 switches the dialogue mode of the dialogue device 2 to a mode in which no response voice is output (S119). Specifically, it suppresses output of the response voice by disabling the voice recognition function of the voice recognition unit 59. When the received instruction is a suppression release instruction (2 in S118), the dialogue mode control unit 58 switches the dialogue mode to a mode in which the response voice is output (S120). Specifically, it releases the suppression of response voice output by enabling the voice recognition function of the voice recognition unit 59.

Note that when the dialogue mode control unit 58 determines in S118 that the instruction related to utterance is a spontaneous voice output instruction, it may instruct the voice output control unit 60 to output the spontaneous voice even if the current mode is the dialogue mode in which no response voice is output.
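The handling of the three kinds of instruction in S118 to S120 can be sketched as a small dispatcher. The class, the instruction strings, and the `spoken` list (standing in for the voice output control unit 60) are illustrative assumptions; the point shown is that spontaneous output does not depend on whether recognition is currently enabled.

```python
from typing import List


class DialogueModeController:
    """Illustrative stand-in for the dialogue mode control unit 58."""

    def __init__(self) -> None:
        self.recognition_enabled = True   # response voice is output by default
        self.spoken: List[str] = []       # stand-in for actual voice output

    def handle(self, instruction: str, text: str = "") -> None:
        if instruction == "suppress_output":          # 1 in S118 -> S119
            self.recognition_enabled = False
        elif instruction == "release_suppression":    # 2 in S118 -> S120
            self.recognition_enabled = True
        elif instruction == "spontaneous_output":
            # Spoken even while response output is suppressed.
            self.spoken.append(text)
        else:
            raise ValueError(f"unknown instruction: {instruction}")
```

In this sketch a spontaneous utterance leaves `recognition_enabled` untouched, so a suppressed device stays suppressed after speaking.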

<< Embodiment 2 >>
Embodiment 2 of the present invention is described below with reference to FIGS. 6 to 9. For convenience of explanation, members having the same functions as those described in the preceding embodiment are given the same reference numerals, and their description is omitted. The same applies to the subsequent embodiments.

In Embodiment 2, the audio output device (TV 3) is not connected to an external network such as the Internet and has no function for communicating with external devices. That is, the audio output device cannot transmit its own state to the outside. Even when the audio output device is not network-connected in this way, the utterance control server 1 can grasp the status of the audio output device and the environment of the dialogue device 2, and voice misrecognition can be appropriately avoided.

[Outline of Dialogue System]
FIG. 6 is a diagram showing an outline of the dialogue system according to Embodiment 2 of the present invention. The dialogue system 402 shown in FIG. 6 includes the utterance control server 1, the dialogue device 2, and an audio output device (for example, the TV 3).

In the dialogue system 401 of Embodiment 1, the operating status information is generated by the information collection server 4 and supplied to the utterance control server 1. In the dialogue system 402 of Embodiment 2, no information collection server 4 is provided; instead, the dialogue device 2 supplies the operating status information to the utterance control server 1.

In Embodiment 2, the dialogue device 2 has a function of communicating with the TV 3 via infrared, so that it can remotely operate the TV 3 by sending control signals in the same way as the operation unit 32 and can also keep track of the state of the TV 3. Because the dialogue device 2 itself knows the operation information accepted by the TV 3 and the latest state of the TV 3, it can generate operating status information from this information. In other words, the dialogue device 2 can transmit its own apparatus ID and the operating status information it has generated to the utterance control server 1. The utterance control server 1 can return an instruction related to utterance to the dialogue device 2 in accordance with the received operating status information.

As a result, the same effect as in Embodiment 1 is obtained: voice misrecognition, and the malfunction that accompanies it, can be avoided while the user is viewing the TV 3.

[Functional Configuration of Each Device in the Dialogue System]
FIG. 7 is a functional block diagram showing the main configuration of each device in the dialogue system according to Embodiment 2 of the present invention. (a) to (d) of FIG. 8 are diagrams showing specific examples of the information and tables handled by each device in the dialogue system 402 of Embodiment 2.

(Main configuration of the utterance control server 1)
The utterance control server 1 shown in FIG. 7 differs from the utterance control server 1 of Embodiment 1 shown in FIG. 1 in the following points.

In Embodiment 2, the operating status information receiving unit 54 receives the apparatus ID (the identifier of the dialogue device 2), rather than the device ID of the TV 3, together with the operating status information. The utterance control server 1 can therefore identify the dialogue device 2 to be controlled directly from the apparatus ID. For this reason, the control unit 10 need not include the control target specifying unit 55, and the storage unit 11 need not store the device-apparatus correspondence table 71. The operating status information receiving unit 54 supplies the received apparatus ID to the utterance control unit 57 and the received operating status information to the condition determination unit 56.

(Main configuration of the dialogue device 2)
The dialogue device 2 shown in FIG. 7 differs from the dialogue device 2 of Embodiment 1 shown in FIG. 1 in the following points.

The dialogue device 2 includes an infrared transmission unit 24 that sends out control signals for operating the TV 3. The control unit 20 further includes, as functional blocks, a device operation unit (device operation means) 61, the operating status information generation unit 52, and the operating status information transmission unit 53.

The storage unit 21 stores the device state management table 70 and an apparatus ID 72. As shown in (a) of FIG. 8, the apparatus ID 72 is identification information assigned individually to each dialogue device 2. The apparatus ID 72 is used when the operating status information transmission unit 53 transmits operating status information.

The device operation unit 61 remotely operates the TV 3 in response to the occurrence of a predetermined event (for example, a predetermined time arriving or an instruction from the user). In the same manner as the operation information receiving unit 51 of Embodiment 1, the device operation unit 61 also updates the state of the TV 3 stored in the device state management table 70 according to operation information indicating the content of the remote operation.

For example, suppose the user instructs the dialogue device 2 to switch the TV 3 from channel 2 to channel 1. In this case, the device operation unit 61 controls the infrared transmission unit 24 to send toward the TV 3 a control signal, for example one corresponding to pressing the up button on the remote control, so that the state of the TV 3 changes from channel 2 selected to channel 1 selected. The device operation unit 61 generates operation information indicating that this control signal (up button pressed) was sent toward the TV 3 and, based on it, updates the state of the TV 3 stored in the device state management table 70. Specifically, the device operation unit 61 first generates the operation information shown in (b) of FIG. 8. It then reads the record of the TV 3 being operated (device ID “TV0001”) from the device state management table 70 shown in (c) of FIG. 8 and, based on the operation information, updates the state of the TV 3 in the same manner as the operation information receiving unit 51 of Embodiment 1.

The operating status information generation unit 52 generates and updates operating status information in the same manner as the operating status information generation unit 52 of Embodiment 1. Although not illustrated, the date and time at which the operating status information was generated or updated may be stored in the device state management table 70 as the last update date and time, as in Embodiment 1. The device state management table 70 shown in (c) of FIG. 8 is a table for managing the states of one or more audio output devices that the dialogue device 2 can remotely operate. Accordingly, records for devices other than the TV 3 that the dialogue device 2 can remotely operate, such as a recording and playback device or a music playback device, may also be managed in the device state management table 70.

The operating status information transmission unit 53 transmits the operating status information generated or updated by the operating status information generation unit 52 to the utterance control server 1, in the same manner as the operating status information transmission unit 53 of Embodiment 1. In Embodiment 2, however, it transmits the operating status information in association with the apparatus ID 72, which identifies the dialogue device 2 itself, rather than with the device ID of the TV 3. (d) of FIG. 8 shows a specific example of the apparatus ID and operating status information transmitted by the operating status information transmission unit 53.

(Main configuration of the audio output device)
In Embodiment 2, the TV 3 need not include a communication unit or, in the control unit 30, the operation information transmission unit 50. The TV 3 includes at least an infrared receiving unit 33 for receiving the control signals sent from the dialogue device 2. The TV 3 also includes the units needed to carry out its functions as a digital television in accordance with the control signals received by the infrared receiving unit 33, but these blocks are not illustrated.

With the above configuration, when the operating status of an audio output device (such as the TV 3) changes as a result of remote operation by the dialogue device 2, the dialogue device 2 reports the change to the utterance control server 1 by transmitting operating status information. As in Embodiment 1, the utterance control server 1 determines from the operating status information whether the voice misrecognition condition is satisfied and, depending on the result, returns an output suppression instruction or a suppression release instruction to the reporting dialogue device 2.

Thus, even when the audio output device is not network-connected, the utterance control server 1 can grasp the status of the audio output device and the environment of the dialogue device 2, and voice misrecognition can be appropriately avoided.

[Processing Flow]
FIG. 9 is a flowchart showing the flow of the operating status information transmission process executed by the dialogue device 2 and the flow of the utterance control process executed by the utterance control server 1 in the dialogue system 402.

(Operating status information transmission flow)
When the device operation unit 61 of the dialogue device 2 detects the occurrence of a predetermined event that calls for remote operation of an audio output device (here, the TV 3) (YES in S201), it controls the infrared transmission unit 24 to remotely operate the TV 3. It then updates the state of the TV 3 in the device state management table 70 based on operation information indicating the content of that operation (S202).

When the state of the TV 3 indicates that the power is on and a channel is selected (YES in S203 and YES in S204), the operating status information generation unit 52 generates the operating status information “viewing” for the TV 3, or updates the existing operating status information to “viewing” (S205). The operating status information transmission unit 53 then transmits the apparatus ID 72 and the operating status information “viewing” to the utterance control server 1 (S206). On the other hand, when the state of the TV 3 indicates power off, or power on without a channel selected (NO in S203, or NO in S204), the operating status information generation unit 52 generates or updates the operating status information of the TV 3 so that it indicates “power off” or “in non-viewing use” (S207). The operating status information transmission unit 53 then transmits the apparatus ID 72 and the operating status information “power off” or “in non-viewing use” to the utterance control server 1 (S208).

(Utterance control flow)
When the operating status information receiving unit 54 of the utterance control server 1 receives the apparatus ID 72 and the operating status information (YES in S209), it supplies the apparatus ID 72 to the utterance control unit 57 and the operating status information to the condition determination unit 56. The subsequent processing is the same as S114 to S116 shown in FIG. 5.

(Dialogue mode control flow)
This flow is the same as S117 to S120 shown in FIG. 5.

<< Embodiment 3 >>
Embodiment 3 of the present invention is described below with reference to FIGS. 10 to 13.

Audio output devices such as the TV 3 are not the only cause of an environment in which the dialogue device 2 may misrecognize voice. The dialogue device 2 may also misrecognize voice and malfunction in an environment where the user is speaking to something other than the dialogue device 2. For example, the user may speak to a remote party using a call device such as a telephone, mobile phone, smartphone, or intercom. Such speech is clearly not directed at the dialogue device 2, yet a dialogue device 2 near the user may misrecognize it. In Embodiment 3, the dialogue system of the present invention can grasp an environment in which the user speaks using a call device and can appropriately avoid voice misrecognition.

[Outline of Dialogue System]
FIG. 10 is a diagram showing an outline of the dialogue system according to Embodiment 3 of the present invention. The dialogue system 403 shown in FIG. 10 includes the utterance control server 1, the dialogue device 2, and one or more call devices (for example, the telephone 3a).

In Embodiment 3, a call device has a function of outputting a predetermined ringtone to notify the user of an incoming call when a call arrives from the other party's device. The user who notices the incoming call is then expected to use the call device and speak to the other party.

By detecting the predetermined ringtone specific to each call device (telephone 3a), the dialogue device 2 can identify the telephone (non-target sound source device / call device) 3a and can recognize that the telephone 3a is entering the in-call state. It can then generate operating status information for the telephone 3a based on the fact that the telephone 3a is in a call. After that, as in Embodiment 2, the dialogue device 2 simply transmits its own apparatus ID and the operating status information of the telephone 3a to the utterance control server 1.

As a result, voice misrecognition, and the malfunction that accompanies it, can be avoided while the telephone 3a is in a call.

[Functional Configuration of Each Device in the Dialogue System]
FIG. 11 is a functional block diagram showing the main configuration of each device in the dialogue system according to Embodiment 3 of the present invention. (a) to (d) of FIG. 12 are diagrams showing specific examples of the information and tables handled by each device in the dialogue system 403 of Embodiment 3.

(Main configuration of the utterance control server 1)
The utterance control server 1 shown in FIG. 11 differs from the utterance control server 1 of Embodiment 2 shown in FIG. 7 in the following points.

In Embodiment 3, the operating status information receiving unit 54 receives the operating status information of a call device (telephone 3a) (for example, (d) of FIG. 12) together with the apparatus ID 72. In Embodiment 3, the operating status information indicates either “in call” (the telephone 3a is being used by the user, who is talking with the other party) or “standby” (waiting for an incoming call).

The condition determination unit 56 refers to the rule table 74 and determines, based on the received operating status information, whether the voice misrecognition condition is satisfied in the environment in which the dialogue device 2 is placed. For example, according to the rule table 74 shown in (e) of FIG. 12, the condition determination unit 56 determines that the voice misrecognition condition is satisfied when the operating status information of the telephone 3a indicates “in call”, and that it is not satisfied when the operating status information indicates “standby”.

(Main configuration of the dialogue device 2)
The dialogue device 2 shown in FIG. 11 differs from the dialogue device 2 of Embodiment 2 shown in FIG. 2 in the following points.

The dialogue device 2 need not include the infrared transmission unit 24.

The control unit 20 need not include the device operation unit 61; instead, it includes a sound determination unit (sound determination means) 62 as a functional block.

The storage unit 21 additionally stores a ringtone table 73.

The sound determination unit 62 analyzes the audio data input to the control unit 20 via the voice input unit 22 and determines what kind of sound it is. Depending on the determination result, the sound determination unit 62 also updates the state of each call device managed in the device state management table 70 (for example, (c) of FIG. 12).

(b) of FIG. 12 shows a specific example of the ringtone table 73. In Embodiment 3, the sound determination unit 62 compares the input audio data with the ringtone samples registered in advance in the ringtone table 73 for each call device and, if the sound in the audio data is the ringtone of a registered call device, identifies which of the samples in the ringtone table 73 it matches. When the telephone 3a is the only call device, the sound determination unit 62 need only determine whether the sound is the ringtone of the telephone 3a. Once it has identified the ringtone, the sound determination unit 62 can identify the call device that received the incoming call from the device ID associated with that ringtone. The sound determination unit 62 then updates the state, stored in the device state management table 70, of the call device associated with that device ID. Specifically, when the sound determination unit 62 determines that the ringtone of the telephone 3a sounded at 17:30, it updates the state of the device ID “PH1001” in the device state management table 70 of (c) of FIG. 12 to “17:30 incoming call”.
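The specification does not say how the input sound is compared with the registered samples. Purely as an illustration, the sketch below assumes that each sound has already been reduced to a fixed-length feature vector and uses cosine similarity with a threshold; the feature extraction, the threshold of 0.9, and the function names are all assumptions rather than the described method.

```python
import math
from typing import Dict, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_ringtone(features: Sequence[float],
                      ringtone_table: Dict[str, Sequence[float]],
                      threshold: float = 0.9) -> Optional[str]:
    """Return the device ID whose sample best matches, or None if none reaches the threshold."""
    best_id: Optional[str] = None
    best_score = threshold
    for device_id, sample in ringtone_table.items():
        score = cosine(features, sample)
        if score >= best_score:
            best_id, best_score = device_id, score
    return best_id
```

Returning the device ID directly matches the role of the ringtone table 73, in which each sample is associated with the ID of the call device that produces it.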

Furthermore, the sound determination unit 62 preferably determines whether the sound in the input audio data is a human voice by comparing the input audio data with a human voice sample (not illustrated) stored in the storage unit 21. Because the sound determination unit 62 only has to determine whether the sound is a human voice, its configuration can be simpler and its processing load lower than those of the voice recognition unit 59, which converts utterance content into text. The human voice sample may be one prepared in advance or one registered from the voice of the user of the dialogue device 2.

In detail, after determining that the ringtone of a registered call device has sounded, the sound determination unit 62 monitors for human voice input. For example, after determining that the ringtone of the intercom (not illustrated) with device ID “PH1002” sounded at 15:40, it monitors for human voice input. When it subsequently confirms that the human voice input has been absent for a certain time or more, it determines that the call started in response to that ringtone has ended (for example, at 15:42). At this point, as shown in (c) of FIG. 12, the sound determination unit 62 updates the state of the intercom with device ID “PH1002” from “15:40 incoming call” to “15:42 call ended”. With this configuration, the dialogue device 2 can keep track of the period during which a call device is in a call.
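The ringtone-then-silence logic can be sketched as a small state tracker. Timestamps are plain seconds, and the class name, method names, and timeout are illustrative assumptions. The sketch also assumes that the silence clock starts at the ringtone itself, which the specification does not state.

```python
class CallTracker:
    """Tracks one call device: a ringtone starts a call, prolonged silence ends it."""

    def __init__(self, silence_timeout: float) -> None:
        self.silence_timeout = silence_timeout
        self.state = "call ended"
        self.last_voice = 0.0

    def on_ringtone(self, t: float) -> None:
        self.state = "incoming call"
        self.last_voice = t  # assumption: silence is measured from the ringtone

    def on_voice(self, t: float) -> None:
        # Human voice only matters while a call is considered in progress.
        if self.state == "incoming call":
            self.last_voice = t

    def on_tick(self, t: float) -> str:
        """Periodic check; ends the call once voice has been absent long enough."""
        if (self.state == "incoming call"
                and t - self.last_voice >= self.silence_timeout):
            self.state = "call ended"
        return self.state
```

A longer timeout makes the tracker tolerant of pauses in conversation at the cost of keeping the dialogue device suppressed for longer after the call actually ends.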

The operation information generation unit 52 generates and updates the operation information of the call device. Although not shown, the date and time at which the operation information was generated or updated may be stored in the device state management table 70 as the last update date and time, as in Embodiment 1. When the state of the call device is updated from “call ended” to “incoming call”, the operation information generation unit 52 updates the operation information from “standby” to “busy”. When the state is updated from “incoming call” to “call ended”, it updates the operation information from “busy” to “standby”.

Like the operation information transmission unit 53 of Embodiment 2, the operation information transmission unit 53 transmits the operation information generated or updated by the operation information generation unit 52 to the utterance control server 1 together with the device ID 72 (for example, (a) of FIG. 12). (d) of FIG. 12 shows a specific example of the device ID and operation information transmitted by the operation information transmission unit 53.
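As a rough illustration of the two steps just described (deriving the operation information from the call-device state, then sending it with the device ID), the sketch below maps a state to “busy” or “standby” and packs the result into a message. The JSON encoding, the field names and the ID value `RB0001` are assumptions made for this example; the text does not specify a wire format.

```python
import json

# State of the call device -> operation information, as described in the text.
STATE_TO_OPERATION_INFO = {
    "incoming call": "busy",
    "call ended": "standby",
}


def build_report(dialogue_device_id: str, call_state: str) -> str:
    """Build the message sent to the utterance control server (assumed JSON)."""
    info = STATE_TO_OPERATION_INFO[call_state]
    return json.dumps({"device_id": dialogue_device_id, "operation_info": info})


if __name__ == "__main__":
    print(build_report("RB0001", "incoming call"))
```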

(Main configuration of the call device)
The telephone 3a, as a call device, only needs to include the units required to perform the general functions of a telephone (at least an audio output unit that notifies the user of an incoming call); these units are not illustrated.

When a call device (such as the telephone 3a) near the dialogue device 2 receives an incoming call and outputs a ring tone, the user then speaks to the other party, which creates an environment in which voice misrecognition is likely. According to the above configuration, when the call device answers the incoming call signalled by the ring tone and its operating status changes, the dialogue device 2 detects that change by picking up the ring tone. It then generates the updated operation information and transmits it to the utterance control server 1, thereby reporting the change. In addition, the dialogue device 2 monitors the voice of the person (user) picked up after the ring tone sounds, and when this voice input has been absent for a predetermined time or longer, it can determine that the call started by the ring tone has ended. It then generates operation information indicating the end of the call and transmits it to the utterance control server 1. As in Embodiment 2, the utterance control server 1 determines, based on the operation information, whether the voice misrecognition condition is satisfied and, according to the result, returns an output suppression instruction or a suppression release instruction to the reporting dialogue device 2.

In this way, the utterance control server 1 can identify, from the status of the call device, an environment in which the user is speaking to something other than the dialogue device 2. In other words, voice misrecognition can be appropriately avoided while the call device is in a call.
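The server-side behaviour summarized above can be sketched as a single handler. This is only an illustration under stated assumptions: the function name, the dictionary reply and the instruction strings are hypothetical, and the rule that “busy” satisfies the voice misrecognition condition is inferred from the text rather than taken from the rule table itself.

```python
def handle_report(device_id: str, operation_info: str) -> dict:
    """Return the instruction to send back to the reporting dialogue device."""
    # Assumed rule: a call in progress ("busy") satisfies the condition.
    condition_met = operation_info == "busy"
    instruction = "suppress output" if condition_met else "release suppression"
    return {"to": device_id, "instruction": instruction}
```

A report of “busy” therefore yields an output suppression instruction, and a later report of “standby” yields a suppression release instruction addressed to the same dialogue device.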

[Process flow]
FIG. 13 is a flowchart showing the flow of the operation information transmission process executed by the dialogue device 2 and the flow of the utterance control process executed by the utterance control server 1 in the dialogue system 403.

(Operation information transmission flow)
When the sound determination unit 62 of the dialogue device 2 detects that the ring tone of a call device has sounded (YES in S301), it identifies which call device received the incoming call, based on the device ID associated with the detected ring tone in the ring tone table 73 ((b) of FIG. 12) (S302). Here, the telephone 3a is assumed to have been identified. The sound determination unit 62 then updates the state of the telephone 3a stored in the device state management table 70 ((c) of FIG. 12) (S303).
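Steps S302 and S303 amount to a table lookup followed by a state update. The sketch below shows one way this could look; it assumes the ring tone has already been classified into a label such as `tone_b`, which the text does not describe. The table contents are invented for illustration, apart from the device ID “PH1002”, which appears in the description.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the ring tone table 73: ring-tone label -> device ID.
RING_TONE_TABLE: Dict[str, str] = {"tone_a": "PH1001", "tone_b": "PH1002"}


def on_ring_detected(detected_tone: str,
                     state_table: Dict[str, str],
                     time_label: str) -> Optional[str]:
    """Identify the ringing device (S302) and record its state (S303)."""
    device_id = RING_TONE_TABLE.get(detected_tone)
    if device_id is None:
        return None                      # unregistered ring tone: ignore
    state_table[device_id] = f"{time_label} incoming call"
    return device_id
```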

If the state of the telephone 3a indicates an incoming call (YES in S304), the operation information generation unit 52 generates the operation information “busy” for the telephone 3a, or updates the operation information to “busy” (S305). The operation information transmission unit 53 then transmits the device ID 72 ((a) of FIG. 12) and the operation information “busy” ((d) of FIG. 12) to the utterance control server 1 (S306). If, on the other hand, the state of the telephone 3a indicates that the call has ended (NO in S304), the operation information generation unit 52 generates or updates the operation information of the telephone 3a so that it indicates “standby” (S307). The operation information transmission unit 53 then transmits the device ID 72 and the operation information “standby” to the utterance control server 1 (S308).

Although not shown, the sound determination unit 62 monitors for human voice input from after step S306 until before step S309. When the sound determination unit 62 detects that human voice input has been absent for a predetermined time or longer (NO in S301, YES in S309), it updates the state, in the device state management table 70, of the telephone 3a that was most recently updated to the incoming-call state to “call ended” (S303). The subsequent steps (S304 to S308) are then repeated.

(Utterance control flow)
When the operation information reception unit 54 of the utterance control server 1 receives the device ID 72 and the operation information (YES in S309), it supplies the device ID 72 to the utterance control unit 57 and the operation information to the condition determination unit 56. The subsequent processing is the same as S114 to S116 shown in FIG. 5. It differs from Embodiment 2, however, in that the condition determination unit 56 refers in S114 to the rule table 74 shown in (e) of FIG. 12.

<< Embodiment 4 >>
Embodiment 4 of the present invention is described below with reference to FIGS. 6, 7, 9 and 14.

The causes of an environment in which the dialogue device 2 may misrecognize speech are not limited to audio output devices such as the TV 3. In recent years, household electrical appliances (hereinafter, home appliances) that output voice guidance have become widespread. For example, home appliances such as washing machines, rice cookers, microwave ovens and water heaters output voice guidance to the user at various stages of operation (for example, a washing machine announces “Washing will now start”, “Rinsing is complete”, or “Spinning in progress; the lid cannot be opened”). Even in an environment where such voice guidance is output by home appliances, the dialogue device 2 may misrecognize speech and operate erroneously. In Embodiment 4, the dialogue system of the present invention can identify an environment in which a home appliance outputs voice guidance while in operation, and can appropriately avoid voice misrecognition.

[Overview of the dialogue system]
As one example, the dialogue system 404 according to Embodiment 4 of the present invention adopts substantially the same configuration as the dialogue system 402 of Embodiment 2 shown in FIG. 6. Alternatively, the dialogue system 401 of Embodiment 1 shown in FIG. 2 may be adopted.

The dialogue system 404 of Embodiment 4 differs from the dialogue system 402 of Embodiment 2 shown in FIG. 6 in the following respects.

In Embodiment 4, the dialogue system 404 includes, as non-target audio source devices, one or more home appliances (for example, a washing machine, a rice cooker, or a microwave oven) instead of the audio output device (TV 3).

In Embodiment 4, the dialogue device 2 is configured to operate these home appliances remotely and to supply the operation information of each home appliance to the utterance control server 1.

As a result, voice misrecognition, and any malfunction that results from it, can be avoided while a home appliance is in operation.

[Functional configuration of each device in the dialogue system]
(a) to (d) of FIG. 14 are diagrams showing specific examples of the information and tables handled by each device in the dialogue system 404 of Embodiment 4.

(Main configuration of the utterance control server 1)
The utterance control server 1 of Embodiment 4 differs from the utterance control server 1 shown in FIG. 7 in the following respects.

In Embodiment 4, the operation information reception unit 54 receives the operation information (for example, (c) of FIG. 14) of a home appliance (for example, a washing machine). In Embodiment 4, the operation information indicates “in operation”, “standby” or “power off”, as described in detail later.

The storage unit 11 stores the rule table 74 shown in (d) of FIG. 14. The condition determination unit 56 refers to the rule table 74 and determines, based on the received operation information, whether the voice misrecognition condition is satisfied in the environment in which the dialogue device 2 is placed. For example, according to the rule table 74 shown in (d) of FIG. 14, the condition determination unit 56 determines that the voice misrecognition condition is satisfied when the operation information of the washing machine indicates “in operation”, and that it is not satisfied when the operation information indicates “standby” or “power off”.
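The rule-table lookup described above can be pictured as a mapping from (appliance, operation information) to a yes/no result. The following sketch encodes only the washing-machine rows stated in the text; the data layout, the function name and the choice to treat unlisted combinations as “not satisfied” are assumptions for illustration.

```python
# Stand-in for the rule table 74 of FIG. 14(d); only the rows given in the text.
RULE_TABLE = {
    ("washing machine", "in operation"): True,
    ("washing machine", "standby"): False,
    ("washing machine", "power off"): False,
}


def misrecognition_condition_met(appliance: str, operation_info: str) -> bool:
    """Return True when the voice misrecognition condition is satisfied."""
    # Assumption: combinations missing from the table do not satisfy the condition.
    return RULE_TABLE.get((appliance, operation_info), False)
```

Keeping the rules as data rather than as branching code would let the same determination logic serve other appliances by adding rows.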

(Main configuration of the dialogue device 2)
The dialogue device 2 of Embodiment 4 differs from the dialogue device 2 of Embodiment 2 shown in FIG. 7 in the following respects.

The device operation unit 61 operates home appliances remotely. Specifically, it switches the power of the target home appliance on or off and, when the appliance is powered on, instructs it to execute a predetermined function. In Embodiment 4, when the device operation unit 61 operates a home appliance, it supplies the device ID of that appliance and command information (the content of the remote operation) to the operation information generation unit 52. In Embodiment 4, the command information indicates “power on” when the device operation unit 61 has switched the appliance from off to on, “power off” when it has switched the appliance from on to off, and “execution instruction” when it has instructed the appliance to execute a predetermined function. For example, (a) of FIG. 14 shows a specific example of the device ID and command information supplied from the device operation unit 61 to the operation information generation unit 52 when the device operation unit 61 instructs the washing machine to start washing.

Based on the device ID and the command information (the content of the remote operation) supplied from the device operation unit 61, the operation information generation unit 52 generates or updates the operation information of the home appliance in the device state management table 70 (for example, (b) of FIG. 14). In the above example, the command information for the washing machine with device ID “WA0001” indicates “execution instruction”. That is, the washing machine has started washing in accordance with the instruction from the dialogue device 2, so the operation information generation unit 52 updates the operation information of the washing machine to “in operation”. When the command information indicates “power on”, the washing machine is not yet running but transitions to a state in which it can start washing at any time on receiving an instruction from the user or the dialogue device 2. The operation information generation unit 52 therefore updates the operation information to “standby”. When the command information indicates “power off”, the washing machine is not in a state in which it can run, so the operation information generation unit 52 updates the operation information to “power off”. Although not shown, the date and time at which the operation information was generated or updated may be stored in the device state management table 70 as the last update date and time, as in Embodiment 1.
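The three cases above reduce to a fixed mapping from the remote operation performed to the resulting operation information. The sketch below is a minimal illustration; the dictionary standing in for the device state management table 70 and the function name are hypothetical.

```python
from typing import Dict

# Remote operation performed -> resulting operation information (from the text).
COMMAND_TO_OPERATION_INFO = {
    "execution instruction": "in operation",
    "power on": "standby",
    "power off": "power off",
}


def update_operation_info(state_table: Dict[str, str],
                          device_id: str,
                          command: str) -> str:
    """Record and return the operation information implied by a remote operation."""
    info = COMMAND_TO_OPERATION_INFO[command]
    state_table[device_id] = info        # stand-in for table 70
    return info
```

For the washing machine “WA0001”, an “execution instruction” yields “in operation”, matching the example in the description.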

The operation information transmission unit 53 transmits the operation information generated or updated by the operation information generation unit 52 to the utterance control server 1. (c) of FIG. 14 shows a specific example of the device ID and operation information transmitted by the operation information transmission unit 53.

(Main configuration of the home appliance)
A home appliance (such as a washing machine) that takes the place of the TV 3 in FIG. 7 includes, in addition to the blocks shown in FIG. 7, the units required to perform its general functions as a home appliance (at least an audio output unit that outputs voice guidance at predetermined stages of operation); these units are not illustrated.

According to the above configuration, when the operating status of a home appliance (such as a washing machine) changes as a result of remote operation by the dialogue device 2, the dialogue device 2 reports the change to the utterance control server 1 by transmitting the operation information. The utterance control server 1 determines, based on the operation information, whether the voice misrecognition condition is satisfied and, according to the result, returns an output suppression instruction or a suppression release instruction to the reporting dialogue device 2. Specifically, when the operation information indicates “in operation”, meaning that the home appliance is in a state in which it outputs some voice guidance, the utterance control server 1 instructs the dialogue device 2 not to output a response voice. When the operation information indicates “standby” or “power off”, meaning that the home appliance is not in a state in which it outputs voice guidance, the server instructs the dialogue device 2 to output response voices.

In this way, even when a home appliance that outputs voice guidance is located near the dialogue device 2, the utterance control server 1 can grasp the status of the home appliance and the environment of the dialogue device 2, so that voice misrecognition can be appropriately avoided.

In the above example, the dialogue system 404 adopts a configuration in which the dialogue device 2 operates the home appliances remotely, but the configuration is not limited to this. The dialogue system 404 of Embodiment 4 may instead adopt a configuration in which the dialogue device 2 and the home appliances are communicably connected via a local network such as a LAN (Local Area Network), and each home appliance reports to the dialogue device 2 when it is powered on and when it starts executing a function. In this case, the dialogue device 2 determines that an appliance is powered off when no such report has been received for a predetermined time or longer.
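The report-based variant can be sketched as a tracker that remembers the latest report per appliance and treats a long gap as power-off. This is an illustrative reading, not the specified design: the class and event names are hypothetical, and the sketch assumes that any report restarts the timeout and that an appliance never heard from counts as powered off.

```python
from typing import Dict, Tuple


class ApplianceTracker:
    """Infers operation information from reports sent by home appliances."""

    def __init__(self, report_timeout_s: float) -> None:
        self.report_timeout_s = report_timeout_s      # hypothetical threshold
        self.last_report: Dict[str, Tuple[float, str]] = {}

    def on_report(self, device_id: str, event: str, now: float) -> None:
        # "function started" -> in operation; any other report (e.g. "power on") -> standby.
        info = "in operation" if event == "function started" else "standby"
        self.last_report[device_id] = (now, info)

    def operation_info(self, device_id: str, now: float) -> str:
        if device_id not in self.last_report:
            return "power off"                        # assumption: unknown means off
        reported_at, info = self.last_report[device_id]
        if now - reported_at >= self.report_timeout_s:
            return "power off"                        # no report for too long
        return info
```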

[Process flow]
The flow of processing of each device in the dialogue system 404 of Embodiment 4 is described below with reference to FIGS. 9 and 5.

(Operation information transmission flow)
As shown in FIG. 9, when the device operation unit 61 of the dialogue device 2 detects the occurrence of a predetermined event for remotely operating a home appliance (here, a washing machine) (YES in S201), it controls the infrared transmission unit 24 to operate the washing machine remotely. It then supplies the device ID of the target appliance and command information describing the operation to the operation information generation unit 52 (S202). In Embodiment 4, the device state management table 70 does not store the state of the home appliance, so the device state management table 70 is not updated at this point.

In Embodiment 4, step S203 is omitted. In place of step S204, a step of determining whether the command information (the content of the remote operation) is “execution instruction” (not shown; referred to as S204′) is executed. If the command information for the washing machine indicates an execution instruction (YES in S204′), the operation information generation unit 52 generates the operation information “in operation” for the washing machine, or updates the operation information to “in operation” (S205). The operation information transmission unit 53 then transmits the device ID 72 and the operation information (“in operation” in Embodiment 4) to the utterance control server 1 (S206). If, on the other hand, the command information indicates power on or power off (NO in S204′), the operation information generation unit 52 generates or updates the operation information of the washing machine so that it indicates “standby” or “power off”, respectively (S207). The operation information transmission unit 53 then transmits the device ID 72 and the operation information (“standby” or “power off” in Embodiment 4) to the utterance control server 1 (S208).

(Utterance control flow)
When the operation information reception unit 54 of the utterance control server 1 receives the device ID 72 and the operation information (YES in S209), it supplies the device ID 72 to the utterance control unit 57 and the operation information to the condition determination unit 56. The subsequent processing is the same as S114 to S116 shown in FIG. 5. It differs from Embodiment 2, however, in that the condition determination unit 56 refers in S114 to the rule table 74 shown in (d) of FIG. 14.

<< Embodiment 5 >>
[Example of implementation by software]
The control blocks of the utterance control server 1, the dialogue device 2, the TV 3 and the information collection server 4 (in particular, the control units 10, 20, 30 and 40) may be realized by logic circuits (hardware) formed in an integrated circuit (IC chip) or the like, or may be realized by software using a CPU (Central Processing Unit).

In the latter case, the utterance control server 1, the dialogue device 2, the TV 3 and the information collection server 4 can each be configured using a computer (electronic computer) such as that shown in FIG. 17. FIG. 17 is a block diagram illustrating the configuration of a computer 100 that can be used as the utterance control server 1, the dialogue device 2, the TV 3 or the information collection server 4.

As shown in FIG. 17, the computer 100 includes an arithmetic device 120, a main storage device 130, an auxiliary storage device 140 and an input/output interface 150, which are connected to one another via a bus 110. The arithmetic device 120, the main storage device 130 and the auxiliary storage device 140 may be, for example, a CPU, a RAM (random access memory) and a hard disk drive, respectively. The main storage device 130 may be any computer-readable “non-transitory tangible medium”; for example, a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used.

An input device 200 and an output device 300 are connected to the input/output interface 150. The input device 200 and the output device 300 of each server receive data transmitted from another server, the TV 3 or the dialogue device 2, and transmit data to another server, the TV 3 or the dialogue device 2. The input device 200 and the output device 300 of the dialogue device 2 receive data from the utterance control server 1 and capture the user's voice, and also transmit data to the utterance control server 1 and speak to the user. The input device 200 and the output device 300 of the TV 3 obtain operation instructions from the user and receive data from each server or the dialogue device 2, and also transmit data to each server or the dialogue device 2.

The auxiliary storage device 140 stores the programs for causing the computer 100 to operate as the utterance control server 1, the dialogue device 2, the TV 3 or the information collection server 4. The arithmetic device 120 loads these programs from the auxiliary storage device 140 into the main storage device 130 and executes the instructions they contain, thereby causing the computer 100 to function as the units of the utterance control server 1, the dialogue device 2, the TV 3 or the information collection server 4. The auxiliary storage device 140 may also be a ROM (Read Only Memory) in which the programs and various data are recorded so as to be readable by the arithmetic device 120 (or CPU).

Although the configuration described here causes the computer 100 to function using programs recorded in the auxiliary storage device 140, which is an internal recording medium, programs recorded on an external recording medium may be used instead. The programs may also be supplied to the computer via any transmission medium capable of transmitting them (such as a communication network or broadcast waves). The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the programs are embodied by electronic transmission.

<< Modification 1 >>
In each embodiment, when the dialogue mode control unit 58 of the dialogue device 2 receives from the utterance control server 1 a spontaneous voice output instruction (an instruction to output a spontaneous voice) while in the dialogue mode in which no response voice is output, that is, while the voice recognition function is disabled, it may control the voice output control unit 60 to output the specified spontaneous voice. In this case, even during a period in which output of response voices is suppressed, the dialogue mode control unit 58 preferably lifts the suppression for a fixed period after the spontaneous voice is output.

When the dialogue device 2 speaks spontaneously, the user may well respond to it in some way, even while watching the TV 3. It would be unnatural as a dialogue for the dialogue device 2 to show no reaction to that response. With the above configuration, suppression of response voices is lifted for a fixed period after the spontaneous voice is output, so the dialogue device 2 can output a further response voice in reply to the user's response. As a result, this unnaturalness can be eliminated.

The following is one assumed use case in which the utterance control server 1 transmits a spontaneous voice output instruction. The user can set a timer on the utterance control server 1 in advance so that the dialogue device 2 says specified content at a specified time. Specifically, suppose the user has configured the utterance control server 1 in advance so that, at a specified time (for example, 20:00), the dialogue device 2 speaks about the next day's weather. In this case, the utterance control server 1, following that setting, obtains the next day's weather information from an external information providing server, generates the spontaneous voice “It will be sunny tomorrow”, and instructs the dialogue device 2 to output it at 20:00.

The voice output control unit 60 of the dialogue device 2 outputs the spontaneous voice “It will be sunny tomorrow”. At this point, the dialogue mode control unit 58 lifts the suppression of response voices for a fixed period (for example, 5 seconds). The user reacts to the spontaneous voice by saying, for example, “Got it, thank you”. Because suppression has been lifted, the voice recognition unit 59 can recognize the user's utterance, and the voice output control unit 60 can, for example, output “You're welcome” as a response voice. When 5 seconds have elapsed since the spontaneous voice was output, the dialogue mode control unit 58 returns the dialogue mode of the dialogue device 2 to the mode in which no response voice is output; that is, it disables the voice recognition function of the voice recognition unit 59 again.
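The temporary lifting of suppression in Modification 1 can be sketched as a time window attached to the suppression flag. The class below is a hypothetical illustration: the names are invented, the 5-second default comes from the example in the text, and time is passed in explicitly (in seconds) so the behaviour is easy to check.

```python
from typing import Optional


class DialogueModeController:
    """Decides whether response voices may currently be output."""

    def __init__(self, release_window_s: float = 5.0) -> None:
        self.release_window_s = release_window_s
        self.suppressed = False
        self.released_until: Optional[float] = None

    def set_suppressed(self, suppressed: bool) -> None:
        # Driven by suppression / release instructions from the server.
        self.suppressed = suppressed

    def on_spontaneous_voice(self, now: float) -> None:
        # Open a short window in which responses are allowed again.
        self.released_until = now + self.release_window_s

    def may_respond(self, now: float) -> bool:
        if not self.suppressed:
            return True
        return self.released_until is not None and now < self.released_until
```

With suppression active, a spontaneous voice at t=100 allows a response at t=103 but not at t=105, when the window has closed.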

≪Modification 2≫
In each embodiment, the condition determination unit 56 provided in the utterance control server 1 determines whether the voice misrecognition condition is satisfied. However, the dialogue system of the present invention is not limited to this configuration. By providing the operation information receiving unit 54 and the condition determination unit 56 in the dialogue apparatus 2, the dialogue system can adopt a configuration in which the dialogue apparatus 2 itself makes this determination. In this case, in Modification 2 of Embodiment 1, the device operation information needed for the determination is supplied from the information collection server 4 to the dialogue apparatus 2 via the utterance control server 1, or is acquired by the dialogue apparatus 2 requesting it directly from the information collection server 4. In Modification 2 of Embodiments 2 and 4, the dialogue apparatus 2 either communicates with each device and acquires the operation information directly from it, or uses that communication to ascertain the state of each device and generates the operation information itself. In Modification 2 of Embodiment 3, the dialogue apparatus 2 monitors each device with various sensors to ascertain its state and generates the operation information on that basis. Then, according to the determination made by the condition determination unit 56, the dialogue mode control unit 58 controls at least one of the voice recognition unit 59 and the voice output control unit 60 to switch the dialogue mode.

Furthermore, in Modification 2, the dialogue apparatus 2 preferably acquires the operation information periodically at fixed intervals and determines each time whether the voice misrecognition condition is satisfied. With this configuration, even when the supply of operation information from the supplying side (the utterance control server 1, the information collection server 4, or individual devices such as voice output devices, call devices, and home appliances) is stalled by a communication error, the dialogue apparatus 2 requests the information on its own initiative. It can therefore judge the environment in which it is placed accurately from the latest operation information of each device and control the output of response voice appropriately.

FIG. 15 and FIG. 16 are flowcharts showing the flow of processing in each device of the dialogue system in Modification 2.

Referring to FIG. 15, when the operation information receiving unit 54 of the dialogue apparatus 2 determines that a fixed time has elapsed since operation information was last acquired for the device paired with the dialogue apparatus 2 (YES in S401), it requests operation information from the utterance control server 1 (S402). Specifically, the operation information receiving unit 54 generates an operation information request containing the apparatus ID of the dialogue apparatus 2 and transmits it to the utterance control server 1.
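Steps S401 and S402 amount to an elapsed-time check followed by building a request keyed by the apparatus ID. The sketch below is a hedged illustration only: the function names, the dictionary layout of the request, and the 60-second interval are assumptions, since the text specifies only "a fixed time".

```python
from typing import Optional

POLL_INTERVAL_SECONDS = 60.0  # hypothetical; the text says only "a fixed time"


def poll_due(last_acquired: Optional[float], now: float,
             interval: float = POLL_INTERVAL_SECONDS) -> bool:
    # S401: a request is due if nothing has been acquired yet or if the
    # fixed interval has elapsed since the previous acquisition.
    return last_acquired is None or (now - last_acquired) >= interval


def build_operation_info_request(apparatus_id: str) -> dict:
    # S402: the request carries the apparatus ID of the dialogue apparatus.
    # The field names are illustrative, not taken from the source.
    return {"type": "operation_info_request", "apparatus_id": apparatus_id}
```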

In Modification 2, the utterance control server 1 executes an operation information proxy acquisition process, in which it acquires operation information from the information collection server 4 on behalf of the dialogue apparatus 2. When the utterance control server 1 receives the operation information request (YES in S403), the control target specifying unit 55 refers to the device-apparatus correspondence table 71 and identifies the TV 3 that corresponds to the received apparatus ID (S404). The operation information receiving unit 54 of the utterance control server 1 then generates an operation information request containing the device ID of the identified TV 3 and transmits it to the information collection server 4 (S405).

When the information collection server 4 receives the operation information request (YES in S406), the operation information generation unit 52 identifies, from the received device ID, which device's operation information should be output (S407). For example, it determines that the operation information of the TV 3 is being requested. The operation information generation unit 52 then acquires the operation information of the TV 3 from the device state management table 70 ((b) of FIG. 3) (S408). Here, the operation information generation unit 52 may find that the last-update time associated with that operation information is older than the current time by a predetermined amount or more. In that case, instead of the operation information, it may acquire an error message indicating that the operation information is stale and invalid. Alternatively, when the operation information is stale, the user-operation information receiving unit 51 may request the latest user-operation information from the TV 3 and update the state of the TV 3 accordingly, and the operation information generation unit 52 may then update the operation status based on that latest state.
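The staleness check in S408 can be illustrated as follows. This is a sketch under stated assumptions: the table layout (device ID mapped to a status string and a last-update time), the 10-minute threshold, and the `ERROR_STALE` marker are all hypothetical, because the text says only that the information is older by "a predetermined amount or more".

```python
from datetime import datetime, timedelta
from typing import Dict, Tuple

MAX_AGE = timedelta(minutes=10)  # hypothetical threshold
STALE_ERROR = "ERROR_STALE"      # hypothetical error marker

# Hypothetical shape of the device state management table 70:
# device ID -> (operation information, last update time)
StateTable = Dict[str, Tuple[str, datetime]]


def lookup_operation_info(table: StateTable, device_id: str,
                          now: datetime) -> str:
    status, last_updated = table[device_id]
    # Return an error marker instead of the stored value when the entry
    # has not been refreshed within the allowed age.
    if now - last_updated >= MAX_AGE:
        return STALE_ERROR
    return status
```

Returning a distinct error value rather than the stale status lets the receiving side fall back to a default dialogue mode instead of acting on outdated information.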

As a response to the operation information request, the operation information transmission unit 53 returns to the utterance control server 1 an operation information response containing the operation information (or error message) acquired in S408 and the device ID received in S406 (S409).

The operation information receiving unit 54 of the utterance control server 1 receives the operation information response (S410). The control target specifying unit 55 acquires, from the device-apparatus correspondence table 71, the apparatus ID corresponding to the received device ID and thereby identifies the dialogue apparatus 2 to which the response should be returned (S411). An operation information transmission unit of the utterance control server 1 (not shown) then returns to the identified dialogue apparatus 2, in response to the operation information request received in S403, an operation information response containing the operation information or the error message (S412).
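The proxy acquisition in S403 to S412 is essentially a forward lookup (apparatus ID to device ID), a relayed request, and a reverse lookup (device ID back to apparatus ID). The sketch below assumes a one-to-one table and an injected `fetch` callable standing in for the information collection server 4; the IDs, field names, and function name are illustrative and not taken from the source.

```python
from typing import Callable, Dict

# Hypothetical contents of the device-apparatus correspondence table 71:
# apparatus ID (dialogue apparatus) -> device ID (e.g., a TV).
DEVICE_APPARATUS_TABLE: Dict[str, str] = {"apparatus-001": "tv-001"}


def proxy_operation_info(apparatus_id: str,
                         fetch: Callable[[str], dict]) -> dict:
    # S404: identify the device paired with the requesting apparatus.
    device_id = DEVICE_APPARATUS_TABLE[apparatus_id]
    # S405 to S410: relay the request and receive the response, which
    # echoes the device ID alongside the payload.
    response = fetch(device_id)
    # S411: reverse lookup to find the apparatus that should be answered.
    target = next(a for a, d in DEVICE_APPARATUS_TABLE.items()
                  if d == response["device_id"])
    # S412: response addressed to the identified dialogue apparatus.
    return {"apparatus_id": target, "payload": response["payload"]}
```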

Referring to FIG. 16, the operation information receiving unit 54 of the dialogue apparatus 2 receives the operation information response (S413).

When the operation information response contains operation information indicating "viewing" (1 in S414), the condition determination unit 56 of the dialogue apparatus 2 determines that the voice misrecognition condition is satisfied for the environment in which the apparatus is placed, and the dialogue mode control unit 58 switches to the dialogue mode in which no response voice is output (S415). When the response contains operation information indicating "power off or in non-viewing use" (2 in S414), the condition determination unit 56 determines that the voice misrecognition condition is not satisfied, and the dialogue mode control unit 58 switches to the dialogue mode in which response voice is output (S416). When the response contains the error message (3 in S414), the dialogue mode control unit 58 switches to the default dialogue mode (S417). The default dialogue mode may be the mode in which response voice is output.
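The three-way branch at S414 can be sketched as a small mapping from the response payload to a dialogue mode. The payload strings and the names `DialogueMode` and `select_mode` are assumptions for illustration; the default of `RESPOND` reflects the text's remark that the default mode may be the one that outputs response voice.

```python
from enum import Enum


class DialogueMode(Enum):
    RESPOND = "respond"  # response voice is output
    SILENT = "silent"    # response voice is not output


def select_mode(payload: str,
                default: DialogueMode = DialogueMode.RESPOND) -> DialogueMode:
    if payload == "viewing":
        # S414 branch 1 -> S415: misrecognition condition satisfied.
        return DialogueMode.SILENT
    if payload == "off_or_non_viewing":
        # S414 branch 2 -> S416: condition not satisfied.
        return DialogueMode.RESPOND
    # S414 branch 3 -> S417: error message (or anything unrecognized in
    # this sketch) falls back to the default mode.
    return default
```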

≪Modification 3≫
In each of the above embodiments, the dialogue apparatus 2 includes the voice recognition unit 59, recognizes the voice input to it, and outputs either a response voice it generates itself or a response voice supplied from the utterance control server 1. However, the configuration of the dialogue apparatus 2 in the dialogue system of the present invention is not limited to this.

For example, a dialogue apparatus 2 without the voice recognition unit 59 can be used in the dialogue system of the present invention. That is, the client side (the dialogue apparatus 2) has no voice recognition function; the dialogue apparatus 2 transmits the voice picked up by the voice input unit 22 (microphone) to the utterance control server 1, and the utterance control server 1 recognizes it. Such a dialogue system also falls within the scope of the present invention. More specifically, the voice recognition unit 59 is provided in the control unit 10 of the utterance control server 1, and the control unit 20 of the dialogue apparatus 2 includes, in place of the voice recognition unit 59, a voice data transmission unit (voice data transmission means; not shown). The voice data transmission unit transmits the voice data input via the voice input unit 22 to the utterance control server 1. The voice recognition unit 59 of the utterance control server 1 recognizes the voice data received from the voice data transmission unit. The utterance control server 1 generates the response voice in its downstream processing units for response voice generation (not shown). The utterance control unit 57 of the utterance control server 1 then returns the generated response voice to the dialogue apparatus 2. In the dialogue apparatus 2, the dialogue mode control unit 58 instructs the voice output control unit 60 to output the response voice supplied from the utterance control server 1, and the voice output control unit 60 outputs it via the voice output unit 23. Thus, even when the dialogue apparatus 2 has no voice recognition unit 59, it can converse with the user.

In the above configuration, suppression of response voice output can further be realized in the following ways.

First, when the condition determination unit 56 determines that the voice misrecognition condition is satisfied, the utterance control unit 57 of the utterance control server 1 instructs the dialogue apparatus 2 not to transmit voice data. Following this instruction, the dialogue mode control unit 58 of the dialogue apparatus 2 prohibits the voice data transmission unit from transmitting voice data. With this configuration, the utterance control server 1 processes no voice data, so no response voice is generated and none is supplied to the dialogue apparatus 2.

Second, when the condition determination unit 56 determines that the voice misrecognition condition is satisfied, the utterance control unit 57 of the utterance control server 1 refuses to receive the voice data transmitted from the voice data transmission unit of the dialogue apparatus 2. With this configuration, no voice data is received, so the utterance control server 1 generates no response voice and none is supplied to the dialogue apparatus 2.

Third, when the condition determination unit 56 determines that the voice misrecognition condition is satisfied, the utterance control unit 57 of the utterance control server 1 controls the processing units for response voice generation so that they do not process the received voice data. With this configuration, the utterance control server 1 generates no response voice and none is supplied to the dialogue apparatus 2.

Fourth, when the condition determination unit 56 determines that the voice misrecognition condition is satisfied, the utterance control unit 57 of the utterance control server 1 does not transmit the response voice generated by the server to the dialogue apparatus 2. With this configuration, no response voice is supplied to the dialogue apparatus 2.

With each of the above configurations, output of response voice is suppressed even if non-recognition-target voice occurs while the voice misrecognition condition is satisfied, so misrecognition and malfunction of the dialogue apparatus 2 can be avoided.
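The four suppression methods listed above differ only in how far a piece of voice data travels before it is stopped. The sketch below models one dialogue turn as a sequence of stages and cuts it at the point each method targets. The enum `Suppress`, the stage labels, and the placeholder response string are assumptions for illustration; real recognition and response generation are not modelled.

```python
from enum import Enum, auto
from typing import List, Optional, Tuple


class Suppress(Enum):
    NONE = auto()
    APPARATUS_DOES_NOT_SEND = auto()   # first method
    SERVER_REFUSES_RECEIPT = auto()    # second method
    SERVER_SKIPS_PROCESSING = auto()   # third method
    SERVER_WITHHOLDS_REPLY = auto()    # fourth method


def run_turn(suppress: Suppress) -> Tuple[List[str], Optional[str]]:
    """Return the stages that ran and the response voice, if any."""
    stages: List[str] = []
    if suppress is Suppress.APPARATUS_DOES_NOT_SEND:
        return stages, None
    stages.append("voice_data_sent")
    if suppress is Suppress.SERVER_REFUSES_RECEIPT:
        return stages, None
    stages.append("voice_data_received")
    if suppress is Suppress.SERVER_SKIPS_PROCESSING:
        return stages, None
    stages.append("response_generated")
    if suppress is Suppress.SERVER_WITHHOLDS_REPLY:
        return stages, None
    stages.append("response_returned")
    return stages, "response voice"
```

In every suppressed case the apparatus receives no response voice; the earlier the cut, the less transmission and processing is spent on voice data that will not be answered.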

≪Modification 4≫
In Embodiment 1, the determination of whether the voice misrecognition condition is satisfied may be made by the information collection server 4. In this case, in the dialogue system of the present invention, the information collection server 4 is provided with the condition determination unit 56 and, in place of the operation information transmission unit 53, with a notification unit (not shown) that notifies the utterance control server 1 of the determination result.

≪Modification 5≫
In Embodiment 1, the information collection server 4 and the utterance control server 1 may be implemented on a single computer. In this case, the operation information transmission unit 53 provided in the information collection server 4 and the operation information receiving unit 54 provided in the utterance control server 1 can be omitted.

≪Modification 6≫
In Embodiments 2 to 4, the information collection server 4, which manages the operation status of each device, is omitted. However, when the aim is to manage the operation status of each device centrally for each user, each of the dialogue systems 402 to 404 may be built to include the information collection server 4. Specifically, the dialogue apparatus 2 is configured to transmit the apparatus ID and operation information, which it sends to the utterance control server 1 in the dialogue systems 402 to 404, to the information collection server 4 as well. Alternatively, the dialogue apparatus 2 may transmit the apparatus ID and operation information only to the information collection server 4, which then forwards them to the utterance control server 1. Because requests from the dialogue apparatus 2 are then concentrated at the information collection server 4, the operation status of each device can be managed centrally there, and part of the processing load of the utterance control server 1 can be offloaded to the information collection server 4.

[Summary]
The dialogue system (401 to 404) according to Aspect 1 of the present invention controls a dialogue apparatus (2) that recognizes voice uttered by a user and outputs a response voice to it. The system includes: condition determination means (condition determination unit 56) that determines that a voice misrecognition condition of the dialogue apparatus is satisfied when non-recognition-target voice, that is, voice other than the recognition-target voice that the user utters to the dialogue apparatus, may be erroneously detected by the dialogue apparatus as recognition-target voice; and utterance control means (utterance control unit 57) that, when the condition determination means determines that the voice misrecognition condition is satisfied, performs control so that the dialogue apparatus does not output the response voice. The condition determination means determines whether the voice misrecognition condition is satisfied based on operation information indicating the operation status of a non-target voice source device that directly or indirectly generates the non-recognition-target voice (for example, a voice output device, a call device, or a home appliance; more specifically, the TV 3, the telephone 3a, a washing machine, or the like).

With this configuration, the condition determination means determines whether the voice misrecognition condition of the dialogue apparatus is satisfied based on operation information indicating the operation status of the non-target voice source device. For example, the condition determination means may make this determination each time the operation information is set (that is, newly generated or updated). When it determines that the condition is satisfied, the utterance control means performs control so that the dialogue apparatus does not output the response voice.

Specifically, when a non-target voice source device near the dialogue apparatus is in an operation status in which it generates non-recognition-target voice, the dialogue apparatus may erroneously detect, as recognition-target voice, the non-recognition-target voice generated directly or indirectly by that device. The condition determination means therefore determines, based on the device's operation information, that the voice misrecognition condition is satisfied for the dialogue apparatus when the device is in such an operation status. In that case, the utterance control means suppresses output of the response voice. For example, it may do so by instructing the dialogue apparatus not to output the response voice. Alternatively, when the dialogue apparatus has no function for generating the response voice itself, the utterance control means may suppress output by not supplying the response voice to the dialogue apparatus.

Here, a non-target voice source device "indirectly generates non-recognition-target voice" when its execution of one of its own functions prompts an entity other than the device itself (for example, the user, or a different device) to generate non-recognition-target voice.

Consequently, in situations where misrecognition is likely given the operation status of the non-target voice source device, that is, where non-recognition-target voice may occur around the dialogue apparatus, the dialogue apparatus no longer outputs a response voice even if such voice occurs. As a result, misrecognition of voice and the resulting malfunction can be avoided even when non-recognition-target voice occurs.

In the dialogue system according to Aspect 2 of the present invention, in Aspect 1, the non-target voice source device may be a voice output device having at least a voice output function for outputting non-recognition-target voice. The dialogue system may include operation information setting means (operation information generation unit 52) that, while the voice output device is executing the voice output function, sets the operation information of the voice output device to indicate that the voice output function is being executed. The condition determination means may determine that the voice misrecognition condition of the dialogue apparatus is satisfied when the operation information of the voice output device, as set by the operation information setting means, indicates that the voice output function is being executed.

With this configuration, when the non-target voice source device is a voice output device having at least a voice output function for outputting non-recognition-target voice, the voice misrecognition condition of the dialogue apparatus is determined to be satisfied while the voice output device is directly outputting non-recognition-target voice.

Consequently, in a situation where a voice output device is directly outputting non-recognition-target voice around the dialogue apparatus, the dialogue apparatus no longer outputs a response voice even if such voice occurs. As a result, misrecognition of voice and the resulting malfunction can be avoided. Examples of such situations include, without limitation, a TV tuned to some channel and outputting audio together with video, a recording and playback device outputting the video and audio of a recorded program, and a music player playing music. The benefit is especially large when the user is watching or listening to the output of such a voice output device, because the viewing is not disturbed by misrecognition or malfunction of the dialogue apparatus.

In the dialogue system according to Aspect 3 of the present invention, in Aspect 1, the non-target voice source device may be a call device with which the user talks to a remote party. The dialogue system may include operation information setting means (operation information generation unit 52) that sets the operation information of the call device to indicate that a call is in progress from the time the call device outputs a ringtone announcing an incoming call until the voice uttered by the user has ceased for a fixed time or longer. The condition determination means may determine that the voice misrecognition condition of the dialogue apparatus is satisfied when the operation information of the call device, as set by the operation information setting means, indicates that a call is in progress.

With this configuration, when the non-target voice source device is a call device and its ringtone has triggered the start of a call, the voice misrecognition condition of the dialogue apparatus is determined to be satisfied for the duration of that call.

Consequently, in a situation where the call device indirectly causes non-recognition-target voice by outputting a ringtone, that is, where the user is talking to the other party after an incoming call, the dialogue apparatus no longer outputs a response voice even if such voice (the voice of the user talking to the other party) occurs. As a result, misrecognition of voice and the resulting malfunction can be avoided. The benefit is especially large when the user is on a call, because the call is not disturbed by misrecognition or malfunction of the dialogue apparatus. Examples of call devices include, without limitation, landline telephones, mobile phones, smartphones, and intercoms.
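The "call in progress" interval of Aspect 3 (from the ringtone until the user's voice has ceased for a fixed time) can be sketched as a small tracker. Everything concrete here is an assumption for illustration: the class and method names, the 10-second silence limit, and the choice to start the silence timer at the ringtone itself, which the text does not specify.

```python
from typing import Optional

SILENCE_LIMIT_SECONDS = 10.0  # hypothetical; the text says only "a fixed time"


class CallStatusTracker:
    """Hypothetical helper that derives call-device operation information."""

    def __init__(self) -> None:
        self.in_call = False
        self.last_voice: Optional[float] = None

    def on_ring(self, now: float) -> None:
        # The ringtone opens the interval. Assumption: the silence timer
        # also starts here, so an unanswered call eventually expires.
        self.in_call = True
        self.last_voice = now

    def on_user_voice(self, now: float) -> None:
        # Any user voice during the interval resets the silence timer.
        if self.in_call:
            self.last_voice = now

    def status(self, now: float) -> str:
        # The interval closes once the user has been silent long enough.
        if self.in_call and now - self.last_voice >= SILENCE_LIMIT_SECONDS:
            self.in_call = False
        return "in_call" if self.in_call else "idle"
```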

In the dialogue system according to Aspect 4 of the present invention, in Aspect 1, the non-target voice source device may be a home appliance having at least a voice output function that outputs non-recognition-target voice at predetermined timings while the appliance is operating. The dialogue system may include operation information setting means (operation information generation unit 52) that, while the home appliance is operating, sets the operation information of the home appliance to indicate that it is in operation. The condition determination means may determine that the voice misrecognition condition of the dialogue apparatus is satisfied when the operation information of the home appliance, as set by the operation information setting means, indicates that it is in operation.

With this configuration, when the non-target voice source device is a home appliance having at least a voice output function that outputs non-recognition-target voice at predetermined timings during operation, the voice misrecognition condition of the dialogue apparatus is determined to be satisfied while the home appliance is operating.

Consequently, in a situation where a home appliance is operating near the dialogue apparatus and may output non-recognition-target voice at any moment, the dialogue apparatus no longer outputs a response voice even if such voice occurs. As a result, misrecognition of voice and the resulting malfunction can be avoided. Examples of a home appliance directly outputting non-recognition-target voice include, without limitation, voice guidance that notifies the user of the appliance's operation status, voice guidance that notifies the user of an error in the appliance, and voice guidance that prompts the user to operate the appliance. While the user is using such an appliance, the dialogue apparatus does not speak as a result of misrecognition or malfunction. The user can therefore use the appliance without being bothered, which makes the benefit especially large.

In the dialogue system according to aspect 5 of the present invention, in any of aspects 1 to 4 above, the dialogue device may include dialogue control means (dialogue mode control unit 58) that, upon obtaining from the utterance control means an instruction to suppress output of the response voice, disables at least one of speech recognition means (speech recognition unit 59), which recognizes voice input to the device, and voice output control means (voice output control unit 60), which executes output of the response voice.

With the above configuration, in a situation where non-recognition-target voice may be erroneously detected as recognition-target voice, the dialogue device obtains from the utterance control means an instruction to suppress output of the response voice. In this case, the dialogue control means of the dialogue device disables at least one of the speech recognition means and the voice output control means.

Thus, if the speech recognition means is disabled, it does not subject non-recognition-target voice to speech recognition processing even when such voice occurs. Erroneous output of a response voice to the non-recognition-target voice is therefore suppressed. If the voice output control means is disabled, no response voice is output even when non-recognition-target voice occurs and is erroneously subjected to speech recognition processing as recognition-target voice. As a result, even when non-recognition-target voice occurs, misrecognition of voice, and any malfunction resulting from it, can be avoided.
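The two suppression paths of aspect 5 (disabling recognition, disabling output, or both) can be illustrated with a small sketch. The `DialogueDevice` class, its method names, and the string-based stand-in for speech recognition are hypothetical and serve only to show that either switch alone is enough to prevent a response.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DialogueDevice:
    """Toy model of the dialogue device; all names here are illustrative."""
    recognition_enabled: bool = True  # speech recognition means (unit 59)
    output_enabled: bool = True       # voice output control means (unit 60)

    def suppress(self, recognition: bool = True, output: bool = False) -> None:
        """Dialogue control means (unit 58): disable at least one function."""
        if not (recognition or output):
            raise ValueError("disable at least one of recognition and output")
        self.recognition_enabled = not recognition
        self.output_enabled = not output

    def release(self) -> None:
        """Re-enable both functions."""
        self.recognition_enabled = True
        self.output_enabled = True

    def handle_audio(self, audio: str) -> Optional[str]:
        """Return the response text, or None when no response is output."""
        if not self.recognition_enabled:
            return None  # input never reaches recognition processing
        recognized = audio.strip().lower()  # stand-in for real recognition
        if not self.output_enabled:
            return None  # recognized, but the response is not output
        return f"reply to: {recognized}"
```

Disabling recognition saves the recognition work altogether, whereas disabling only the output leaves recognition running but silent; the sketch treats the choice between them as a parameter because the text leaves it open.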

In the dialogue system according to aspect 6 of the present invention, in aspect 5 above, it is preferable that the utterance control means further instructs the dialogue device to output spontaneously, as a spontaneous voice, a voice whose content is a remark corresponding to a predetermined event, in response to the occurrence of that event and even without input of voice uttered by the user, and that the dialogue control means, after outputting the spontaneous voice in accordance with the instruction from the utterance control means, releases the suppression of response voice output for a fixed period.

When the dialogue device speaks spontaneously, the user is likely to respond to the dialogue device in some way. If response voice output is being suppressed at that time, however, the dialogue device shows no reaction to the user's response, which is unnatural as a dialogue.

With the above configuration, however, suppression of response voice output is released for a fixed period after the dialogue device outputs the spontaneous voice. During this period, the dialogue device can temporarily output a response voice to the user's utterance. As a result, the unnaturalness described above can be eliminated.
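The temporary release described for aspect 6 amounts to a time window that opens when the spontaneous voice is output. The following is a minimal sketch under stated assumptions: the `SuppressionGate` class, the `release_seconds` parameter, and the injectable clock are hypothetical, and the publication does not specify the length of the fixed period.

```python
import time
from typing import Callable


class SuppressionGate:
    """Decides whether a response voice may be output (illustrative only)."""

    def __init__(self, release_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.release_seconds = release_seconds
        self._clock = clock
        self.suppressed = False
        self._release_until = float("-inf")

    def set_suppressed(self, suppressed: bool) -> None:
        """Apply or lift the suppression instruction."""
        self.suppressed = suppressed

    def on_spontaneous_voice(self) -> None:
        """Open the release window once a spontaneous voice has been output."""
        self._release_until = self._clock() + self.release_seconds

    def may_reply(self) -> bool:
        """True when not suppressed, or when inside the release window."""
        if not self.suppressed:
            return True
        return self._clock() < self._release_until
```

Once the window elapses, suppression resumes automatically without a further instruction, so the device answers the user's reaction to its own remark and then falls silent again.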

An utterance control device (utterance control server 1) according to aspect 7 of the present invention is an utterance control device that controls a dialogue device which recognizes voice uttered by a user and outputs a response voice to that voice, the utterance control device including: condition determination means (condition determination unit 56) that determines that a voice misrecognition condition of the dialogue device is satisfied when non-recognition-target voice, which is not recognition-target voice uttered by the user to the dialogue device, may be erroneously detected by the dialogue device as recognition-target voice; and utterance control means (utterance control unit 57) that, when the condition determination means determines that the voice misrecognition condition is satisfied, performs control so that the dialogue device does not output the response voice, wherein the condition determination means determines whether the voice misrecognition condition is satisfied based on operating information indicating the operating status of a non-target voice source device that directly or indirectly generates the non-recognition-target voice.

With the above configuration, the condition determination means determines whether the voice misrecognition condition of the dialogue device is satisfied based on the operating information indicating the operating status of the non-target voice source device. When the condition determination means determines that the voice misrecognition condition is satisfied, the utterance control means performs control so that the dialogue device does not output the response voice.

Thus, in a situation where misrecognition is likely given the operating status of the non-target voice source device, that is, a situation where non-recognition-target voice may occur near the dialogue device, the dialogue device does not output a response voice even if non-recognition-target voice occurs. As a result, even when non-recognition-target voice occurs, misrecognition of voice, and any malfunction resulting from it, can be avoided.

The utterance control device according to aspect 8 of the present invention, in aspect 7 above, may include operating information receiving means (operating information receiving unit 54) that receives the operating information from an information collection device (information collection server 4), the information collection device generating the operating information of the non-target voice source device by communicating with that device over a communication network and collecting information about it.

With the above configuration, when the non-target voice source devices have a communication function, the information collection device can be made to track the operating status of each non-target voice source device and generate its operating information. The utterance control device need only determine whether the voice misrecognition condition is satisfied using the operating information supplied from the information collection device, so the configuration of the utterance control device can be simplified.

The utterance control device according to aspect 9 of the present invention, in aspect 7 above, may include operating information receiving means (operating information receiving unit 54) that receives the operating information of the non-target voice source device from the dialogue device, the dialogue device controlling the operation of the non-target voice source device by short-range wireless communication.

With the above configuration, when the dialogue device has a control function for remotely controlling the non-target voice source devices, the dialogue device can be made to track the operating status of each non-target voice source device and generate its operating information. Even when a non-target voice source device has no communication function, the object of avoiding misrecognition of voice, and any malfunction resulting from it, can be achieved.

A dialogue device (2) according to aspect 10 of the present invention is a dialogue device that recognizes voice uttered by a user and outputs a response voice to that voice, the dialogue device including: condition determination means (condition determination unit 56) that determines that a voice misrecognition condition of the dialogue device is satisfied when non-recognition-target voice, which is not recognition-target voice uttered by the user to the dialogue device, may be erroneously detected by the dialogue device as recognition-target voice; and dialogue control means (dialogue mode control unit 58) that suppresses output of the response voice when the condition determination means determines that the voice misrecognition condition is satisfied, wherein the condition determination means determines whether the voice misrecognition condition is satisfied based on operating information indicating the operating status of a non-target voice source device that directly or indirectly generates the non-recognition-target voice.

With the above configuration, the condition determination means determines whether the voice misrecognition condition of the dialogue device is satisfied based on the operating information indicating the operating status of the non-target voice source device. When the condition determination means determines that the voice misrecognition condition is satisfied, the dialogue control means suppresses output of the response voice from its own device.

In the dialogue device according to aspect 11 of the present invention, in aspect 10 above, the dialogue control means may suppress output of the response voice by disabling at least one of speech recognition means (speech recognition unit 59), which recognizes voice input to the device, and voice output control means (voice output control unit 60), which executes output of the response voice.

With the above configuration, in a situation where non-recognition-target voice may be erroneously detected as recognition-target voice, the dialogue control means disables at least one of the speech recognition means and the voice output control means.

Thus, if the speech recognition means is disabled, it does not subject non-recognition-target voice to speech recognition processing even when such voice occurs. Erroneous output of a response voice to the non-recognition-target voice is therefore suppressed. If the voice output control means is disabled, no response voice is output even when non-recognition-target voice occurs and is erroneously subjected to speech recognition processing as recognition-target voice. As a result, even when non-recognition-target voice occurs, misrecognition of voice, and any malfunction resulting from it, can be avoided.

In the dialogue device according to aspect 12 of the present invention, in aspect 10 or 11 above, it is preferable that the dialogue control means further outputs spontaneously, as a spontaneous voice, a voice whose content is a remark corresponding to a predetermined event, in response to the occurrence of that event and even without input of voice uttered by the user, and that, after outputting the spontaneous voice while output of the response voice is being suppressed, the dialogue control means releases the suppression for a fixed period.

An utterance control method according to aspect 13 of the present invention is an utterance control method for controlling a dialogue device that recognizes voice uttered by a user and outputs a response voice to that voice, the method including: a condition determination step (S114) of determining that a voice misrecognition condition of the dialogue device is satisfied when non-recognition-target voice, which is not recognition-target voice uttered by the user to the dialogue device, may be erroneously detected by the dialogue device as recognition-target voice; and an utterance control step (S115) of performing control so that the dialogue device does not output the response voice when the voice misrecognition condition is determined to be satisfied in the condition determination step, wherein, in the condition determination step, whether the voice misrecognition condition is satisfied is determined based on operating information indicating the operating status of a non-target voice source device that directly or indirectly generates the non-recognition-target voice.

With the above method, in the condition determination step, whether the voice misrecognition condition of the dialogue device is satisfied is determined based on the operating information indicating the operating status of the non-target voice source device. When the voice misrecognition condition is determined to be satisfied in the condition determination step, control is then performed in the utterance control step so that the dialogue device does not output the response voice.

Thus, in a situation where misrecognition is likely given the operating status of the non-target voice source device, that is, a situation where non-recognition-target voice may occur near the dialogue device, the dialogue device does not output a response voice even if non-recognition-target voice occurs. As a result, misrecognition of voice, and any malfunction resulting from it, can be avoided.
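The two steps of the method in aspect 13 can be sketched as a pair of functions chained into one control cycle. The step labels S114 and S115 come from the text; the function names, the dictionary-based operating information, the `Instruction` values, and the explicit `ALLOW` branch for the unsatisfied case are assumptions added for illustration.

```python
from enum import Enum
from typing import Mapping


class Instruction(Enum):
    """Hypothetical instruction sent to the dialogue device."""
    SUPPRESS = "suppress"
    ALLOW = "allow"


def condition_determination_step(operating_info: Mapping[str, bool]) -> bool:
    """S114: condition is met if any non-target voice source device is operating."""
    return any(operating_info.values())


def utterance_control_step(condition_met: bool) -> Instruction:
    """S115: suppress the response voice when the condition is met.

    The ALLOW result for the unmet case is an assumption of this sketch.
    """
    return Instruction.SUPPRESS if condition_met else Instruction.ALLOW


def control_cycle(operating_info: Mapping[str, bool]) -> Instruction:
    """Run S114 followed by S115 for one snapshot of operating information."""
    return utterance_control_step(condition_determination_step(operating_info))
```

For example, `control_cycle({"tv": True, "phone": False})` yields `Instruction.SUPPRESS`, while an empty or all-idle snapshot yields `Instruction.ALLOW`.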

Each of the utterance control device, the dialogue device, and the devices included in the dialogue system according to the aspects of the present invention may be realized by a computer. In that case, a control program that realizes the utterance control device (or the dialogue device) on a computer by causing the computer to operate as each of the means of the utterance control device (or the dialogue device), and a computer-readable recording medium on which the program is recorded, also fall within the scope of the present invention.

The present invention is not limited to the embodiments described above. Various modifications are possible within the scope of the claims, and embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention. Furthermore, new technical features can be formed by combining the technical means disclosed in the respective embodiments.

The present invention is applicable to a dialogue device that recognizes voice uttered by a user and outputs a response voice to that voice, and to the control of such a device.

1 Utterance control server (utterance control device)
2 Dialogue device
3 TV (non-target voice source device / voice output device)
3a Telephone (non-target voice source device / call device)
4 Information collection server (information collection device)
22 Voice input unit
23 Voice output unit
24 Infrared transmission unit
32 Operation unit
33 Infrared reception unit
50 Operation information transmission unit (operation information transmission means)
51 Operation information receiving unit (operation information receiving means)
52 Operating information generation unit (operating information setting means)
53 Operating information transmission unit (operating information transmission means)
54 Operating information receiving unit (operating information receiving means)
55 Control target identification unit (control target identification means)
56 Condition determination unit (condition determination means)
57 Utterance control unit (utterance control means)
58 Dialogue mode control unit (dialogue control means)
59 Speech recognition unit (speech recognition means)
60 Voice output control unit (voice output control means)
61 Device operation unit (device operation means)
62 Sound determination unit (sound determination means)
401, 402, 403, 404 Dialogue system

Claims

A dialogue system for controlling a dialogue device that outputs a response voice in response to voice uttered by a user, the dialogue system comprising:
condition determination means for determining that a voice misrecognition condition of the dialogue device is satisfied when non-recognition-target voice, which is not recognition-target voice uttered by the user to the dialogue device, may be erroneously detected by the dialogue device as recognition-target voice; and
utterance control means for performing control so that the dialogue device does not output the response voice when the condition determination means determines that the voice misrecognition condition is satisfied,
wherein the condition determination means determines whether the voice misrecognition condition is satisfied based on operating information indicating an operating status of a non-target voice source device that directly or indirectly generates the non-recognition-target voice.

The dialogue system according to claim 1, wherein the non-target voice source device is a voice output device having at least a voice output function for outputting non-recognition-target voice,
the dialogue system includes operating information setting means for setting the operating information of the voice output device to indicate that the voice output function is being executed while the voice output device is executing the voice output function, and
the condition determination means determines that the voice misrecognition condition of the dialogue device is satisfied when the operating information of the voice output device set by the operating information setting means indicates that the voice output function is being executed.

The dialogue system according to claim 1, wherein the non-target voice source device is a call device with which the user talks to a remote call partner,
the dialogue system includes operating information setting means for setting the operating information of the call device to indicate that a call is in progress during the period from when the call device outputs a ringtone announcing an incoming call until voice uttered by the user has been absent for a predetermined time or longer, and
the condition determination means determines that the voice misrecognition condition of the dialogue device is satisfied when the operating information of the call device set by the operating information setting means indicates that a call is in progress.

The dialogue system according to claim 1, wherein the non-target voice source device is a home appliance having at least a voice output function for outputting non-recognition-target voice at predetermined timings while the appliance is operating,
the dialogue system includes operating information setting means for setting the operating information of the home appliance to indicate that it is operating while the home appliance is operating, and
the condition determination means determines that the voice misrecognition condition of the dialogue device is satisfied when the operating information of the home appliance set by the operating information setting means indicates that the appliance is operating.

The dialogue system according to any one of claims 1 to 4, wherein the dialogue device includes dialogue control means for disabling, upon obtaining from the utterance control means an instruction to suppress output of the response voice, at least one of speech recognition means for recognizing voice input to the device and voice output control means for executing output of the response voice.

The dialogue system according to claim 5, wherein the utterance control means further instructs the dialogue device to output spontaneously, as a spontaneous voice, a voice whose content is a remark corresponding to a predetermined event, in response to the occurrence of that event and even without input of voice uttered by the user, and
the dialogue control means, after outputting the spontaneous voice in accordance with the instruction from the utterance control means to output the spontaneous voice, releases the suppression of output of the response voice for a fixed period.

An utterance control device that controls a dialogue device that outputs a response voice in response to voice uttered by a user, the utterance control device comprising:
condition determination means for determining that a voice misrecognition condition of the dialogue device is satisfied when non-recognition-target voice, which is not recognition-target voice uttered by the user to the dialogue device, may be erroneously detected by the dialogue device as recognition-target voice; and
utterance control means for performing control so that the dialogue device does not output the response voice when the condition determination means determines that the voice misrecognition condition is satisfied,
wherein the condition determination means determines whether the voice misrecognition condition is satisfied based on operating information indicating an operating status of a non-target voice source device that directly or indirectly generates the non-recognition-target voice.

The utterance control device according to claim 7, further comprising operating information receiving means for receiving the operating information from an information collection device that generates the operating information of the non-target voice source device by communicating with the non-target voice source device over a communication network and collecting information about the non-target voice source device.

The utterance control device according to claim 7, further comprising operating information receiving means for receiving the operating information of the non-target voice source device from the dialogue device, which controls the operation of the non-target voice source device by short-range wireless communication.

A dialogue device that outputs a response voice in response to voice uttered by a user, the dialogue device comprising:
condition determination means for determining that a voice misrecognition condition of the dialogue device is satisfied when non-recognition-target voice, which is not recognition-target voice uttered by the user to the dialogue device, may be erroneously detected by the dialogue device as recognition-target voice; and
dialogue control means for suppressing output of the response voice when the condition determination means determines that the voice misrecognition condition is satisfied,
wherein the condition determination means determines whether the voice misrecognition condition is satisfied based on operating information indicating an operating status of a non-target voice source device that directly or indirectly generates the non-recognition-target voice.

The dialogue device according to claim 10, wherein the dialogue control means suppresses output of the response voice by disabling at least one of speech recognition means for recognizing voice input to the device and voice output control means for executing output of the response voice.

The dialogue device according to claim 10 or 11, wherein the dialogue control means further outputs spontaneously, as a spontaneous voice, a voice whose content is a remark corresponding to a predetermined event, in response to the occurrence of that event and even without input of voice uttered by the user, and
after outputting the spontaneous voice while output of the response voice is being suppressed, releases the suppression for a fixed period.

An utterance control method for controlling a dialogue device that outputs a response voice in response to voice uttered by a user, the method comprising:
a condition determination step of determining that a voice misrecognition condition of the dialogue device is satisfied when non-recognition-target voice, which is not recognition-target voice uttered by the user to the dialogue device, may be erroneously detected by the dialogue device as recognition-target voice; and
an utterance control step of performing control so that the dialogue device does not output the response voice when the voice misrecognition condition is determined to be satisfied in the condition determination step,
wherein, in the condition determination step, whether the voice misrecognition condition is satisfied is determined based on operating information indicating an operating status of a non-target voice source device that directly or indirectly generates the non-recognition-target voice.

A control program for causing a computer to function as the utterance control device according to any one of claims 7 to 9, the control program causing the computer to function as each of the means of that device.

A control program for causing a computer to function as the dialogue device according to any one of claims 10 to 12, the control program causing the computer to function as each of the means of that device.