JP2021026124A

JP2021026124A - Voice interactive device, voice interactive method, and program

Info

Publication number: JP2021026124A
Application number: JP2019144528A
Authority: JP
Inventors: 智彰萩原; Tomoaki Hagiwara
Original assignee: Honda Motor Co Ltd
Current assignee: Honda Motor Co Ltd
Priority date: 2019-08-06
Filing date: 2019-08-06
Publication date: 2021-02-22
Anticipated expiration: 2039-08-06
Also published as: JP7217209B2

Abstract

To provide a voice interactive device, a voice interactive method, and a program capable of expressing the accuracy of voice recognition results. SOLUTION: In a vehicle, a voice interactive device includes: a voice acquisition unit which acquires voice information; a first information acquisition unit which receives, from an external device, external response information for the voice information acquired by the voice acquisition unit; a storage unit which stores internal response information for specific voice information; a second information acquisition unit which acquires, from the storage unit, internal response information for the voice information acquired by the voice acquisition unit; and an output control unit which causes an output unit to output at least one of the external response information and the internal response information. When outputting the internal response information, the output control unit causes the output unit to output a sound effect together with it. SELECTED DRAWING: Figure 2

Description

The present invention relates to a voice dialogue device, a voice dialogue method, and a program.

Conventionally, there have been voice dialogue systems that interact with a user and provide an answer to a request made in the user's utterance, in particular agent systems having a voice dialogue function. In such a system, for example, a local voice dialogue device acquires the user's utterance, sends it to a server, and presents the answer obtained from the server to the user. There is also a hybrid type in which, when the reception environment is poor or when the user's request is simple enough to be answered in a short sentence, the voice dialogue device itself provides the answer instead of the server.

On the other hand, in an in-vehicle device mounted on a vehicle, for example, a poor reception environment raises the error rate, which causes noise. There is a conventional information receiver that adds pseudo noise with an adjusted noise level to audio whose volume level has been adjusted in response to a deterioration of the reception state (see, for example, Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Publication No. 2007-173967

The information receiver described in Patent Document 1 adds pseudo noise with an adjusted noise level to audio whose volume level has been adjusted because of a deteriorated reception state; this avoids a silent state and lets the user hear a mixture of the audio and the pseudo noise. However, because this receiver adds the pseudo noise based on the error rate of the received digital audio broadcast signal, it cannot indicate, for example, how accurately an answer corresponds to the user's request, and users sometimes found it hard to understand why a particular answer was given.

The present invention has been made in view of such circumstances, and one of its objects is to provide a voice dialogue device, a voice dialogue method, and a program capable of informing the user of the accuracy of a response to the user's request.

The voice dialogue device, the voice dialogue method, and the program according to the present invention adopt the following configurations.
(1): A voice dialogue device according to one aspect of the present invention includes: a voice acquisition unit that acquires voice information; a first information acquisition unit that receives and acquires, from an external device, external response information for the voice information acquired by the voice acquisition unit; a storage unit that stores internal response information for specific voice information; a second information acquisition unit that acquires, from the storage unit, internal response information for the voice information acquired by the voice acquisition unit; and an output control unit that causes an output unit to output at least one of the external response information and the internal response information. When causing the internal response information to be output, the output control unit causes the output unit to output a sound effect together with the internal response information.
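Aspect (1) can be pictured as a small data flow: the storage unit maps specific voice information to internal responses, and the output control unit attaches a sound effect whenever the internal response is the one being output. The sketch below is illustrative only. The dictionary contents, the `Output` type, and the rule of preferring the external (server) response when one was received are assumptions, not details taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical storage-unit contents: internal responses for specific voice information.
INTERNAL_RESPONSES = {
    "today's weather": "It should be sunny today.",  # placeholder text
}


@dataclass
class Output:
    text: str
    sound_effect: bool  # True when a sound effect accompanies the response


def get_internal_response(voice_text: str) -> Optional[str]:
    """Second information acquisition unit: look the voice information up in the storage unit."""
    return INTERNAL_RESPONSES.get(voice_text)


def control_output(external: Optional[str], internal: Optional[str]) -> Output:
    """Output control for aspect (1).

    Assumption: the external (server) response is preferred when available;
    an internal response is always accompanied by a sound effect.
    """
    if external is not None:
        return Output(external, sound_effect=False)
    if internal is not None:
        return Output(internal, sound_effect=True)
    raise ValueError("no response available")
```

Because the sound effect is tied to the source of the response rather than to its content, the user can tell from the audio alone that a locally stored (and possibly less precise) answer is being given.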

(2): In the aspect of (1) above, the device further includes a determination unit that determines which of the internal response information and the external response information to output, and a communication quality acquisition unit that acquires communication quality information on the quality of communication between the first information acquisition unit and the external device. The determination unit determines the response information to be output based on the communication quality information acquired by the communication quality acquisition unit, and the output control unit causes the output unit to output the internal response information or the external response information determined by the determination unit.

(3): In the aspect of (2) above, the determination unit determines that the internal response information is to be output when the communication quality information acquired by the communication quality acquisition unit is equal to or lower than a first determination quality.

(4): In the aspect of (3) above, the determination unit determines that the external response information is to be output when the communication quality information acquired by the communication quality acquisition unit exceeds the first determination quality, and the output control unit causes the output unit to output a sound effect together with the external response information when the communication quality information acquired by the communication quality acquisition unit is equal to or lower than a second determination quality.
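Aspects (2) to (4) amount to a two-threshold decision on communication quality. The following is a minimal sketch assuming the communication quality is a single normalized score in [0, 1] and that the second determination quality lies above the first; the patent does not fix the metric or the threshold values, so `FIRST_QUALITY` and `SECOND_QUALITY` are hypothetical.

```python
from enum import Enum
from typing import Tuple


class Source(Enum):
    INTERNAL = "internal"
    EXTERNAL = "external"


# Hypothetical thresholds on a normalized quality score in [0, 1];
# the second determination quality is assumed to be above the first.
FIRST_QUALITY = 0.3
SECOND_QUALITY = 0.6


def decide(quality: float) -> Tuple[Source, bool]:
    """Return (response source, whether to add a sound effect)."""
    if quality <= FIRST_QUALITY:
        # Aspect (3): low quality -> internal response, which always carries
        # a sound effect under aspect (1).
        return Source.INTERNAL, True
    # Aspect (4): above the first quality -> external response; add a sound
    # effect if the quality is still at or below the second quality.
    return Source.EXTERNAL, quality <= SECOND_QUALITY
```

With these assumed values, a score of 0.5 yields the server's answer with a sound effect, signalling that the answer arrived over a marginal link, while 0.9 yields the server's answer alone.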

(5): In any one of aspects (1) to (4) above, the output control unit causes the sound effect to be output superimposed on the internal response information.
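Superimposing the sound effect on the response, as in aspect (5), can be done by simple additive mixing of audio samples. This sketch assumes both signals are lists of float samples in [-1.0, 1.0] at the same sample rate; the mixing gain and the hard clipping are illustrative choices, not taken from the patent.

```python
from typing import List


def overlay(speech: List[float], effect: List[float], effect_gain: float = 0.3) -> List[float]:
    """Mix a sound effect into speech samples (floats in [-1.0, 1.0]).

    The effect is added sample by sample at a reduced gain, the shorter
    signal is treated as silence past its end, and the sum is clipped.
    """
    n = max(len(speech), len(effect))
    mixed = []
    for i in range(n):
        s = speech[i] if i < len(speech) else 0.0
        e = effect[i] if i < len(effect) else 0.0
        mixed.append(max(-1.0, min(1.0, s + effect_gain * e)))
    return mixed
```

Keeping the effect gain well below 1 is meant to let the response remain intelligible while the added sound marks it as an internal answer.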

(6): In any one of aspects (1) to (5) above, the voice dialogue device is mounted on a vehicle equipped with in-vehicle equipment, and the output control unit does not cause the sound effect to be output when outputting information about the in-vehicle equipment as the internal response information.
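Aspect (6) carves out an exception to the sound-effect rule for internal responses. A minimal sketch, assuming each internal response is tagged with a topic label; the set of labels below is hypothetical and merely echoes examples of vehicle equipment listed later in the description.

```python
# Hypothetical topic labels for internal responses about in-vehicle equipment.
VEHICLE_EQUIPMENT_TOPICS = {"air_conditioner", "window", "fuel_level", "tire_pressure"}


def internal_response_needs_sound_effect(topic: str) -> bool:
    """Aspect (6): an internal response gets a sound effect unless it concerns in-vehicle equipment."""
    return topic not in VEHICLE_EQUIPMENT_TOPICS
```

The rationale suggested by the effects of (6) is that information the vehicle itself holds (such as remaining fuel) is not made less certain by a poor connection, so marking it with a sound effect would only make it harder to hear.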

(7): A voice dialogue method according to one aspect of the present invention includes: acquiring voice information; receiving and acquiring, from an external device, external response information for the voice information; acquiring communication quality information on the quality of communication with the external device; acquiring internal response information for the voice information from a storage unit that stores internal response information for specific voice information; causing an output unit to output at least one of the external response information and the internal response information; and, when the internal response information is output, causing the output unit to output a sound effect together with the internal response information.

(8): A program according to one aspect of the present invention causes a computer of a voice dialogue device to execute processes of: acquiring voice information; receiving and acquiring, from an external device, external response information for the voice information; acquiring communication quality information on the quality of communication with the external device; acquiring internal response information for the voice information from a storage unit that stores internal response information for specific voice information; causing an output unit to output at least one of the external response information and the internal response information; and, when the internal response information is output, causing the output unit to output a sound effect together with the internal response information.

According to (1) to (8), it is possible to inform the user of the accuracy of the response to the user's request.
According to (2) to (5), it is possible to make the user recognize that the accuracy of the response to the user's request is low.
According to (6), information about the vehicle equipment can be made easier for the user to recognize.

FIG. 1 is a configuration diagram of an agent system 1 including a voice dialogue device 100.
FIG. 2 is a diagram showing the configuration of the voice dialogue device 100 and the equipment mounted on a vehicle M.
FIG. 3 is a diagram showing the configuration of an agent server 200 and part of the configuration of the voice dialogue device 100.
FIG. 4 is a flowchart showing an example of the flow of processing executed by the voice dialogue device 100.
FIG. 5 is a flowchart showing an example of the flow of processing executed by the voice dialogue device 100.
FIG. 6 is an explanatory diagram showing an example of the display on the display/operation device 20 and the output of the speaker 30.
FIG. 7 is an explanatory diagram showing an example of the display on the display/operation device 20 and the output of the speaker 30.

Hereinafter, embodiments of the voice dialogue device, the voice dialogue method, and the program of the present invention will be described with reference to the drawings. The voice dialogue device has, for example, an agent function. The agent function is, for example, a function of providing various kinds of information based on requests (commands) contained in the utterances of an occupant who is the user of a vehicle M, or of mediating network services, while conversing with the occupant. There may be a single agent or multiple types of agents, and multiple types of agents may differ from one another in their functions, processing procedures, control, and output modes and content. Some agent functions may also control equipment in the vehicle (for example, equipment related to driving control or vehicle body control).

The agent function is realized by integrally using, for example, a voice recognition function that recognizes the occupant's voice (a function that converts speech into text) together with a natural language processing function (a function that understands the structure and meaning of text), a dialogue management function, and a network search function that searches other devices or a predetermined database held by the device itself. Some or all of these functions may be realized by AI (Artificial Intelligence) technology. Part of the configuration for performing these functions (in particular, the voice recognition function and the natural language processing and interpretation function) may be installed in an agent server that can communicate with the in-vehicle communication device of the vehicle M or with a general-purpose communication terminal (mobile terminal) brought into the vehicle M. The following description assumes that part of the configuration is installed in the agent server and that the voice dialogue device and the agent server cooperate to realize the agent function. The service-providing entity that the voice dialogue device and the agent server cause to appear virtually through their cooperation is called an agent. The agent server is an example of an "external device".

[Overall configuration]
FIG. 1 is a configuration diagram of an agent system 1 including the voice dialogue device 100. The agent system 1 includes, for example, the voice dialogue device 100 and an agent server 200. There may be one agent server 200 or several. When there are multiple agent servers 200, they are operated by different agent system providers, and the agents in this case are likewise operated by different agent system providers.

The voice dialogue device 100 is provided by the provider of the agent system that includes the agent server 200. The agent server 200 is the parent server of the agent function unit 150 in the voice dialogue device 100. Examples of agent system providers include automobile manufacturers, network service providers, e-commerce businesses, and sellers and manufacturers of mobile terminals; any entity (corporation, organization, individual, etc.) can be an agent system provider.

The voice dialogue device 100 communicates with the agent server 200 via a network NW. The network NW includes, for example, some or all of the Internet, a cellular network, a Wi-Fi network, a WAN (Wide Area Network), a LAN (Local Area Network), public lines, telephone lines, and wireless base stations. Various web servers 500 are connected to the network NW, and both the voice dialogue device 100 and the agent server 200 can acquire web pages from them via the network NW.

The voice dialogue device 100 converses with the occupant of the vehicle M. It either generates response information such as response sentences from the occupant's voice itself, or transmits information about the occupant's voice to the agent server 200 and obtains response sentences from it, and it presents these response sentences to the occupant as voice output or image display.

[Vehicle]
FIG. 2 is a diagram showing the configuration of the voice dialogue device 100 and the equipment mounted on the vehicle M. The vehicle M is equipped with, for example, one or more microphones 10, a display/operation device 20, speakers 30, a navigation device 40, vehicle equipment 50, an in-vehicle communication device 60, and the voice dialogue device 100. A general-purpose communication terminal brought into the vehicle interior is also used as a communication device. These devices are connected to each other by a multiplex communication line such as a CAN (Controller Area Network) communication line, a serial communication line, a wireless communication network, or the like. The configuration shown in FIG. 2 is merely an example; part of it may be omitted, or other components may be added.

The microphone 10 is a sound collecting unit that collects sound produced in the vehicle interior. The display/operation device 20 is a device (or group of devices) that can display images and accept input operations. It includes, for example, a display device configured as a touch panel, and may further include a HUD (Head-Up Display) or a mechanical input device. The speakers 30 are arranged, for example, at multiple different positions in the vehicle interior. The display/operation device 20 may be shared between the voice dialogue device 100 and the navigation device 40. The display/operation device 20 and the speakers 30 are examples of the "output unit".

The navigation device 40 includes a navigation HMI (Human Machine Interface), a positioning device such as a GPS (Global Positioning System) receiver, a storage device storing map information, and a control device (navigation controller) that performs route searches and the like. Some or all of the microphone 10, the display/operation device 20, and the speakers 30 may serve as the navigation HMI. The navigation device 40 searches for a route (navigation route) from the position of the vehicle M identified by the positioning device to a destination entered by the occupant, and outputs guidance information through the navigation HMI so that the vehicle M can travel along the route. The route search function may reside in a navigation server accessible via the network NW; in that case, the navigation device 40 acquires the route from the navigation server and outputs guidance information. The voice dialogue device 100 may be built on the navigation controller, in which case the navigation controller and the voice dialogue device 100 are integrated in hardware.

The vehicle equipment 50 includes, for example, driving force output devices such as an engine or a traction motor, an engine starter motor, a door lock device, a door opening/closing device, windows, window opening/closing devices and their control devices, seats and seat position control devices, a rearview mirror and its angle control device, lighting devices inside and outside the vehicle and their control devices, wipers and defoggers and their respective control devices, turn signals and their control device, an air conditioner, and vehicle information devices that provide information such as mileage, tire pressure, and remaining fuel.

The in-vehicle communication device 60 is, for example, a wireless communication device that can access the network NW via a cellular network or a Wi-Fi network.

The voice dialogue device 100 includes, for example, a management unit 110 and an agent function unit 150. The management unit 110 includes, for example, a sound processing unit 112, a WU (Wake Up) determination unit 114, a display control unit 116, and a voice control unit 118. The agent function unit 150 includes, for example, a voice acquisition unit 151, a first information acquisition unit 152, a second information acquisition unit 153, a communication quality acquisition unit 154, a determination unit 155, an output control unit 156, an in-vehicle device command unit 157, and a storage unit 160. The functional units in FIG. 2 are shown in simplified form for explanation; in practice the configuration can be modified freely, for example by interposing the management unit 110 between the agent function unit 150 and the in-vehicle communication device 60.

Each component of the voice dialogue device 100 is realized, for example, by a hardware processor such as a CPU (Central Processing Unit) executing a program (software). Some or all of these components may be realized by hardware (circuitry) such as an LSI (Large Scale Integration), ASIC (Application Specific Integrated Circuit), FPGA (Field-Programmable Gate Array), or GPU (Graphics Processing Unit), or by cooperation of software and hardware. The program may be stored in advance in a storage device (a storage device having a non-transitory storage medium) such as an HDD (Hard Disk Drive) or flash memory, or it may be stored on a removable storage medium (non-transitory storage medium) such as a DVD or CD-ROM and installed by mounting the storage medium in a drive device. The storage unit 160 of the agent function unit 150 stores, for example, a local dictionary DB (database) 162, a local knowledge base DB 164, and a local response rule DB 166. The storage unit 160 is realized by the various storage devices of the voice dialogue device 100 described above.

The management unit 110 functions when programs such as an OS (Operating System) and middleware are executed. The sound processing unit 112 of the management unit 110 performs acoustic processing on the input sound so that it is in a state suitable for recognizing the wake-up word preset for the voice dialogue device 100, and generates a voice stream.

The WU determination unit 114 recognizes the wake-up word predetermined for the voice dialogue device 100. It recognizes the meaning of speech from the voice stream generated by the sound processing unit 112. First, the WU determination unit 114 detects speech segments as voice information based on the amplitude and zero crossings of the speech waveform in the voice stream. The WU determination unit 114 may also perform segment detection based on frame-by-frame speech/non-speech discrimination using a Gaussian mixture model (GMM).
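Detecting speech segments from amplitude and zero crossings is a classic energy/zero-crossing-rate heuristic. The sketch below is one illustrative version, not the patent's own procedure: the frame length, the amplitude threshold, and the rule that voiced frames have a low zero-crossing rate are all assumptions chosen for demonstration.

```python
def detect_voice_frames(samples, frame_len=160, energy_thresh=0.01, zcr_max=0.25):
    """Return (start, end) sample indices of segments judged to contain speech.

    A frame counts as speech when its mean absolute amplitude exceeds
    energy_thresh and its zero-crossing rate is at most zcr_max
    (noise-like signals tend to cross zero far more often).
    """
    segments = []
    start = None
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        amplitude = sum(abs(x) for x in frame) / frame_len
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        zcr = crossings / (frame_len - 1)
        voiced = amplitude > energy_thresh and zcr <= zcr_max
        if voiced and start is None:
            start = i                    # a speech segment begins
        elif not voiced and start is not None:
            segments.append((start, i))  # the segment ends at this frame boundary
            start = None
    if start is not None:
        # Close a segment still open at the last complete frame.
        segments.append((start, (len(samples) // frame_len) * frame_len))
    return segments
```

The GMM-based alternative mentioned above would replace the two fixed thresholds with per-frame likelihoods under trained speech and non-speech models.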

Next, the WU determination unit 114 converts the speech in the voice information of the detected speech segment into text, yielding character information. It then determines whether the character information corresponds to the wake-up word. If so, the WU determination unit 114 activates the agent function unit 150. The WU determination unit 114 outputs character information determined not to be the wake-up word to the agent function unit 150.
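The wake-up handling can be sketched as a small router that either activates the agent or forwards text to it. In this sketch the wake-up word, the case-insensitive comparison, and the choice to forward text only after the agent has been activated are all assumptions made for illustration; the patent states only that non-wake-up text is output to the agent function unit.

```python
from typing import Callable

WAKE_UP_WORDS = {"hey agent"}  # hypothetical wake-up word


class WakeUpJudge:
    """Sketch of a WU determination step: activate the agent on the wake-up word
    and forward other recognized text to it once active (assumed policy)."""

    def __init__(self, forward: Callable[[str], None]):
        self.forward = forward
        self.agent_active = False

    def on_text(self, text: str) -> None:
        if text.strip().lower() in WAKE_UP_WORDS:
            self.agent_active = True  # activate the agent function unit
        elif self.agent_active:
            self.forward(text)        # non-wake-up text goes to the agent
```

Usage: `judge = WakeUpJudge(received.append)` routes every post-activation utterance into the `received` list, which stands in for the agent function unit.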

A function equivalent to the WU determination unit 114 may be provided in the agent server 200. In this case, the management unit 110 transmits the voice stream processed by the sound processing unit 112 to the agent server 200, and when the agent server 200 determines that it contains the wake-up word, the agent function unit 150 starts in accordance with an instruction from the agent server 200. Alternatively, the agent function unit 150 may always be running and determine the wake-up word by itself, in which case the management unit 110 need not include the WU determination unit 114.

The display control unit 116 causes the display/operation device 20 to display images in response to instructions from the agent function unit 150. Under the control of the agent function unit 150, the display control unit 116 generates, for example, an image of an anthropomorphized agent that communicates with the occupant in the vehicle interior (hereinafter, an agent image) and displays it on the display/operation device 20. The agent image is, for example, an image of the agent speaking to the occupant. It may include, for example, a face image from which at least the viewer (occupant) can recognize the facial expression and face orientation. For example, parts resembling eyes and a nose may be drawn within a face region, and the expression and face orientation may be recognized from the positions of those parts within the region. The agent image may also appear three-dimensional: it may include a head image in three-dimensional space so that the viewer can recognize the agent's face orientation, or an image of a body (torso and limbs) so that the agent's movements, behavior, posture, and the like can be recognized. The agent image may also be an animated image.

The voice control unit 118 causes the speakers 30 to output voice in response to instructions from the agent function unit 150. Using the multiple speakers 30, the voice control unit 118 may localize the sound image of the agent's voice at a position corresponding to the display position of the agent image. The voice control unit 118 is an example of an "output control unit".
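The patent does not say how the sound image is localized. One common technique for placing a sound between two speakers is constant-power amplitude panning, shown here as a stand-in. Mapping the agent image's display position to a pan value in [-1, 1] is an assumption.

```python
import math
from typing import Tuple


def pan_gains(position: float) -> Tuple[float, float]:
    """Constant-power pan: position -1.0 (left speaker) to +1.0 (right speaker).

    Returns (left_gain, right_gain) with left**2 + right**2 == 1, so the
    perceived loudness stays roughly constant as the image moves.
    """
    position = max(-1.0, min(1.0, position))
    angle = (position + 1.0) * math.pi / 4  # maps to 0 .. pi/2
    return math.cos(angle), math.sin(angle)
```

For an agent image drawn at the center of a display between two speakers, `pan_gains(0.0)` gives equal gains of about 0.707 on each side.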

The agent function unit 150 executes an in-vehicle application program for providing services that include voice responses (hereinafter, an in-vehicle agent app), thereby causing an agent to appear in cooperation with the agent server 200 and providing services, including voice responses, according to utterance information based on the utterances of vehicle occupants. The agent function unit 150 is authorized to control the vehicle equipment 50, but it may instead be one that is not granted this authority.

The voice acquisition unit 151 acquires and recognizes the character information output by the WU determination unit 114 of the management unit 110; that is, it acquires voice information in the form of character information. The voice acquisition unit 151 interprets the meaning of the recognized character information while referring to the local dictionary DB 162 stored in the storage unit 160. The local dictionary DB 162 may include lists of synonyms and near-synonyms. Recognizing the character information and interpreting its meaning need not be clearly separated stages; they may influence each other, for example by correcting the character recognition result in light of the semantic interpretation.

For example, when the recognition result carries a meaning such as "What's the weather today?" or "How is the weather?", the voice acquisition unit 151 generates a command in which the utterance is replaced with the standard character information "today's weather". This makes it easier to carry on a dialogue that matches the request even when the wording of the spoken request varies. The voice acquisition unit 151 transmits the character recognition result to the agent server 200 via the in-vehicle communication device 60. The voice acquisition unit 151 thus forms part of a transmission unit that transmits information, or of a communication unit that transmits and receives information.
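The normalization step described above can be sketched as a lookup from variant phrasings to a standard command string. This is a minimal illustration only: the table contents, the `normalize_utterance` function, and the punctuation stripping are hypothetical, and a real local dictionary DB 162 (possibly combined with statistical methods) would be far richer.

```python
# Hypothetical normalization table: variant phrasings -> standard command text.
# A real local dictionary DB 162 would be much larger; these entries only
# mirror the weather example in the text.
NORMALIZATION_TABLE = {
    "what's the weather today": "today's weather",
    "how is the weather": "today's weather",
    "today's weather": "today's weather",
}


def normalize_utterance(text: str) -> str:
    """Map a recognized utterance to its standard command text, if one is known."""
    # Case-fold and drop trailing punctuation so minor variations share a key.
    key = text.strip().lower().rstrip("?!. ")
    return NORMALIZATION_TABLE.get(key, key)


print(normalize_utterance("How is the weather?"))  # today's weather
```

Utterances with no table entry fall through unchanged apart from case and trailing punctuation, so the fuller interpretation on the agent server 200 can still handle them.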

The first information acquisition unit 152 receives, via the in-vehicle communication device 60, the server response sentence transmitted by the agent server 200. Decision process information describing how the server response sentence was generated is attached to it. The first information acquisition unit 152 outputs the acquired server response sentence to the determination unit 155 and the decision process information to the communication quality acquisition unit 154. The first information acquisition unit 152 forms part of a receiving unit that receives information, or of a communication unit that transmits and receives information. The generation and transmission of server response sentences, the decision process information, and related matters are described later together with the agent server 200.

Based on the recognition result from the voice acquisition unit 151, the second information acquisition unit 153 determines and acquires a response sentence for the occupant of the vehicle M by referring to the local knowledge base DB 164 and the local response rule DB 166. The local knowledge base DB 164 is information that defines relationships between things. The local response rule DB 166 is information that defines the action the agent should take in response to a command (the answer, the device control to perform, and so on).

The second information acquisition unit 153 generates a local response sentence so that the determined response is conveyed to the occupant of the vehicle M. It may produce a local response sentence that addresses the occupant by name or that imitates the occupant's way of speaking. The second information acquisition unit 153 outputs the local response sentence to the determination unit 155.

The communication quality acquisition unit 154 acquires the communication quality of the in-vehicle communication device 60 as communication quality information. This quality is based on information such as the transmission/reception state of the vehicle M, the transmission/reception state of the agent server 200, whether communication is restricted, whether reception succeeded, and whether a timeout occurred. The communication quality acquired by the communication quality acquisition unit 154 may be, for example, the reception quality of the in-vehicle communication device 60, or the quality with which the agent function unit 150 receives information transmitted by the agent server 200. Alternatively, it may be the quality of the communication between the in-vehicle communication device 60 and the agent server 200, for example the transmission quality of the agent server 200. In addition, the communication quality acquisition unit 154 acquires the decision process information attached to the server response sentence acquired by the first information acquisition unit 152 and outputs it to the determination unit 155.
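One way to picture how the listed observations could be collapsed into a single communication-quality value is the sketch below. The `LinkStatus` fields, the 0 to 1 scale, and the rule that any hard failure yields zero are assumptions made for illustration; the description does not specify how the quality is computed.

```python
from dataclasses import dataclass


@dataclass
class LinkStatus:
    signal_level: float  # hypothetical scale: 0.0 (no signal) .. 1.0 (excellent)
    restricted: bool     # a communication restriction is in effect
    received: bool       # the server response sentence arrived
    complete: bool       # the received response was not truncated
    timed_out: bool      # waiting for the response exceeded the allowed time


def communication_quality(status: LinkStatus) -> float:
    """Collapse link observations into one score in [0, 1] (illustrative rule)."""
    # Any hard failure makes the server answer unusable, so score it as zero.
    if status.restricted or status.timed_out or not status.received or not status.complete:
        return 0.0
    # Otherwise use the signal level, clamped into [0, 1].
    return max(0.0, min(1.0, status.signal_level))
```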

When the determination unit 155 determines that the local response sentence output by the second information acquisition unit 153 is an immediate-response sentence, it outputs that sentence to the output control unit 156. An immediate-response sentence is a response to a relatively simple recognition result, for which there is little or no difference between the local response sentence and the server response sentence. One example is a command response sentence about operating vehicle equipment, such as "Opening the window" in reply to the recognition result "Open the window". Another is a simple answer, such as "It's 9:15", to a simple question such as "What time is it?". Response sentences that are not immediate-response sentences include, for example, sentences such as weather information that are difficult to generate from the information stored in the agent server 200 alone, and sentences whose intent can only be understood with a huge dictionary database because the possible expressions (place names, location names, song titles, and so on) are countless.

When the response sentence output by the second information acquisition unit 153 is a command response sentence concerning the operation of vehicle equipment, the determination unit 155 outputs the command response sentence to the output control unit 156 and outputs command information to the in-vehicle device command unit 157. A command response sentence is a response to a recognition result that has been recognized as a command to operate vehicle equipment. Command information causes the vehicle equipment to perform the operation corresponding to the command response sentence; for example, when the command response sentence is "Opening the window", the command information is information for opening a window of the vehicle.
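The distinction between immediate-response sentences and other local responses, together with the emission of command information, might be modeled as below. The rule table stands in for the local response rule DB 166; its entries, the `LocalResponse` type, and the command-information dictionary format are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class LocalResponse:
    text: str
    immediate: bool                 # local and server answers would barely differ
    command: Optional[dict] = None  # command information for vehicle equipment


# Hypothetical stand-in for the local response rule DB 166.
LOCAL_RULES = {
    "open the window": LocalResponse(
        "Opening the window.", True, {"device": "window", "action": "open"}
    ),
    # A real system would read the clock; a fixed string keeps the sketch simple.
    "what time is it": LocalResponse("It's 9:15.", True),
    "today's weather": LocalResponse("I can't check the forecast locally.", False),
}


def route_local_response(command_text: str) -> Tuple[Optional[str], Optional[dict]]:
    """Return (immediate-response sentence, command information) or (None, None)."""
    local = LOCAL_RULES.get(command_text)
    if local is None or not local.immediate:
        # Not an immediate response: the communication-quality check decides instead.
        return None, None
    return local.text, local.command
```

A result of `(None, None)` means the request is not answered immediately, so the server/local choice based on communication quality applies.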

When the response sentence output by the second information acquisition unit 153 is not an immediate-response sentence, the determination unit 155 decides, based on the communication quality acquired by the communication quality acquisition unit 154, whether the response to the recognition result from the voice acquisition unit 151 will be the server response sentence output by the first information acquisition unit 152 or the local response sentence output by the second information acquisition unit 153. If the communication quality exceeds a first determination quality, the determination unit 155 uses the server response sentence; if the communication quality is equal to or lower than the first determination quality, it uses the local response sentence. The communication quality is equal to or lower than the first determination quality when, for example, the waiting time allowed for acquiring the server response sentence exceeds a predetermined time before the first information acquisition unit 152 acquires it; when the server response sentence cannot be received because the transmission/reception state of the vehicle M or the agent server 200 is poor or communication is restricted; or when the server response sentence is received but is incomplete.

When the communication quality exceeds the first determination quality and the determination unit 155 has decided to use the server response sentence, it further determines, based on the decision process information output by the communication quality acquisition unit 154, whether the communication quality is equal to or lower than a second determination quality. This is the case, for example, when sufficient information could not be obtained while the agent server 200 was generating the server response sentence. When the determination unit 155 determines that the communication quality is equal to or lower than the second determination quality, it causes the output control unit 156 to output noise information.
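The two-stage decision (the first determination quality selects the source; the second flags a server answer built on incomplete information) can be sketched as follows. The numeric thresholds, the assumption that both qualities lie on a 0 to 1 scale, and the `decision_quality` value derived from the decision process information are illustrative assumptions. The sketch also marks every local answer for a noise cue, in line with the output control unit 156 attaching noise information to internal response information.

```python
from typing import Optional, Tuple

FIRST_DETERMINATION_QUALITY = 0.5   # hypothetical threshold
SECOND_DETERMINATION_QUALITY = 0.8  # hypothetical threshold


def select_response(
    link_quality: float,
    decision_quality: float,
    server_text: Optional[str],
    local_text: str,
) -> Tuple[str, str, bool]:
    """Return (source, response sentence, whether to attach noise information)."""
    # Missing server answer, or quality at/below the first threshold -> local answer.
    if server_text is None or link_quality <= FIRST_DETERMINATION_QUALITY:
        return "local", local_text, True  # local answers always carry the noise cue
    # Server answer chosen; flag it if its generation lacked sufficient information.
    add_noise = decision_quality <= SECOND_DETERMINATION_QUALITY
    return "server", server_text, add_noise
```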

The determination unit 155 may use either the server response sentence or the local response sentence as the response to the recognition result, or both. When both are used, the determination unit 155 may decide which one to output first according to an appropriate criterion.

The output control unit 156 outputs to the management unit 110 the response sentence that the determination unit 155 selected as the response to the recognition result, so that it can be output from the speaker 30. For example, when the determination unit 155 decides to output the local response sentence, the output control unit 156 outputs the local response sentence to the management unit 110 as internal response information. On receiving the internal response information, the management unit 110 controls the speaker 30 through the voice control unit 118 to output the local response sentence. When the determination unit 155 decides to output the server response sentence, the output control unit 156 outputs the server response sentence to the management unit 110 as external response information. On receiving the external response information, the management unit 110 controls the display/operation device 20 through the display control unit 116, and the speaker 30 through the voice control unit 118, to output the server response sentence. In this way, the output control unit 156 causes the local response sentence or server response sentence selected by the determination unit 155 to be displayed on the display/operation device 20 and output from the speaker 30.

When the output control unit 156 outputs internal response information for a local response sentence to the management unit 110, it also outputs noise information that causes the speaker 30 to emit noise (a noise sound) as a sound effect. When the determination unit 155 has determined that the communication quality is equal to or lower than the second determination quality, the output control unit 156 likewise outputs noise information together with the external response information for the server response sentence. On receiving noise information together with internal or external response information, the voice control unit 118 of the management unit 110 controls the speaker 30 so that the local or server response sentence is output with the noise superimposed on it. Instead of superimposing noise on the response sentence, the output control unit 156 may output information that causes a sound effect such as noise to be played before or after the local or server response sentence.
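The two presentation options for the noise sound effect (superimposing it on the speech, or playing it before or after) can be illustrated on raw audio samples. This sketch assumes mono floating-point samples in [-1, 1] and uses uniform white noise; the noise level, cue length, and `add_noise_cue` function are hypothetical choices, not taken from the description.

```python
import random
from typing import List


def add_noise_cue(
    speech: List[float],
    noise_level: float = 0.05,  # hypothetical peak amplitude of the noise
    mode: str = "overlay",      # "overlay", "before", or "after"
    cue_len: int = 800,         # hypothetical cue length in samples
    seed: int = 0,
) -> List[float]:
    """Attach a white-noise sound effect to synthesized speech samples in [-1, 1]."""
    rng = random.Random(seed)
    if mode == "overlay":
        # Superimpose noise on every sample, then clip back into [-1, 1].
        mixed = [s + rng.uniform(-noise_level, noise_level) for s in speech]
        return [max(-1.0, min(1.0, x)) for x in mixed]
    cue = [rng.uniform(-noise_level, noise_level) for _ in range(cue_len)]
    if mode == "before":
        return cue + speech
    if mode == "after":
        return speech + cue
    raise ValueError(f"unknown mode: {mode}")
```

Overlaying keeps the response length unchanged, while the before/after variants leave the speech itself clean and lengthen playback by the cue.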

When the second information acquisition unit 153 outputs an immediate-response sentence, the output control unit 156 outputs it to the management unit 110 as internal response information. On receiving this internal response information, the management unit 110 controls the speaker 30 through the voice control unit 118 to output the immediate-response sentence (for example, a command response sentence).

When the second information acquisition unit 153 outputs command information, the in-vehicle device command unit 157 controls the vehicle equipment 50 based on it. Such control includes, for example, opening and closing doors, opening and closing windows, and adjusting seat positions. The control may be directed at a specific target, for example opening and closing only the driver's door.

[Agent server]
FIG. 3 is a diagram showing the configuration of the agent server 200 and part of the configuration of the voice dialogue device 100. The configuration of the agent server 200 and the operation of the agent function unit 150 and related units are described below. Physical communication between the voice dialogue device 100 and the network NW is not described here.

The agent server 200 includes a communication unit 210, which is a network interface such as a NIC (Network Interface Card). The agent server 200 further includes, for example, a natural language processing unit 222, a dialogue management unit 224, a network search unit 226, and a response sentence generation unit 228. These components are realized, for example, by a hardware processor such as a CPU executing a program (software). Some or all of them may instead be realized by hardware (including circuitry) such as an LSI, ASIC, FPGA, or GPU, or by software and hardware working together. The program may be stored in advance in a storage device such as an HDD or flash memory (a storage device with a non-transitory storage medium), or it may be stored on a removable non-transitory storage medium such as a DVD or CD-ROM and installed by mounting the medium in a drive device.

The agent server 200 includes a storage unit 250, realized by the various storage devices described above. The storage unit 250 stores data and programs such as a personal profile 252, a dictionary DB 254, a knowledge base DB 256, and a response rule DB 258, which correspond to the in-vehicle agent application. The personal profile 252 holds profiles for many individual users, whereas the agent function unit 150 stores information corresponding to the personal profile of the user of the vehicle M. The local dictionary DB 162, local knowledge base DB 164, and local response rule DB 166 stored in the storage unit 160 of the agent function unit 150 are simpler than the dictionary DB 254, knowledge base DB 256, and response rule DB 258. Consequently, local response sentences that the agent function unit 150 can generate are simpler than server response sentences generated by the agent server 200 and are more likely to be less accurate answers to the user's request.

In the voice dialogue device 100, when the agent function unit 150 recognizes a voice command that can be processed locally (without going through the agent server 200), it may perform the processing requested by the command. Voice commands that can be processed locally are those that can be answered by referring to the storage unit 160 of the voice dialogue device 100, or those that control the vehicle equipment 50 (for example, a command to turn on the air conditioner). The agent function unit 150 therefore has some of the functions of the agent server 200.

The natural language processing unit 222 interprets the meaning of the character information transmitted by the agent function unit 150 by referring to the dictionary DB 254. The dictionary DB 254 associates character information with abstracted semantic information and may include lists of synonyms and near-synonyms. Alternatively, the agent server 200 may be provided with a voice recognition unit: the agent function unit 150 transmits a voice stream, and the voice recognition unit recognizes it and converts it into text, which the natural language processing unit 222 then processes. In that case, the processing of the voice recognition unit and that of the natural language processing unit 222 need not be cleanly separated stages; they may influence each other, for example with the voice recognition unit correcting its recognition result based on the processing result of the natural language processing unit 222.

Like the voice acquisition unit 151, the natural language processing unit 222 generates a command in which the obtained recognition result is replaced with standard character information. The natural language processing unit 222 and the voice acquisition unit 151 may recognize the meaning of character information, or generate commands from recognition results, using artificial intelligence processing such as probabilistic machine learning.

Based on the processing result (command) of the natural language processing unit 222, the dialogue management unit 224 determines a response sentence for the occupant of the vehicle M by referring to the personal profile 252, the knowledge base DB 256, and the response rule DB 258. The personal profile 252 contains, for each occupant, personal information, hobbies and preferences, the history of past dialogues, and so on. The knowledge base DB 256 is information that defines relationships between things. The response rule DB 258 is information that defines the action the agent should take in response to a command (the answer, the device control to perform, and so on).

The dialogue management unit 224 may also identify the occupant by matching feature information obtained from the voice stream against the personal profile 252. In this case, the personal profile 252 associates, for example, voice feature information with personal information. Voice feature information is, for example, information about speaking characteristics such as pitch, intonation, and rhythm (the pattern of rising and falling pitch), or about feature quantities such as Mel Frequency Cepstrum Coefficients. It is obtained, for example, by having the occupant utter predetermined words or sentences at initial registration and recognizing the uttered voice.
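Matching a voice feature vector against registered profiles could look like the sketch below, which uses cosine similarity with an acceptance threshold. It assumes the feature vectors (for example, averaged Mel frequency cepstral coefficients) have already been extracted; the similarity measure, the 0.9 threshold, and the function names are assumptions rather than details from the description.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_occupant(
    features: Sequence[float],
    profiles: Dict[str, Sequence[float]],
    min_similarity: float = 0.9,  # hypothetical acceptance threshold
) -> Optional[str]:
    """Return the registered occupant whose voice features match best, or None."""
    best_name: Optional[str] = None
    best_score = min_similarity
    for name, reference in profiles.items():
        score = cosine_similarity(features, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Returning `None` below the threshold lets an unregistered speaker fall back to a generic, non-personalized response.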

When a command requests information that can be searched for via the network NW, the dialogue management unit 224 causes the network search unit 226 to perform the search. The network search unit 226 accesses various web servers 500 via the network NW and acquires the desired information. "Information searchable via the network NW" is, for example, ratings by ordinary users of restaurants near the vehicle M, or a weather forecast for the location of the vehicle M on that day.

The dialogue management unit 224 generates decision process information that describes the process leading to the response sentence. In the course of determining the response sentence, the dialogue management unit 224 judges whether it has obtained sufficient information to determine the server response sentence. For example, if the agent server 200 cannot communicate with the web servers 500 and therefore cannot obtain information that should have come from them, the dialogue management unit 224 judges that sufficient information was not obtained and generates decision process information indicating this.
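A minimal sketch of recording decision process information while gathering external data follows. The `DecisionProcessInfo` structure, the use of `OSError` to represent an unreachable web server, and the rule that any failed source makes the information insufficient are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class DecisionProcessInfo:
    sources_requested: List[str] = field(default_factory=list)
    sources_failed: List[str] = field(default_factory=list)

    @property
    def sufficient(self) -> bool:
        # Hypothetical rule: any failed source means the information was insufficient.
        return not self.sources_failed


def gather(
    sources: Dict[str, Callable[[], str]],
) -> Tuple[Dict[str, str], DecisionProcessInfo]:
    """Query each external source, recording which ones could not be reached."""
    info = DecisionProcessInfo()
    results: Dict[str, str] = {}
    for name, fetch in sources.items():
        info.sources_requested.append(name)
        try:
            results[name] = fetch()
        except OSError:  # stands in for an unreachable web server 500
            info.sources_failed.append(name)
    return results, info
```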

The response sentence generation unit 228 generates a server response sentence so that the response determined by the dialogue management unit 224 is conveyed to the occupant of the vehicle M, and transmits it to the voice dialogue device 100. When the occupant has been identified as one registered in the personal profile, the response sentence generation unit 228 may generate a server response sentence that addresses the occupant by name or imitates the occupant's way of speaking.

When the agent function unit 150 generates a local response sentence or acquires a server response sentence, it instructs the voice control unit 118 to synthesize speech and output it. The agent function unit 150 also instructs the display control unit 116 to display an image of the agent in step with the voice output. In this way, an agent function is realized in which a virtually appearing agent responds to the occupant of the vehicle M.

[Processing in the voice dialogue device 100]
Next, an example of processing in the voice dialogue device 100 is described. The voice dialogue device 100 starts communicating with the agent server 200 when an occupant of the vehicle M starts a dialogue. The agent server 200 generates an answer and provides it to the voice dialogue device 100, which presents it to the occupant as voice output or an image display.

FIGS. 4 to 6 are flowcharts showing an example of the flow of processing executed in the voice dialogue device 100. In the voice dialogue device 100, the WU determination unit 114 detects the voice section of speech uttered by the occupant and determines whether a wake-up word (WU word) has been obtained from the character information produced by converting the detected voice section into text (step S101).

If it determines that no wake-up word has been obtained, the WU determination unit 114 repeats step S101. When the WU determination unit 114 determines that the wake-up word has been obtained, the voice acquisition unit 151 determines whether voice information has been acquired (step S103). If not, the voice acquisition unit 151 repeats step S103.

When it determines that voice information has been acquired, the voice acquisition unit 151 recognizes the character information and transmits it, as the recognition result, to the agent server 200 (step S105). The second information acquisition unit 153 then generates and acquires a local response sentence based on the character information recognized by the voice acquisition unit 151 (step S107).

Next, the determination unit 155 determines whether the local response sentence acquired by the second information acquisition unit 153 is an immediate-response sentence (step S109). If it is not, the determination unit 155 performs processing to determine the response sentence (step S111), which is described later.

The output control unit 156 then outputs the local response sentence or server response sentence selected by the determination unit 155 to the management unit 110 as the response sentence (step S113), causing it to be displayed on the display/operation device 20 and output from the speaker 30. When the determination unit 155 has determined that noise information is to be output, the output control unit 156 outputs the noise information to the management unit 110 together with the local response sentence, causing the response sentence to be displayed on the display/operation device 20 and output from the speaker 30 with noise superimposed on it. The voice dialogue device 100 then ends the processing shown in FIG. 4.

In step S109, when the determination unit 155 determines that the acquired local response sentence is an immediate response sentence, it adopts the local response sentence as the response sentence (step S115). The second information acquisition unit 153 then determines whether the immediate response sentence is a command response sentence (step S117). If it is, the second information acquisition unit 153 outputs command information to the in-vehicle device command unit 157 (step S119), and the in-vehicle device command unit 157 controls the vehicle device 50 based on that command information. If it is not, the second information acquisition unit 153 skips step S119. The output control unit 156 then outputs the local response sentence (the immediate response sentence) to the management unit 110 as the response sentence (step S113), causing it to be displayed on the display/operation device 20 and output from the speaker 30. The voice dialogue device 100 then ends the process shown in FIG. 4.
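The FIG. 4 flow from step S105 onward can be summarized as a short Python sketch. It is illustrative only: the callables stand in for the link to the agent server 200, the local response generation, the FIG. 5 decision, and the in-vehicle device command unit 157, and the names `LocalResponse`, `Output`, `handle_utterance`, and the `command` field are hypothetical, not part of the described device.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LocalResponse:
    text: str
    immediate: bool                # True for an "immediate response sentence" (S109)
    command: Optional[str] = None  # hypothetical command payload for vehicle device 50


@dataclass
class Output:
    text: str
    with_noise: bool               # whether noise information accompanies the text


def handle_utterance(
    recognized_text: str,
    send_to_server: Callable[[str], None],
    local_lookup: Callable[[str], LocalResponse],
    decide_response: Callable[[LocalResponse], Output],
    dispatch_command: Callable[[str], None],
) -> Output:
    """Sketch of the FIG. 4 flow from S105 onward (hypothetical interfaces)."""
    send_to_server(recognized_text)          # S105: send recognition result to the server
    local = local_lookup(recognized_text)    # S107: generate the local response sentence
    if not local.immediate:                  # S109: not an immediate response sentence
        return decide_response(local)        # S111 (FIG. 5); the result is output at S113
    # S115: the immediate local response becomes the response sentence.
    if local.command is not None:            # S117: is it a command response sentence?
        dispatch_command(local.command)      # S119: hand off to the in-vehicle command unit
    # S113: output; this sketch adds no noise on this path (compare claim 6).
    return Output(local.text, with_noise=False)
```

Injecting the server link and the decision step as callables keeps the sketch testable without a network or audio hardware.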

Next, the process of step S111 is described with reference to FIG. 5. As shown in FIG. 5, the first information acquisition unit 152 first determines whether it has received and acquired the server response sentence transmitted by the agent server 200 (step S201).

When it determines that the server response sentence has not been acquired, the first information acquisition unit 152 determines whether the determination time has elapsed since the recognition result was transmitted (step S203). If the determination time has not elapsed, the first information acquisition unit 152 repeats step S201.

When it determines that the determination time has elapsed, the first information acquisition unit 152 determines that the communication quality is equal to or lower than the first determination quality; the local response sentence acquired by the second information acquisition unit 153 is then selected as the response sentence, and it is determined that the output control unit 156 is to output noise information (step S205). The voice dialogue device 100 then ends the process shown in FIG. 5 and proceeds to step S113 in FIG. 4.
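Steps S201 to S205 amount to polling for the server response with a deadline. The Python sketch below assumes a non-blocking `poll` callable and a timer that starts when the function is called (i.e. right after the recognition result is sent); the function names, the polling interval, and the use of seconds are hypothetical choices, since the text does not specify the determination time or how reception is checked.

```python
import time
from typing import Callable, Optional, Tuple


def wait_for_server_response(
    poll: Callable[[], Optional[str]],
    determination_time_s: float,
    interval_s: float = 0.05,
) -> Optional[str]:
    """Poll for the server response sentence until the determination time elapses.

    Returns the server response sentence, or None on timeout.
    """
    start = time.monotonic()
    while True:
        response = poll()                                     # S201: received?
        if response is not None:
            return response
        if time.monotonic() - start >= determination_time_s:  # S203: time elapsed?
            return None                                       # caller proceeds to S205
        time.sleep(interval_s)                                # back to S201


def response_on_timeout(local_text: str) -> Tuple[str, bool]:
    # S205: quality is treated as at or below the first determination quality,
    # so the local response sentence is used and noise output is requested.
    return (local_text, True)
```

`time.monotonic()` is used rather than wall-clock time so that a system clock adjustment cannot shorten or extend the wait.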

When the first information acquisition unit 152 determines in step S201 that the server response sentence transmitted by the agent server 200 has been acquired, the communication quality acquisition unit 154 acquires the communication quality of the in-vehicle communication device 60 (step S207). Next, the determination unit 155 determines, based on the determination process information output by the communication quality acquisition unit 154, whether the communication quality acquired by the communication quality acquisition unit 154 is equal to or lower than the second determination quality (step S209).

When it is determined that the communication quality acquired by the communication quality acquisition unit 154 is equal to or lower than the second determination quality, the determination unit 155 selects the server response sentence acquired by the first information acquisition unit 152 as the response sentence and determines that the output control unit 156 is to output noise information (step S211). The voice dialogue device 100 then ends the process shown in FIG. 5 and proceeds to step S113 in FIG. 4.

When it is determined that the communication quality acquired by the communication quality acquisition unit 154 is not equal to or lower than the second determination quality (that is, it exceeds the second determination quality), the determination unit 155 selects the server response sentence acquired by the first information acquisition unit 152 as the response sentence (step S213); no noise information is output in this case. The voice dialogue device 100 then ends the process shown in FIG. 5 and proceeds to step S113 in FIG. 4.
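The three possible outcomes of the FIG. 5 process can be written as one small decision function. This is a sketch under stated assumptions: communication quality is modeled as a single scalar where larger means better, and `second_threshold` stands in for the second determination quality; the text does not say how quality is measured, and `Decision` and `decide_response` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Decision(NamedTuple):
    text: str
    with_noise: bool   # True -> output control unit 156 also outputs noise information


def decide_response(
    local_text: str,
    server_text: Optional[str],
    quality: float,
    second_threshold: float,
) -> Decision:
    """Sketch of the FIG. 5 outcomes.

    server_text is None when the determination time elapsed without a
    server response (S203 -> S205).
    """
    if server_text is None:
        return Decision(local_text, True)      # S205: local response + noise
    if quality <= second_threshold:            # S209: at or below second determination quality
        return Decision(server_text, True)     # S211: server response + noise
    return Decision(server_text, False)        # S213: server response, no noise
```

Note that the comparison is inclusive (`<=`), matching "equal to or lower than" in the text.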

In the voice dialogue device 100 of the embodiment, a local response sentence is generated using the local dictionary DB 162, the local knowledge base DB 164, and the local response rule DB 166 stored in the voice dialogue device 100. It is therefore likely to be a less accurate answer to the user's request than a server response sentence generated by the agent server 200.

For example, as shown in FIGS. 6 and 7, suppose the user U asks "What's the weather tomorrow?" as a spoken request. If the response sentence is a server response sentence, then as shown in FIG. 6, the display/operation device 20 displays the agent image E together with the text "Clear in the morning, cloudy in the afternoon, with showers in the evening," and the speaker 30 outputs speech corresponding to that text.

By contrast, if the response sentence is a local response sentence, then as shown in FIG. 7, the display/operation device 20 displays the agent image E together with the text "Cloudy," and the speaker 30 outputs speech corresponding to that text. A local response sentence is thus less complete than a server response sentence and a less accurate answer to the user's request.
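To illustrate why a local answer can be terser, here is a toy Python sketch of a local responder built from three tables playing the roles of the local dictionary DB 162, local knowledge base DB 164, and local response rule DB 166. The table contents, the keyword-matching approach, and the function name `local_response` are all hypothetical; the text does not describe the internal format of these databases.

```python
from typing import Optional

# Hypothetical contents standing in for local dictionary DB 162,
# local knowledge base DB 164, and local response rule DB 166.
LOCAL_DICTIONARY = {"weather": "WEATHER", "forecast": "WEATHER", "tomorrow": "TOMORROW"}
LOCAL_KNOWLEDGE_BASE = {("WEATHER", "TOMORROW"): "Cloudy"}
LOCAL_RESPONSE_RULES = {"WEATHER": "{value}."}


def local_response(utterance: str) -> Optional[str]:
    # Dictionary: normalize words in the utterance to concept labels.
    words = (w.strip("?.!,").lower() for w in utterance.split())
    concepts = {LOCAL_DICTIONARY[w] for w in words if w in LOCAL_DICTIONARY}
    # Rules: pick a response template for a recognized intent.
    for intent, template in LOCAL_RESPONSE_RULES.items():
        if intent not in concepts:
            continue
        # Knowledge base: fill the template with a locally stored fact.
        for (kb_intent, kb_slot), value in LOCAL_KNOWLEDGE_BASE.items():
            if kb_intent == intent and kb_slot in concepts:
                return template.format(value=value)
    return None  # no local answer available
```

For example, `local_response("What's the weather tomorrow?")` yields the one-word answer of FIG. 7, because the local knowledge base holds only a single coarse fact.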

When the speaker 30 outputs a local response sentence, the voice dialogue device 100 superimposes noise on it. Specifically, when the server response sentence shown in FIG. 6 is the answer, the speech is output without noise, whereas when the local response sentence shown in FIG. 7 is the answer, the speech is output from the speaker 30 with a hissing "zaa, zaa" static noise added. Even when the response sentence is a server response sentence, the same "zaa, zaa" noise is added to the speech output from the speaker 30 if, for example, the agent server 200 could not obtain sufficient information while generating the server response sentence. In other words, the voice dialogue device 100 adds no noise when outputting a response sentence determined by making full use of the capabilities of the voice dialogue system 1, and adds noise to the speech output from the speaker 30 when the response sentence leaves room for improvement, for example by asking the question again. As a result, when a local response sentence is output, the user can recognize that it is not a highly accurate answer to the request and that a better response sentence could be obtained, for example, by asking again. In this way, the voice dialogue device 100 can inform the user of how accurate the answer to the request is.
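Superimposing noise on the response speech can be sketched as sample-wise addition on 16-bit PCM audio. This Python sketch assumes the speech is already a list of int16 samples and uses uniform white noise as a stand-in for the "zaa, zaa" static; the text does not specify how the noise is generated or mixed, and `superimpose_noise` and `noise_level` are hypothetical.

```python
import random
from typing import List

INT16_MIN, INT16_MAX = -32768, 32767


def superimpose_noise(speech: List[int], noise_level: int = 1500, seed: int = 0) -> List[int]:
    """Add uniform white noise to 16-bit PCM speech samples, clipping to int16."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    mixed = []
    for sample in speech:
        noisy = sample + rng.randint(-noise_level, noise_level)
        # Clip so the sum never wraps around outside the int16 range.
        mixed.append(max(INT16_MIN, min(INT16_MAX, noisy)))
    return mixed
```

Clipping matters because adding noise to near-full-scale speech would otherwise overflow the 16-bit range and produce a much harsher artifact than intended.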

In the above embodiment, whether to add noise to a local response sentence is determined independently of its content, but the content may also be taken into account. For example, noise may be added to any response sentence that contains a negative expression such as "I don't know" or "I can't," or only to local response sentences that contain such an expression. Noise may be superimposed on the response sentence, or noise sounds may be placed before and/or after it. Instead of noise, another sound effect such as a chime may be output. The voice dialogue device 100 may also converse with the user without displaying an agent image. Further, when the sound effect is output, the device may present a display that gives the user a negative impression, for example by showing the agent image E of FIG. 7 with a clouded expression or rendering the text in a gloomy-looking font.
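The content-based variant and the placement options can be sketched together in Python. The phrase list uses only the two examples from the text, matching is plain case-insensitive substring search, and the function names and `mode` values are hypothetical; a real system would need a richer notion of "negative expression".

```python
from typing import List

# Negative expressions taken from the examples in the text; a real list would be longer.
NEGATIVE_EXPRESSIONS = ("i don't know", "i can't")


def should_add_effect(text: str, is_local: bool, local_only: bool = False) -> bool:
    """Add the effect when the sentence contains a negative expression.

    local_only=True restricts this to local response sentences (the second variant).
    """
    lowered = text.lower()
    negative = any(expr in lowered for expr in NEGATIVE_EXPRESSIONS)
    return negative and (is_local or not local_only)


def place_effect(speech: List[int], effect: List[int], mode: str = "overlay") -> List[int]:
    """Combine a sound effect (noise, chime, ...) with 16-bit speech samples."""
    if mode == "before":
        return effect + speech
    if mode == "after":
        return speech + effect
    if mode == "around":
        return effect + speech + effect
    if mode == "overlay":
        if not effect:
            return list(speech)
        # Loop the effect under the speech and clip to the int16 range.
        return [max(-32768, min(32767, s + effect[i % len(effect)]))
                for i, s in enumerate(speech)]
    raise ValueError(f"unknown mode: {mode}")
```

Keeping the "whether" decision separate from the "how" placement lets the same effect (noise or chime) be reused across the variants described above.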

In the above embodiment, the voice dialogue device 100 is mounted on the vehicle M, but it may be mounted elsewhere, for example on a smartphone or a smart speaker. Also, in the above embodiment the voice dialogue device 100 adds noise to the local response sentence and outputs the two together, but noise may also be output at times other than when a response sentence is output. For example, the voice dialogue device 100 may output noise while the system is processing or while it is waiting for speech recognition. Outputting noise in this way can remind the user that the answer to the request may be of low accuracy.

Although modes for carrying out the present invention have been described above using embodiments, the present invention is not limited to these embodiments, and various modifications and substitutions can be made without departing from the gist of the present invention.

1 ... Agent system
20 ... Display/operation device
30 ... Speaker
50 ... Vehicle device
60 ... In-vehicle communication device
100 ... Voice dialogue device
150 ... Agent function unit
151 ... Voice acquisition unit
152 ... First information acquisition unit
153 ... Second information acquisition unit
154 ... Communication quality acquisition unit
155 ... Determination unit
156 ... Output control unit
157 ... In-vehicle device command unit
200 ... Agent server
M ... Vehicle

Claims

1. A voice dialogue device comprising:
a voice acquisition unit that acquires voice information;
a first information acquisition unit that receives and acquires, from an external device, external response information for the voice information acquired by the voice acquisition unit;
a storage unit that stores internal response information for specific voice information;
a second information acquisition unit that acquires, from the storage unit, internal response information for the voice information acquired by the voice acquisition unit; and
an output control unit that causes an output unit to output at least one of the external response information and the internal response information,
wherein, when outputting the internal response information, the output control unit causes the output unit to output a sound effect together with the internal response information.

2. The voice dialogue device according to claim 1, further comprising:
a determination unit that determines whether to output the internal response information or the external response information; and
a communication quality acquisition unit that acquires communication quality information regarding the quality of communication between the first information acquisition unit and the external device,
wherein the determination unit determines the response information to be output based on the communication quality information acquired by the communication quality acquisition unit, and
the output control unit causes the output unit to output the internal response information or the external response information determined by the determination unit.

3. The voice dialogue device according to claim 2, wherein the determination unit determines that the internal response information is to be output when the communication quality information acquired by the communication quality acquisition unit is equal to or lower than a first determination quality.

4. The voice dialogue device according to claim 3, wherein the determination unit determines that the external response information is to be output when the communication quality information acquired by the communication quality acquisition unit exceeds the first determination quality, and
the output control unit causes the output unit to output a sound effect together with the external response information when the communication quality information acquired by the communication quality acquisition unit is equal to or lower than a second determination quality.

5. The voice dialogue device according to any one of claims 1 to 4, wherein the output control unit outputs the sound effect superimposed on the internal response information.

6. The voice dialogue device according to any one of claims 1 to 5, which is mounted in a vehicle equipped with an in-vehicle device, wherein the output control unit does not output the sound effect when outputting information about the in-vehicle device as the internal response information.

7. A voice dialogue method in which a computer of a voice dialogue device:
acquires voice information;
receives and acquires, from an external device, external response information for the voice information;
acquires communication quality information regarding the quality of communication with the external device;
acquires internal response information for the voice information from a storage unit that stores internal response information for specific voice information;
outputs at least one of the external response information and the internal response information to an output unit; and
when outputting the internal response information, outputs a sound effect to the output unit together with the internal response information.

8. A program causing a computer of a voice dialogue device to execute:
a process of acquiring voice information;
a process of receiving and acquiring, from an external device, external response information for the voice information;
a process of acquiring communication quality information regarding the quality of communication with the external device;
a process of acquiring internal response information for the voice information from a storage unit that stores internal response information for specific voice information;
a process of outputting at least one of the external response information and the internal response information to an output unit; and
a process of, when the internal response information is output, outputting a sound effect to the output unit together with the internal response information.