JP2021096380A

JP2021096380A - Agent system, agent system control method, and program

Info

Publication number: JP2021096380A
Application number: JP2019228232A
Authority: JP
Inventors: 昌宏暮橋; Masahiro Kurehashi; 航遠藤; Ko Endo
Original assignee: Honda Motor Co Ltd
Current assignee: Honda Motor Co Ltd
Priority date: 2019-12-18
Filing date: 2019-12-18
Publication date: 2021-06-24
Also published as: CN112995270B; CN112995270A

Abstract

To make it possible to obtain an accurate response even when a voice operation is based on the content of a short utterance. SOLUTION: Provided is an agent system which includes: a response display control unit which causes a display unit to display an image whose content responds to an operation; an utterance content interpretation unit which interprets the content of an utterance by a user; an utterance content determination unit which determines whether or not the content of the utterance interpreted by the utterance content interpretation unit stands alone as a service request; and an agent control unit which, when the utterance content determination unit determines that the content does not stand alone as a service request, executes control for providing a service specified based on the content of the utterance and on the content of operation context information indicating an operation context that accords with the content of the image displayed on the display unit when the utterance was made. SELECTED DRAWING: Figure 4

Description

The present invention relates to an agent system, a control method for the agent system, and a program.

Conventionally, a navigation device is known that, when speech recognition of an operation voice input in response to a question posed to the user determines the input to be non-verbal, judges the validity of the non-verbal input according to the situation at the time of the input and, according to the result of that judgment, decides whether to confirm the work content, put the work content on hold, or execute the work content (see, for example, Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Publication No. 2008-46299

For voice operation, it is preferable that an accurate response be obtained even when the utterance is short. With the conventional technique, however, when a short utterance such as a phrase that does not form a complete sentence is spoken as a voice operation, it is difficult to obtain an accurate response based on what was uttered.

The present invention has been made in consideration of such circumstances, and one of its objects is to enable an accurate response to be obtained even when a voice operation is based on the content of a short utterance.

The agent system, the control method for the agent system, and the program according to the present invention adopt the following configurations.
(1): An agent system according to one aspect of the present invention includes: a response display control unit that causes a display unit to display an image whose content responds to an operation; an utterance content interpretation unit that interprets the content of an utterance by a user; an utterance content determination unit that determines whether or not the content of the utterance interpreted by the utterance content interpretation unit stands alone as a service request; and an agent control unit that, when the utterance content determination unit determines that the content does not stand alone as a service request, executes control for providing a service specified based on the content of the utterance and on the content of operation context information, the operation context information indicating the operation context that accords with the content of the image displayed on the display unit when the utterance was made.

(2): In the agent system according to aspect (1), the response display control unit displays an image whose content responds to a manual operation when the operation is a manual operation, and displays an image whose content responds to the content of an utterance when the operation is performed by utterance.

(3): In the agent system according to aspect (1) or (2), when the utterance content determination unit determines that the content of the utterance stands alone as a service request, the agent control unit maintains the content of the operation context information, which indicates the operation context that accords with the content of the image displayed on the display unit when the utterance was made, and performs control so that the service requested by the determined utterance content is provided.

(4): In the agent system according to aspect (3), after performing control so that the service requested by the determined utterance content is provided while the content of the operation context information is maintained, when the utterance content determination unit determines that the content of an utterance interpreted by the utterance content interpretation unit does not stand alone as a service request, the agent control unit executes control for providing a service specified based on the content of that utterance and on the content of the operation context information indicating the operation context that accords with the content of the image displayed on the display unit when the utterance was made.

(5): In a control method for an agent system according to one aspect of the present invention, a computer in the agent system causes a display unit to display an image whose content responds to an operation, interprets the content of an utterance by a user, determines whether or not the interpreted content of the utterance stands alone as a service request, and, when it is determined that the content of the utterance does not stand alone as a service request, maintains the content of operation context information indicating the operation context that accords with the content of the image displayed on the display unit when the utterance was made, and executes control for providing a service specified based on the maintained content of the operation context information and the content of the utterance.

(6): A program according to one aspect of the present invention causes a computer to display, on a display unit, an image whose content responds to an operation, to interpret the content of an utterance by a user, to determine whether or not the interpreted content of the utterance stands alone as a service request, and, when it is determined that the content of the utterance does not stand alone as a service request, to maintain the content of operation context information indicating the operation context that accords with the content of the image displayed on the display unit when the utterance was made, and to execute control for providing a service specified based on the maintained content of the operation context information and the content of the utterance.

According to (1), (5), and (6), when the content of an utterance made as a voice operation on the image displayed on the display unit does not stand alone as a service request, for example because it corresponds to only a fragment of a sentence, the utterance can be treated as a voice operation performed within the context of the operations up to that point. As a result, an accurate response can be obtained even when the utterance used for the voice operation is short.
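The behavior described for (1), (5), and (6) can be illustrated with a minimal sketch. It is not the patented implementation: the keyword test in `is_standalone_request`, the `REQUEST_VERBS` set, and the list-of-strings form of the operation context are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical marker words: an utterance containing one of these is treated
# as a complete, standalone service request. A real system would rely on
# natural language understanding instead of a keyword list.
REQUEST_VERBS = {"search", "show", "play", "navigate", "call"}


@dataclass
class OperationContext:
    """Operations (manual or spoken) that led to the image now displayed."""
    steps: List[str] = field(default_factory=list)


def is_standalone_request(utterance: str) -> bool:
    """Crude stand-in for the utterance content determination unit."""
    return any(word in REQUEST_VERBS for word in utterance.lower().split())


def specify_service(utterance: str, context: OperationContext) -> List[str]:
    """Return the request that the agent control unit would act on.

    A standalone utterance is used as is. A fragment is appended to the
    operation context so that it is read within that context.
    """
    if is_standalone_request(utterance) or not context.steps:
        return [utterance]
    return context.steps + [utterance]
```

With a context of `["search restaurants", "Italian"]`, the fragment "under 3000 yen" is resolved against both earlier steps, whereas "play some music" is handled on its own.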

According to (2), the operation on the image displayed on the display unit may be either a manual operation or a voice operation. In this case, the operation context information used when the content of an utterance does not stand alone as a service request may include both the history of manual operations and the history of voice operations. As a result, the occupant can perform a voice operation with a brief, short utterance regardless of whether the previous operation was manual or spoken.

According to (3) and (4), when the content of the current utterance stands alone as a service request, the operation context information accumulated so far is maintained without being cleared. A service other than the one to which the operation context information corresponds is then provided according to the content of the current utterance. Because the operation context information is maintained, once the other service has been provided the occupant can resume the operation from the state that existed before the current utterance was made.
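The interplay of (3) and (4) can be sketched as follows. This is an illustrative model under stated assumptions, not the method of the publication: the class name `AgentControl`, the `standalone` flag (standing in for the result of the utterance content determination unit), and the choice to record a fragment in the context after using it are all hypothetical.

```python
from typing import List, Tuple


class AgentControl:
    """Keeps the operation context across an unrelated standalone request."""

    def __init__(self) -> None:
        # Operation context information: the operations performed so far.
        self.operation_context: List[str] = []

    def record_operation(self, operation: str) -> None:
        self.operation_context.append(operation)

    def handle_utterance(self, utterance: str, standalone: bool) -> Tuple[str, List[str]]:
        if standalone:
            # Provide the requested service, leaving the context untouched
            # so that the earlier operation can be resumed afterwards.
            return ("independent", [utterance])
        # A fragment is interpreted within the maintained context.
        request = self.operation_context + [utterance]
        self.record_operation(utterance)  # assumption: fragments extend the context
        return ("contextual", request)
```

For example, after "restaurant search" and "Italian" have been recorded, a standalone weather question is served independently, and a later fragment such as "near the station" is still resolved against the restaurant search.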

A diagram showing a configuration example of the agent system in the present embodiment.
A diagram showing the configuration of the agent device in the present embodiment and the equipment mounted on the vehicle.
A diagram showing the configuration of the agent server in the present embodiment and part of the configuration of the agent device.
A flowchart showing an example of the processing procedure that the agent system of the present embodiment executes in relation to operations performed by an occupant on the touch panel.
A sequence diagram showing a specific example of the operation of the agent system in response to an occupant's operation procedure on the touch panel.

Hereinafter, embodiments of the agent system, the control method for the agent device, and the program of the present invention will be described with reference to the drawings.
<Embodiment>
[About the agent function]
The agent device is a device that realizes part or all of the agent system 1, which includes the notification control system of the present embodiment. In the following, as an example of the agent device, an agent device that is mounted on a vehicle (hereinafter, vehicle M) carrying an occupant (an example of a user) and that has an agent function is described. For the purposes of applying the present invention, the agent device does not necessarily need to have an agent function. The agent device may also be a portable terminal device (general-purpose terminal) such as a smartphone, but the following description assumes an agent device mounted on a vehicle and having an agent function. The agent function is, for example, a function that, while conversing with the occupant of the vehicle M, provides various kinds of information and controls various devices based on requests (commands) contained in the occupant's utterances, or mediates network services. When the agent device has a plurality of agent functions, the agent functions may differ from one another in the function they perform, their processing procedure, their control, and their output mode and content. Some agent functions may have a function of controlling devices in the vehicle (for example, devices related to driving control and vehicle body control).

The agent function is realized by using in an integrated manner, for example, a voice recognition function that recognizes the occupant's voice (a function that converts voice into text), a natural language processing function (a function that understands the structure and meaning of text), a dialogue management function, and a network search function that searches other devices via a network or searches a predetermined database held by the device itself. Some or all of these functions may be realized by AI (Artificial Intelligence) technology. Part of the configuration for performing these functions (in particular, the voice recognition function and the natural language processing and interpretation function) may be mounted on an agent server (external device) capable of communicating with the in-vehicle communication device of the vehicle M or with a general-purpose communication device brought into the vehicle M. The following description assumes that part of the configuration is mounted on the agent server and that the agent device and the agent server cooperate to realize the agent system. The service provider (service entity) that the agent device and the agent server cause to appear virtually by cooperating is called an agent.

[Agent system]
FIG. 1 is a diagram showing a configuration example of the agent system 1 including the agent device 100. The agent system 1 includes, for example, the agent device 100 and one or more agent servers 200. Examples of providers of the agent system 1 in the present embodiment include automobile manufacturers, network service providers, electronic commerce businesses, and sellers and manufacturers of mobile terminals; any entity (corporation, organization, individual, and so on) can be a provider of the agent system 1. Although FIG. 1 shows the case of a single agent server 200, the agent system 1 is not limited to this and may include two or more agent servers 200. In that case, the agent servers 200 may be provided by entities that differ from one another.

The agent device 100 communicates with the agent server 200 via the network NW. The network NW includes, for example, some or all of communication networks such as the Internet, a cellular network, a Wi-Fi network, a WAN (Wide Area Network), a LAN (Local Area Network), a public line, a telephone line, and a wireless base station. Various web servers 300 are connected to the network NW, and the agent server 200 or the agent device 100 can acquire web pages from the various web servers 300 via the network NW.

The agent device 100 converses with the occupant of the vehicle M, transmits the occupant's voice to the agent server 200, and presents the answer obtained from the agent server 200 to the occupant in the form of voice output or image display.

[Vehicle]
FIG. 2 is a diagram showing the configuration of the agent device 100 according to the embodiment and the equipment mounted on the vehicle M. The vehicle M is equipped with, for example, one or more microphones 10, a display / operation device 20, a speaker 30, a navigation device 40, an in-vehicle communication device 50, and the agent device 100. These devices are connected to one another by a multiplex communication line such as a CAN (Controller Area Network) communication line, a serial communication line, a wireless communication network, or the like. The configuration shown in FIG. 2 is merely an example; part of the configuration may be omitted, and other components may be added.

The microphone 10 is a sound collecting unit that collects sound emitted in the vehicle interior. The display / operation device 20 is a device (or group of devices) that displays images and can accept input operations. The display / operation device 20 includes, for example, a display device configured as a touch panel. The display / operation device 20 may further include a HUD (Head Up Display) or a mechanical input device. The speaker 30 includes, for example, a speaker (sound output unit) arranged in the vehicle interior. The display / operation device 20 may be shared by the agent device 100 and the navigation device 40. The speaker 30 is an example of a "voice output unit".

The navigation device 40 includes a navigation HMI (Human Machine Interface), a positioning device such as a GPS (Global Positioning System) receiver, a storage device that stores map information, and a control device (navigation controller) that performs route searches and the like. Some or all of the microphone 10, the display / operation device 20, and the speaker 30 may be used as the navigation HMI. The navigation device 40 searches for a route (navigation route) from the position of the vehicle M identified by the positioning device to the destination entered by the occupant, and outputs guidance information using the navigation HMI so that the vehicle M can travel along the route. The route search function may reside in a navigation server accessible via the network NW. In that case, the navigation device 40 acquires the route from the navigation server and outputs the guidance information.

The agent device 100 may be built on the navigation controller. In that case, the navigation controller and the agent device 100 are integrated in terms of hardware. The display device of the display / operation device 20 and the navigation HMI of the navigation device 40 are examples of the "display unit".

The in-vehicle communication device 50 is, for example, a wireless communication device that can access the network NW using a cellular network or a Wi-Fi network.

[Agent device]
The agent device 100 includes a management unit 110, an agent function unit 130, an in-vehicle communication unit 140, and a storage unit 150. The management unit 110 includes, for example, an acoustic processing unit 112, an agent WU (Wake Up) determination unit 114, a communication control unit 116, and an output control unit 120. The software arrangement shown in FIG. 2 is simplified for the sake of explanation; in practice it can be modified arbitrarily so that, for example, the management unit 110 is interposed between the agent function unit 130 and the in-vehicle communication device 50. In the following, the agent that the agent function unit 130 and the agent server 200 cause to appear by cooperating may be referred to simply as the "agent".

Each component of the agent device 100 is realized, for example, by a hardware processor such as a CPU (Central Processing Unit) executing a program (software). Some or all of these components may be realized by hardware (a circuit unit; including circuitry) such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a GPU (Graphics Processing Unit), or by cooperation between software and hardware. The storage unit 150 may be realized by a storage device (a storage device including a non-transitory storage medium) such as an HDD (Hard Disk Drive) or flash memory, by a removable storage medium (non-transitory storage medium) such as a DVD or CD-ROM, or by a storage medium mounted in a drive device. Part or all of the storage unit 150 may be an external device accessible to the agent device 100, such as a NAS or an external storage server. The storage unit 150 stores, for example, information such as programs executed by the agent device 100.

The management unit 110 functions through the execution of programs such as an OS (Operating System) and middleware.

The acoustic processing unit 112 of the management unit 110 receives the sound collected by the microphone 10 and performs acoustic processing on it so that it is in a state suitable for recognizing the wake-up word preset for each agent and for recognizing other utterance content. A wake-up word is, for example, a word or phrase for activating the target agent. A wake-up word may activate a single agent or a plurality of agents. The acoustic processing is, for example, noise removal by filtering such as a bandpass filter, or amplification of the sound. The acoustic processing unit 112 outputs the processed voice to the agent WU determination unit 114 and to any active agent function unit 130.

The agent WU determination unit 114 recognizes the wake-up word predetermined for the agent. The agent WU determination unit 114 recognizes the uttered voice from the voice (voice stream) that has undergone acoustic processing. First, the agent WU determination unit 114 detects a voice section based on the amplitude and zero crossings of the voice waveform in the voice stream. The agent WU determination unit 114 may instead perform section detection based on frame-by-frame speech / non-speech classification using a Gaussian mixture model (GMM).
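Voice section detection from amplitude and zero crossings can be sketched as below. The publication does not state how the two measures are combined, so the rule used here (a frame counts as speech when both its peak amplitude and its zero-crossing count reach a threshold), the frame length, and both threshold values are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def zero_crossings(frame: Sequence[float]) -> int:
    """Count sign changes between consecutive samples of a frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def detect_voice_sections(
    samples: Sequence[float],
    frame_len: int = 160,          # assumed: 10 ms at a 16 kHz sampling rate
    amp_threshold: float = 0.1,    # assumed peak-amplitude threshold
    zc_threshold: int = 5,         # assumed minimum zero crossings per frame
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of detected voice sections."""
    sections: List[Tuple[int, int]] = []
    start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        peak = max(abs(s) for s in frame)
        is_voice = peak >= amp_threshold and zero_crossings(frame) >= zc_threshold
        if is_voice and start is None:
            start = i * frame_len              # a section begins
        elif not is_voice and start is not None:
            sections.append((start, i * frame_len))  # the section ends
            start = None
    if start is not None:
        sections.append((start, n_frames * frame_len))
    return sections
```

Each returned pair delimits one candidate voice section, which would then be passed on for conversion to text.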

Next, the agent WU determination unit 114 converts the voice in the detected voice section into text to obtain character information. The agent WU determination unit 114 then determines whether the character information corresponds to a wake-up word. When it determines that the character information is a wake-up word, the agent WU determination unit 114 activates the agent function unit 130 corresponding to that wake-up word. A function equivalent to the agent WU determination unit 114 may be mounted on the agent server 200. In that case, the management unit 110 transmits the voice stream processed by the acoustic processing unit 112 to the agent server 200, and when the agent server 200 determines that the voice is a wake-up word, the agent function unit 130 is activated in accordance with an instruction from the agent server 200. Each agent function unit 130 may also be always active and determine the wake-up word by itself. In that case, the management unit 110 does not need to include the agent WU determination unit 114.

When the agent WU determination unit 114 recognizes, by the same procedure as described above, an end word contained in the uttered voice, and the agent corresponding to the end word is in an activated state (hereinafter referred to as "active" where necessary), it terminates (stops) the active agent function unit. An agent may also be activated and terminated by, for example, accepting a predetermined operation from the display / operation device 20, but the following describes an example of activation and termination by voice. An active agent may also be stopped when no voice input has been received for a predetermined time or longer.
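The activation and termination rules above (wake-up word, end word, and stop after a period without voice input) can be modeled with a small state holder. This is a sketch under assumptions: the class `AgentLifecycle`, the exact-match comparison of normalized text, the one-phrase-to-one-agent mapping, and the 30-second default are all hypothetical, and the source also allows one wake-up word to activate several agents.

```python
from typing import Dict, List


def _normalize(text: str) -> str:
    """Lower-case and collapse whitespace before comparing phrases."""
    return " ".join(text.lower().split())


class AgentLifecycle:
    def __init__(self, wake_words: Dict[str, str], end_words: Dict[str, str],
                 idle_timeout_s: float = 30.0) -> None:
        # Both dictionaries map a phrase to the name of the agent it controls.
        self.wake_words = {_normalize(k): v for k, v in wake_words.items()}
        self.end_words = {_normalize(k): v for k, v in end_words.items()}
        self.idle_timeout_s = idle_timeout_s
        self.last_input: Dict[str, float] = {}  # active agent -> last input time

    def _expire(self, now: float) -> None:
        """Stop agents that received no voice input for the timeout period."""
        idle = [n for n, t in self.last_input.items() if now - t >= self.idle_timeout_s]
        for name in idle:
            del self.last_input[name]

    def on_text(self, text: str, now: float) -> None:
        """Process recognized text at time `now` (seconds)."""
        self._expire(now)
        key = _normalize(text)
        agent = self.wake_words.get(key)
        if agent is not None:
            self.last_input[agent] = now      # activate (or refresh) the agent
            return
        agent = self.end_words.get(key)
        if agent in self.last_input:
            del self.last_input[agent]        # end word for an active agent
            return
        for name in self.last_input:          # ordinary input resets idle timers
            self.last_input[name] = now

    def active_agents(self, now: float) -> List[str]:
        self._expire(now)
        return sorted(self.last_input)
```

Passing the time in explicitly keeps the sketch deterministic; a real device would read a clock instead.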

The communication control unit 116 performs control to enable the agent function unit 130 to connect to the network NW. For example, the communication control unit 116 controls the connection state and the like when the agent function unit 130 communicates with an external device (for example, the agent server 200) via the network. The communication control unit 116 also performs control such as reconnecting when communication is interrupted and switching the connection state.

The output control unit 120 provides services and the like to the occupant by causing the display unit or the speaker 30 to output information such as response content in accordance with instructions from the communication control unit 116, the agent function unit 130, or the like. The output control unit 120 includes, for example, a display control unit 122 and a voice control unit 124.

Based on the information that the agent function unit 130 has acquired from the agent server 200, the display control unit 122 causes the display device of the display / operation device 20 to display an image used to notify the occupant of the vehicle M of the content of the agent's response to the occupant's utterance.

Based on the information that the agent function unit 130 has acquired from the agent server 200, the voice control unit 124 causes the speaker 30 to output a voice used to notify the occupant of the vehicle M of the content of the agent's response to the occupant's utterance.

The agent function unit 130 cooperates with the agent server 200 to provide a service that includes responses by voice and image in accordance with the utterances of the vehicle occupant. The agent function unit 130 is granted, for example, the authority to control the vehicle M or the in-vehicle devices mounted on the vehicle M. When the utterance content of the occupant of the vehicle M, as recognized by the agent server 200 through the processing described later, is a command instructing the operation of an in-vehicle device mounted on the vehicle M, the agent function unit 130 controls that in-vehicle device based on the command. The in-vehicle devices include the navigation device 40. Under the control of the communication control unit 116, the agent function unit 130 communicates with the agent server 200 through the in-vehicle communication unit 140 and the in-vehicle communication device 50.

なお、エージェント機能部１３０には、法律や条例、エージェントを提供する事業者同士の契約等に応じて、車載機器を制御する権限が割り振られるものであってもよい。 The agent function unit 130 may be assigned the authority to control the in-vehicle device according to laws, ordinances, contracts between businesses that provide agents, and the like.

車載通信部１４０は、例えば、エージェント機能部１３０がネットワークＮＷに接続する場合に、車載通信装置５０を介して通信させる。車載通信部１４０は、エージェント機能部１３０からの情報を、車載通信装置５０を介してエージェントサーバ２００やその他の外部装置に出力する。また、車載通信部１４０は、車載通信装置５０を介して入力された情報をエージェント機能部１３０に出力する。 The vehicle-mounted communication unit 140 communicates via the vehicle-mounted communication device 50, for example, when the agent function unit 130 connects to the network NW. The vehicle-mounted communication unit 140 outputs the information from the agent function unit 130 to the agent server 200 and other external devices via the vehicle-mounted communication device 50. Further, the vehicle-mounted communication unit 140 outputs the information input via the vehicle-mounted communication device 50 to the agent function unit 130.

エージェント機能部１３０は、エージェントＷＵ判定部１１４による起動指示に基づいて起動し、乗員の発話に対して、エージェントサーバ２００を介して乗員の発話の音声に含まれる要求に対する応答内容を生成し、生成した応答内容を出力制御部１２０に出力する。また、エージェント機能部１３０は、エージェントサーバ２００と通信を行う場合には、通信制御部１１６により制御された接続状態によって通信を行う。また、エージェント機能部１３０は、エージェントＷＵ判定部１１４による制御に基づいて、エージェントを停止させてもよい。 The agent function unit 130 is activated based on an activation instruction from the agent WU determination unit 114, generates, via the agent server 200, response content for the request included in the voice of the occupant's utterance, and outputs the generated response content to the output control unit 120. When communicating with the agent server 200, the agent function unit 130 communicates in the connection state controlled by the communication control unit 116. The agent function unit 130 may also stop the agent based on control by the agent WU determination unit 114.

［エージェントサーバ］
図３は、実施形態に係るエージェントサーバ２００の構成と、エージェント装置１００の構成の一部とを示す図である。以下、エージェントサーバ２００の構成とともに、エージェント機能部１３０等の動作について説明する。ここでは、エージェント装置１００からネットワークＮＷまでの物理的な通信についての説明を省略する。 [Agent server]
FIG. 3 is a diagram showing a configuration of the agent server 200 and a part of the configuration of the agent device 100 according to the embodiment. Hereinafter, the operation of the agent function unit 130 and the like will be described together with the configuration of the agent server 200. Here, the description of the physical communication from the agent device 100 to the network NW will be omitted.

エージェントサーバ２００は、通信部２１０を備える。通信部２１０は、例えば、ＮＩＣ（Network Interface Card）等のネットワークインターフェースである。更に、エージェントサーバ２００は、例えば、音声認識部２２０と、自然言語処理部２２１と、対話管理部２２２と、ネットワーク検索部２２３と、応答内容生成部２２４との機能部を備える。これらの構成要素は、例えば、ＣＰＵ等のハードウェアプロセッサがプログラム（ソフトウェア）を実行することにより実現される。これらの構成要素のうち一部または全部は、ＬＳＩやＡＳＩＣ、ＦＰＧＡ、ＧＰＵ等のハードウェア（回路部；circuitryを含む）によって実現されてもよいし、ソフトウェアとハードウェアの協働によって実現されてもよい。プログラムは、予めＨＤＤやフラッシュメモリ等の記憶装置（非一過性の記憶媒体を備える記憶装置）に格納されていてもよいし、ＤＶＤやＣＤ−ＲＯＭ等の着脱可能な記憶媒体（非一過性の記憶媒体）に格納されており、記憶媒体がドライブ装置に装着されることでインストールされてもよい。音声認識部２２０と、自然言語処理部２２１とを組み合わせたものは、「発話内容解釈部」の一例である。 The agent server 200 includes a communication unit 210. The communication unit 210 is, for example, a network interface such as a NIC (Network Interface Card). Further, the agent server 200 includes functional units such as a voice recognition unit 220, a natural language processing unit 221, a dialogue management unit 222, a network search unit 223, and a response content generation unit 224. These components are realized, for example, by a hardware processor such as a CPU executing a program (software). Some or all of these components may be realized by hardware (including circuitry) such as an LSI, ASIC, FPGA, or GPU, or by cooperation between software and hardware. The program may be stored in advance in a storage device such as an HDD or a flash memory (a storage device including a non-transitory storage medium), or may be stored in a removable storage medium (a non-transitory storage medium) such as a DVD or a CD-ROM and installed when the storage medium is mounted in a drive device. The combination of the voice recognition unit 220 and the natural language processing unit 221 is an example of the "utterance content interpretation unit".

また、エージェントサーバ２００は、記憶部２５０を備える。記憶部２５０は、上記の記憶部１５０を実現する各種記憶装置と同様の装置により実現される。記憶部２５０には、例えば、辞書ＤＢ２５２、パーソナルプロファイル２５４、知識ベースＤＢ２５６、応答規則ＤＢ２５８等のデータやプログラムが格納される。 Further, the agent server 200 includes a storage unit 250. The storage unit 250 is realized by the same device as the various storage devices that realize the above-mentioned storage unit 150. The storage unit 250 stores, for example, data and programs such as a dictionary DB 252, a personal profile 254, a knowledge base DB 256, and a response rule DB 258.

エージェント装置１００において、エージェント機能部１３０は、例えば、音響処理部１１１等から入力される音声ストリーム、或いは圧縮や符号化等の処理を行った音声ストリームを、エージェントサーバ２００に送信する。エージェント機能部１３０は、ローカル処理（エージェントサーバ２００を介さない処理）が可能なコマンド（要求内容）が認識できた場合には、コマンドで要求された処理を実行してもよい。ローカル処理が可能なコマンドとは、例えば、エージェント装置１００が備える記憶部１５０を参照することで応答可能なコマンドである。より具体的には、ローカル処理が可能なコマンドとは、例えば、記憶部１５０内に存在する電話帳データ（不図示）から特定者の名前を検索し、合致した名前に対応付けられた電話番号に電話をかける（相手を呼び出す）コマンドである。したがって、エージェント機能部１３０は、エージェントサーバ２００が備える機能の一部を有してもよい。 In the agent device 100, the agent function unit 130 transmits to the agent server 200, for example, a voice stream input from the sound processing unit 111 or the like, or a voice stream that has undergone processing such as compression or encoding. When the agent function unit 130 recognizes a command (request content) that can be processed locally (without going through the agent server 200), it may execute the processing requested by the command. A command that can be processed locally is, for example, a command that can be answered by referring to the storage unit 150 of the agent device 100. More specifically, such a command is, for example, a command that searches the telephone directory data (not shown) in the storage unit 150 for the name of a specific person and calls the telephone number associated with the matching name (calls the other party). Therefore, the agent function unit 130 may have some of the functions of the agent server 200.
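As a non-limiting illustration, the local processing of a call command described above might be sketched as follows. The phone book contents, the `call ` prefix rule, and the function name `try_local_call_command` are assumptions introduced only for this sketch; the embodiment does not specify how a locally processable command is recognized.

```python
from typing import Dict, Optional

# Hypothetical phone book data held in the storage unit 150 (name -> number).
PHONE_BOOK: Dict[str, str] = {
    "Suzuki": "090-0000-0001",
    "Tanaka": "090-0000-0002",
}


def try_local_call_command(utterance_text: str, phone_book: Dict[str, str]) -> Optional[str]:
    """Return the number to dial if the text is a locally resolvable call command.

    Returns None when the text is not a call command or the name is not in
    the phone book; the caller would then forward the voice stream to the
    agent server instead.
    """
    prefix = "call "  # assumed command pattern, for illustration only
    text = utterance_text.strip()
    if not text.lower().startswith(prefix):
        return None
    name = text[len(prefix):].strip()
    return phone_book.get(name)
```

In this sketch a `None` result stands for "not processable locally", which corresponds to the case in which the voice stream is handled by the agent server 200.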

音声ストリームを取得すると、音声認識部２２０が音声認識を行ってテキスト化された文字情報を出力し、自然言語処理部２２１が文字情報に対して辞書ＤＢ２５２を参照しながら意味解釈を行う。辞書ＤＢ２５２は、例えば、文字情報に対して抽象化された意味情報が対応付けられたものである。辞書ＤＢ２５２は、例えば、機能辞書２５２Ａと、汎用辞書２５２Ｂとを含む。 When the voice stream is acquired, the voice recognition unit 220 performs voice recognition and outputs textualized character information, and the natural language processing unit 221 interprets the meaning of the character information while referring to the dictionary DB 252. The dictionary DB 252 associates, for example, abstracted semantic information with character information. The dictionary DB 252 includes, for example, a function dictionary 252A and a general-purpose dictionary 252B.

機能辞書２５２Ａは、エージェントサーバ２００がエージェント機能部１３０と協働して実現するエージェントが提供する機能（サービス）をカバーするための辞書である。例えば、エージェントが車載エアコンを制御する機能を提供する場合、機能辞書２５２Ａには、「エアコン」、「空調」、「つける」、「消す」、「温度」、「上げる」、「下げる」、「内気」、「外気」等の単語が、動詞、目的語等の単語種別、及び抽象化された意味と対応付けられて登録されている。また、機能辞書２５２Ａには、同時に使用可能であることを示す単語間リンク情報が含まれてよい。 The function dictionary 252A is a dictionary for covering the functions (services) provided by the agent that the agent server 200 realizes in cooperation with the agent function unit 130. For example, when the agent provides a function for controlling an in-vehicle air conditioner, words such as "air conditioner", "air conditioning", "turn on", "turn off", "temperature", "raise", "lower", "inside air", and "outside air" are registered in the function dictionary 252A in association with word types such as verb and object and with abstracted meanings. The function dictionary 252A may also include inter-word link information indicating that words can be used together.
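As a non-limiting illustration, the function dictionary 252A and its inter-word link information might be represented as follows. The entry structure, the abstracted meaning labels (`HVAC`, `POWER_ON`, and so on), and the function `can_combine` are hypothetical; the embodiment states only that words are registered with a word type and an abstracted meaning, and that link information may indicate which words can be used together.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class DictionaryEntry:
    word_type: str  # e.g. "object" or "verb"
    meaning: str    # abstracted meaning (labels are illustrative)


# Hypothetical entries for an in-vehicle air conditioner function.
FUNCTION_DICTIONARY: Dict[str, DictionaryEntry] = {
    "air conditioner": DictionaryEntry("object", "HVAC"),
    "air conditioning": DictionaryEntry("object", "HVAC"),
    "temperature": DictionaryEntry("object", "HVAC_TEMPERATURE"),
    "turn on": DictionaryEntry("verb", "POWER_ON"),
    "turn off": DictionaryEntry("verb", "POWER_OFF"),
    "raise": DictionaryEntry("verb", "INCREASE"),
    "lower": DictionaryEntry("verb", "DECREASE"),
}

# Inter-word link information: (object meaning, verb meaning) pairs that
# can be used together.
WORD_LINKS: Set[Tuple[str, str]] = {
    ("HVAC", "POWER_ON"),
    ("HVAC", "POWER_OFF"),
    ("HVAC_TEMPERATURE", "INCREASE"),
    ("HVAC_TEMPERATURE", "DECREASE"),
}


def can_combine(object_word: str, verb_word: str) -> bool:
    """Return True if both words are registered and linked as usable together."""
    obj = FUNCTION_DICTIONARY.get(object_word)
    verb = FUNCTION_DICTIONARY.get(verb_word)
    if obj is None or verb is None:
        return False
    if obj.word_type != "object" or verb.word_type != "verb":
        return False
    return (obj.meaning, verb.meaning) in WORD_LINKS
```

Because synonyms such as "air conditioner" and "air conditioning" map to the same abstracted meaning in this sketch, the link table needs only one entry per meaning pair.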

汎用辞書２５２Ｂは、エージェントの提供する機能に限らず、一般的な物事の事象を抽象化された意味と対応付けた辞書である。機能辞書２５２Ａと汎用辞書２５２Ｂのそれぞれは、同義語や類義語の一覧情報を含んでもよい。機能辞書２５２Ａと汎用辞書２５２Ｂとは、複数の言語のそれぞれに対応して用意されてよく、その場合、音声認識部２２０及び自然言語処理部２２１は、予め設定されている言語設定に応じた機能辞書２５２Ａ及び汎用辞書２５２Ｂ、並びに文法情報（不図示）を使用する。音声認識部２２０の処理と、自然言語処理部２２１の処理は、段階が明確に分かれるものではなく、自然言語処理部２２１の処理結果を受けて音声認識部２２０が認識結果を修正する等、相互に影響し合って行われてよい。 The general-purpose dictionary 252B is a dictionary that associates general things and events, not limited to the functions provided by the agent, with abstracted meanings. Each of the function dictionary 252A and the general-purpose dictionary 252B may include list information of synonyms and near-synonyms. The function dictionary 252A and the general-purpose dictionary 252B may be prepared for each of a plurality of languages, in which case the voice recognition unit 220 and the natural language processing unit 221 use the function dictionary 252A, the general-purpose dictionary 252B, and the grammatical information (not shown) that correspond to the preset language setting. The processing of the voice recognition unit 220 and that of the natural language processing unit 221 are not clearly separated into stages and may influence each other; for example, the voice recognition unit 220 may correct its recognition result in response to the processing result of the natural language processing unit 221.

自然言語処理部２２１は、音声認識部２２０による認識結果に基づく意味解析の一つとして、音声に含まれるサービスの要求に対応するために必要な機能に関する情報（以下、機能必要情報）を取得する。例えば、認識結果として、車両Ｍの車載機器の制御を指示する「窓を開けて」、「空調の温度を上げて」等のテキストが認識された場合、自然言語処理部２２１は、辞書ＤＢ２５２等を参照し、「車両機器制御」という対象機器・機能種別を取得する。そして、自然言語処理部２２１は、取得した機能必要情報をエージェント機能部１３０に出力する。自然言語処理部２２１は、機能必要情報に基づきサービス要求に対する実行可否の判定結果を取得する。自然言語処理部２２１は、要求された機能が実行可能である場合に、サービスの要求に対応できるものとして、解釈された発話内容に対応したコマンドを生成する。 As part of the semantic analysis based on the recognition result of the voice recognition unit 220, the natural language processing unit 221 acquires information on the functions required to respond to a service request included in the voice (hereinafter referred to as function necessary information). For example, when text such as "open the window" or "raise the temperature of the air conditioner", which instructs control of an in-vehicle device of the vehicle M, is recognized as the recognition result, the natural language processing unit 221 refers to the dictionary DB 252 and the like and acquires the target device/function type "vehicle device control". The natural language processing unit 221 then outputs the acquired function necessary information to the agent function unit 130. The natural language processing unit 221 acquires the result of determining whether the service request can be executed, based on the function necessary information. When the requested function can be executed, the natural language processing unit 221 regards the service request as one that can be handled and generates a command corresponding to the interpreted utterance content.

対話管理部２２２は、自然言語処理部２２１により生成されたコマンドに基づいて、パーソナルプロファイル２５４や知識ベースＤＢ２５６、応答規則ＤＢ２５８を参照しながら車両Ｍの乗員に対する応答内容（例えば、乗員への発話内容や出力部から出力する画像、音声）を決定する。知識ベースＤＢ２５６は、物事の関係性を規定した情報である。応答規則ＤＢ２５８は、コマンドに対してエージェントが行うべき動作（回答や機器制御の内容等）を規定した情報である。 The dialogue management unit 222 responds to the occupant of the vehicle M (for example, the utterance content to the occupant) while referring to the personal profile 254, the knowledge base DB 256, and the response rule DB 258 based on the command generated by the natural language processing unit 221. And the image and sound to be output from the output section). The knowledge base DB 256 is information that defines the relationships between things. The response rule DB 258 is information that defines the actions (answers, device control contents, etc.) that the agent should perform in response to the command.

また、対話管理部２２２は、音声ストリームから得られる特徴情報を用いて、パーソナルプロファイル２５４と照合を行うことで、乗員を特定してもよい。この場合、パーソナルプロファイル２５４には、例えば、音声の特徴情報が更に対応付けられている。音声の特徴情報とは、例えば、声の高さ、イントネーション、リズム（音の高低のパターン）等の喋り方の特徴や、メル周波数ケプストラム係数（Mel Frequency Cepstrum Coefficients）等による特徴量に関する情報である。音声の特徴情報は、例えば、乗員の初期登録時に所定の単語や文章等を乗員に発声させ、発声させた音声を認識することで得られる情報である。 Further, the dialogue management unit 222 may identify the occupant by collating feature information obtained from the voice stream against the personal profile 254. In this case, the personal profile 254 is further associated with, for example, voice feature information. The voice feature information is, for example, information on characteristics of the manner of speaking, such as voice pitch, intonation, and rhythm (sound pitch pattern), and on feature quantities such as Mel Frequency Cepstrum Coefficients. The voice feature information is obtained, for example, by having the occupant utter a predetermined word or sentence at the time of initial registration and recognizing the uttered voice.

対話管理部２２２は、コマンドがネットワークＮＷを介して検索可能な情報を要求するものである場合、ネットワーク検索部２２３に検索を行わせる。ネットワーク検索部２２３は、ネットワークＮＷを介して所定のウェブサーバ３００等の外部機器にアクセスし、所望の情報を取得する。 The dialogue management unit 222 causes the network search unit 223 to perform a search when the command requests information that can be searched via the network NW. The network search unit 223 accesses an external device such as a predetermined web server 300 via the network NW and acquires desired information.

応答内容生成部２２４は、対話管理部２２２により決定された発話の内容が車両Ｍの乗員に理解されるように、応答文を生成し、生成した応答文をエージェント装置１００に送信する。また、応答内容生成部２２４は、カメラが車室内を撮像した画像に基づいて車両Ｍの乗員を認識した認識結果をエージェント装置１００から取得し、取得した認識結果によりコマンドを含む発話を行った乗員がパーソナルプロファイル２５４に登録された乗員であることが特定されている場合に、乗員の名前を呼んだり、乗員の話し方に似せた話し方にしたりした応答文を生成してもよい。 The response content generation unit 224 generates a response sentence so that the content of the utterance determined by the dialogue management unit 222 can be understood by the occupant of the vehicle M, and transmits the generated response sentence to the agent device 100. The response content generation unit 224 may also acquire from the agent device 100 the result of recognizing the occupant of the vehicle M based on an image of the vehicle interior captured by a camera and, when the acquired recognition result identifies the occupant who made the utterance including the command as an occupant registered in the personal profile 254, generate a response sentence that calls the occupant by name or imitates the occupant's way of speaking.

エージェント機能部１３０は、応答文を取得すると、音声合成を行って音声を出力するように音声制御部１２４に指示する。また、エージェント機能部１３０は、応答文を含む画像等を表示するように表示制御部１２２に指示する。 When the agent function unit 130 acquires the response sentence, the agent function unit 130 instructs the voice control unit 124 to perform voice synthesis and output the voice. Further, the agent function unit 130 instructs the display control unit 122 to display an image or the like including a response sentence.

上記構成を有する本実施形態のエージェントシステム１において、乗員は、表示・操作装置２０として備えられるタッチパネル（表示部の一例）に対する操作として、手動操作と音声操作とを併用することができる。
手動操作は、物理的に設けられた入力デバイスや操作子を乗員が指等の操作体を用いて行う操作である。一例として、タッチパネルに対する手動操作は、タッチパネルの表示面（操作面）に対して指等の操作体を触れさせて行う操作である。
音声操作は、本実施形態のエージェントシステム１が備えるエージェント機能を利用して、乗員が発話を行ったことに応じて、各種サービスとしての車両Ｍの機器のコントロール等を実行させる操作である。
乗員は、タッチパネルに対して手動操作として可能な操作を、音声操作によっても行うことができる。つまり、本実施形態におけるエージェントシステムにおいて、乗員は、タッチパネルに対応して行う操作を、手動操作と音声操作とのいずれによっても行うことが可能とされている。
また、以降の説明における「サービス」は、音声操作だけではなく、手動操作も併用して行われる操作に応答して提供される機能をいう。 In the agent system 1 of the present embodiment having the above configuration, the occupant can use both manual operation and voice operation as operations on the touch panel (an example of the display unit) provided as the display / operation device 20.
A manual operation is an operation in which the occupant operates a physically provided input device or control using an operating body such as a finger. As an example, a manual operation on the touch panel is performed by touching the display surface (operation surface) of the touch panel with an operating body such as a finger.
The voice operation is an operation in which the agent function provided in the agent system 1 of the present embodiment is used to control the equipment of the vehicle M as various services in response to the occupant speaking.
The occupant can also perform operations that can be performed manually on the touch panel by voice operation. That is, in the agent system of the present embodiment, the occupant can perform the operation corresponding to the touch panel by both the manual operation and the voice operation.
Further, the "service" in the following description refers to a function provided in response to an operation performed not only by voice operation but also by manual operation.

図４のフローチャートを参照して、本実施形態のエージェントシステム１が、乗員により行われるタッチパネルへの操作（手動操作、音声操作）に関連して実行する処理手順例について説明する。同図の処理は、エージェントが既に起動されている状態のもとで行われる。また、同図の説明において、タッチパネルに対して行われた手動操作に対する応答に関する制御については、管理部１１０が実行するようにされた場合を例に挙げる。 An example of a processing procedure executed by the agent system 1 of the present embodiment in connection with an operation (manual operation, voice operation) on the touch panel performed by the occupant will be described with reference to the flowchart of FIG. The processing shown in the figure is performed while the agent has already been started. Further, in the description of the figure, the case where the management unit 110 is set to execute the control regarding the response to the manual operation performed on the touch panel will be given as an example.

まず、エージェント装置１００において、管理部１１０は、タッチパネルが手動操作を受け付けたか否かについて判定する（ステップＳ１００）。
タッチパネルが手動操作を受け付けた場合、管理部１１０は、タッチパネルに対して行われた手動操作に応答して車両Ｍにおける機器の動作が得られるように制御（応答制御）を実行する（ステップＳ１０２）。この際、管理部１１０（応答表示制御部の一例）は、タッチパネルにおいて表示される画像について、今回行われた手動操作に応答したものとなるように表示制御を実行してよい。 First, in the agent device 100, the management unit 110 determines whether or not the touch panel has accepted the manual operation (step S100).
When the touch panel accepts a manual operation, the management unit 110 executes control (response control) so that the device in the vehicle M operates in response to the manual operation performed on the touch panel (step S102). At this time, the management unit 110 (an example of the response display control unit) may execute display control so that the image displayed on the touch panel responds to the manual operation just performed.

また、タッチパネルが手動操作を受け付けた場合、エージェント機能部１３０が、対話状態継続フラグについての制御（対話状態継続フラグ制御）を実行する（ステップＳ１０４）。
なお、タッチパネルが手動操作を受け付けた際に、エージェント機能部１３０が起動されていない状態の場合には、エージェントＷＵ判定部１１４がエージェント機能部１３０を起動させて、ステップＳ１０４の処理を実行させるようにしてよい。
対話状態継続フラグは、セットの有無に応じて、エージェントシステム１が対話状態を継続しているか否かを示すフラグである。エージェントシステム１は、対話状態継続フラグがオンとされて対話状態を継続しているときには、音声操作を受け付けて、発話の内容に応答した制御を実行する。一方、エージェントシステム１は、対話状態継続フラグがオフとされて対話状態を停止しているときには、音声操作を受け付けない。対話状態継続フラグは、最後に行われた操作（手動操作または音声操作）から一定時間を経過した状態である場合にオンからオフとなる。
エージェント機能部１３０は、当該ステップＳ１０４の対話状態継続フラグ制御として、対話状態継続フラグがオフの状態であった場合には、対話状態継続フラグをオンにする。つまり、本実施形態におけるエージェント機能部１３０は、手動操作が行われた場合にも、対話状態継続フラグをオンとして、以降の音声操作を受け付け可能な状態とする。
また、対話状態継続フラグがオンの状態であり、かつ、今回のタッチパネルに対する操作によって１のサービスの提供が完了した場合には、以降においてエージェント機能部１３０が当該１のサービスに応じた操作を受け付ける必要が無い。この場合、エージェント機能部１３０は、対話状態継続フラグ制御として、対話状態継続フラグをオフとする。
また、対話状態継続フラグがオンの状態であり、かつ、今回のタッチパネルに対する操作によっては、未だ１のサービスの提供が完了していない場合には、当該１のサービスについての以降の操作を受け付けることができる。そこで、この場合のエージェント機能部１３０は、対話状態継続フラグ制御として、対話状態継続フラグがオンの状態を維持する。 When the touch panel accepts a manual operation, the agent function unit 130 executes control of the dialogue state continuation flag (dialogue state continuation flag control) (step S104).
If the agent function unit 130 is not activated when the touch panel accepts a manual operation, the agent WU determination unit 114 activates the agent function unit 130 to execute the process of step S104. May be.
The dialogue state continuation flag indicates, by whether it is set, whether the agent system 1 is continuing the dialogue state. While the flag is on and the dialogue state continues, the agent system 1 accepts voice operations and executes control in response to the content of the utterance. While the flag is off and the dialogue state is stopped, the agent system 1 does not accept voice operations. The flag changes from on to off when a certain period of time has elapsed since the last operation (manual or voice).
As the dialogue state continuation flag control in step S104, the agent function unit 130 turns the dialogue state continuation flag on if it was off. That is, even when a manual operation is performed, the agent function unit 130 in the present embodiment turns the flag on so that subsequent voice operations can be accepted.
When the dialogue state continuation flag is on and the current operation on the touch panel completes the provision of one service, the agent function unit 130 no longer needs to accept operations for that service. In this case, the agent function unit 130 turns the dialogue state continuation flag off as the dialogue state continuation flag control.
When the dialogue state continuation flag is on and the current operation on the touch panel has not yet completed the provision of one service, subsequent operations for that service can be accepted. In this case, the agent function unit 130 keeps the dialogue state continuation flag on as the dialogue state continuation flag control.
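As a non-limiting illustration, the dialogue state continuation flag control might be sketched as follows. The class name `DialogueState`, the 10-second timeout, and the use of a monotonic clock are assumptions made for this sketch; the embodiment says only that the flag turns off after "a certain period of time" has elapsed since the last operation.

```python
import time
from typing import Optional


class DialogueState:
    """Holds the dialogue state continuation flag and its idle timeout."""

    def __init__(self, timeout_s: float = 10.0) -> None:
        self.timeout_s = timeout_s  # assumed value, not given in the embodiment
        self.flag_on = False
        self.last_operation_at: Optional[float] = None

    def _expire_if_idle(self, now: float) -> None:
        # The flag goes from on to off once the idle period has elapsed.
        if (
            self.flag_on
            and self.last_operation_at is not None
            and now - self.last_operation_at >= self.timeout_s
        ):
            self.flag_on = False

    def on_operation(self, service_completed: bool, now: Optional[float] = None) -> None:
        """Flag control after a manual or voice operation."""
        now = time.monotonic() if now is None else now
        self._expire_if_idle(now)
        if not self.flag_on:
            self.flag_on = True    # off -> on so later voice operations are accepted
        elif service_completed:
            self.flag_on = False   # service finished; nothing more to accept
        # Otherwise the flag stays on while the service is still in progress.
        self.last_operation_at = now

    def accepts_voice(self, now: Optional[float] = None) -> bool:
        """Return True while voice operations are accepted."""
        now = time.monotonic() if now is None else now
        self._expire_if_idle(now)
        return self.flag_on
```

The optional `now` argument exists only so that the timeout can be exercised deterministically; in normal use the monotonic clock is read on each call.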

ステップＳ１０６の処理の後、或いはステップＳ１０４にて対話状態継続フラグがオンであると判定された場合、エージェント機能部１３０は、操作文脈情報に関する制御（操作文脈情報制御）を実行する（ステップＳ１０８）。
操作文脈情報は、サービス単位で行われる一連の操作手順のもとでの操作の履歴を示す情報である。例えば、乗員が現在位置の近くのガソリンスタンドの情報を得るためにＰＯＩ（point of interest）検索を行う場合であれば、操作手順としては、１つには、ＰＯＩ検索の実行指示、検索対象のカテゴリ選択、選択されたカテゴリに対する絞り込み検索指示、絞り込み検索結果から情報提示対象となる１のガソリンスタンドの選択、といった流れとなる。操作文脈情報は、このような操作手順における操作ごとの内容が示される。例えば、上記のようなガソリンスタンドを検索する場合であれば、［ＰＯＩ検索の実行指示］、［検索対象のカテゴリとして「ガソリンスタンド」を選択］、「絞り込み条件「赤坂周辺」で絞り込み検索」、「絞り込み検索結果から「Ａ店」を選択」といった操作ごとの内容が操作文脈情報により示される。また、操作文脈情報に反映される各操作は、手動操作と音声操作とのいずれが含まれてもよい。
また、今回のステップＳ１００にて受け付けられたタッチパネルに対する手動操作が、或る１のサービスに対応する最初の操作（例えば、ＰＯＩ検索であれば、ＰＯＩ検索開始を指示する操作）である場合、エージェント機能部１３０は、当該ステップＳ１０８の操作文脈情報制御として以下の処理を実行してよい。つまり、エージェント機能部１３０は、今回のステップＳ１００に応じて受け付けられたタッチパネルの手動操作の内容を履歴として含む操作文脈情報を新規に生成し、生成した操作文脈情報を保持する。操作文脈情報の保持にあたり、エージェント機能部１３０は、操作文脈情報を記憶部１５０に記憶させてよい。
また、エージェント機能部１３０は、今回のタッチパネルに対する手動操作が、１のサービスにおける２回目以降の操作である場合には、既に保持されている操作文脈情報について、今回のタッチパネルに対する手動操作の内容の履歴が追加されるように更新する。
また、エージェント機能部１３０は、今回のタッチパネルに対する手動操作により１のサービスの提供が完了した場合には、操作文脈情報をクリアする。 After the processing of step S106, or when it is determined in step S104 that the dialogue state continuation flag is on, the agent function unit 130 executes control related to the operation context information (operation context information control) (step S108).
The operation context information is information indicating the history of operations under a series of operation procedures performed for each service. For example, when a occupant performs a POI (point of interest) search in order to obtain information on a gas station near the current position, one of the operation procedures is an instruction to execute a POI search and a search target. The flow is such as category selection, narrowing search instruction for the selected category, and selection of one gas station to be presented with information from the narrowed search result. The operation context information indicates the content of each operation in such an operation procedure. For example, when searching for a gas station like the one above, [POI search execution instruction], [Select "gas station" as the search target category], "Refine search by narrowing down condition" Akasaka area "", The contents of each operation such as "select" A store "from the refined search results" are indicated by the operation context information. Further, each operation reflected in the operation context information may include either a manual operation or a voice operation.
Further, when the manual operation on the touch panel accepted in the current step S100 is the first operation for a given service (for example, for a POI search, the operation instructing the start of the POI search), the agent function unit 130 may execute the following processing as the operation context information control in step S108. That is, the agent function unit 130 newly generates operation context information that includes, as history, the content of the manual operation on the touch panel accepted in the current step S100, and holds the generated operation context information. To hold the operation context information, the agent function unit 130 may store it in the storage unit 150.
When the current manual operation on the touch panel is the second or a later operation in one service, the agent function unit 130 updates the operation context information already held so that the history of the content of the current manual operation is added.
Further, the agent function unit 130 clears the operation context information when the provision of one service is completed by the manual operation on the touch panel this time.
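As a non-limiting illustration, the operation context information control (generate on the first operation, append on later operations, clear on completion) might be sketched as follows. The class and field names are hypothetical, and keeping the context in memory rather than in the storage unit 150 is a simplification for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Operation:
    kind: str     # "manual" or "voice"
    content: str  # what the operation did


@dataclass
class OperationContext:
    service: str
    history: List[Operation] = field(default_factory=list)


class OperationContextStore:
    """Holds the operation context information for the service in progress."""

    def __init__(self) -> None:
        self.current: Optional[OperationContext] = None

    def record(self, service: str, kind: str, content: str, service_completed: bool) -> None:
        # First operation of a service: newly generate the context.
        if self.current is None or self.current.service != service:
            self.current = OperationContext(service)
        # Every operation is added to the history.
        self.current.history.append(Operation(kind, content))
        # Once the service has been provided, the context is cleared.
        if service_completed:
            self.current = None
```

For the gas station example, the history would grow from the POI search start, to the category selection, to the narrowing condition, and would be cleared when a store is finally selected.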

タッチパネルに対する手動操作が行われなかった場合、例えばエージェント機能部１３０は、マイク１０等にて収集された音声が音響処理部１１２にて受け付けられたか否かについて判定する（ステップＳ１０８）。
音声が受け付けられた場合、エージェント機能部１３０は、音響処理部１１２が受け付けて音響処理を施した音声をエージェントサーバ２００に送信する。エージェントサーバ２００において音声認識部２２０は、受信された音声を対象として音声認識処理を実行することで、受信された音声をテキストに変換する（ステップＳ１１０）。
次に、自然言語処理部２２１（発話内容解釈部の一例）は、テキスト化された文字情報に対する自然言語処理を実行し、文字情報の意味解釈を行う（ステップＳ１１２）。ステップＳ１１２の意味解釈によって、乗員の発話内容の意味がどういったものであるのかが認識される。 When no manual operation is performed on the touch panel, the agent function unit 130, for example, determines whether the voice collected by the microphone 10 or the like has been received by the sound processing unit 112 (step S108).
When the voice is received, the agent function unit 130 transmits the voice received and acoustically processed by the sound processing unit 112 to the agent server 200. In the agent server 200, the voice recognition unit 220 converts the received voice into text by executing voice recognition processing on it (step S110).
Next, the natural language processing unit 221 (an example of the utterance content interpretation unit) executes natural language processing on the textualized character information and interprets its meaning (step S112). Through the semantic interpretation in step S112, the meaning of the occupant's utterance content is recognized.

次に、自然言語処理部２２１は、現在において対話状態継続フラグがオンであるか否かについて判定する（ステップＳ１１４）。この際、自然言語処理部２２１は、エージェント装置１００との通信を介して、エージェント機能部１３０に対話状態継続フラグの状態を問合せるようにされてよい。 Next, the natural language processing unit 221 determines whether or not the dialogue state continuation flag is currently on (step S114). At this time, the natural language processing unit 221 may inquire the agent function unit 130 of the state of the dialogue state continuation flag via communication with the agent device 100.

対話状態継続フラグがオンである場合、現在においては、１のサービスの提供のもとで次に行われる操作を待機している状態にある。このような状態では、対話状態継続フラグはオンの状態が維持され、操作文脈情報はクリアされることなくエージェント機能部１３０により保持されている。
この場合、自然言語処理部２２１（発話内容判定部の一例）は、ステップＳ１１２により意味が認識された発話内容が、単独でサービス要求として成立するものであるか否かについて判定する（ステップＳ１１６）。
単独でサービス要求として成立する発話内容は、例えば「赤坂周辺のガソリンスタンドを検索して」であるとか「エアコンの温度を２０度にして」といったように、一文の意味として要求するサービスが何であるのかが特定されるような発話内容となる。この発話内容は、それ自体で、ＰＯＩ検索により赤坂周辺のガソリンスタンドを検索することを要求する意味であると把握されることから、単独でサービス要求として成立する発話内容である。
一方、単独でサービス要求として成立しない発話内容は、例えば「赤坂周辺」といったように、一文から一部が抜き出された語句となる。このような発話内容は、それ自体では、どのようなサービスを具体的に要求するものであるのかが特定できない。このような発話内容を特定するには、例えばこれまでの操作文脈がどのようなものであったのかといったことの補完が必要となる。
当該ステップＳ１１６の判定は、以下のように行われてよい。例えば、自然言語処理部２２１は、辞書ＤＢ２５２を参照して機能必要情報を取得するにあたり、認識された発話内容自体により機能必要情報の取得が可能であったか否かに基づいて判定してよい。つまり、自然言語処理部２２１は、機能必要情報の取得が可能だったのであれば、認識された発話内容は、単独でサービス要求として成立するものであると判定する。これに対して、自然言語処理部２２１は、機能必要情報の取得ができなかったのであれば、認識された発話内容は、単独でサービス要求として成立するものでないと判定する。 When the dialogue state continuation flag is on, it is currently in a state of waiting for the next operation under the provision of one service. In such a state, the dialogue state continuation flag is maintained in the ON state, and the operation context information is held by the agent function unit 130 without being cleared.
In this case, the natural language processing unit 221 (an example of the utterance content determination unit) determines whether the utterance content whose meaning was recognized in step S112 is established independently as a service request (step S116).
Utterance content that is established independently as a service request is content in which the requested service can be identified from the meaning of the single sentence, for example, "Search for gas stations around Akasaka" or "Set the temperature of the air conditioner to 20 degrees". The former, by itself, is understood to request a POI search for gas stations around Akasaka, and is therefore utterance content that is established independently as a service request.
On the other hand, utterance content that is not established independently as a service request is a phrase extracted from part of a sentence, for example, "around Akasaka". Such utterance content does not, by itself, identify which service is specifically being requested. Identifying it requires supplementing the utterance with, for example, what the operation context has been so far.
The determination in step S116 may be performed as follows. For example, when acquiring the function necessary information by referring to the dictionary DB 252, the natural language processing unit 221 may make the determination based on whether the function necessary information could be acquired from the recognized utterance content itself. That is, if the function necessary information could be acquired, the natural language processing unit 221 determines that the recognized utterance content is established independently as a service request. If the function necessary information could not be acquired, the natural language processing unit 221 determines that the recognized utterance content is not established independently as a service request.
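As a non-limiting illustration, the determination of step S116 might be sketched as follows. The keyword table stands in for the dictionary DB 252 and is purely hypothetical, as are the function names; an actual implementation would rely on the semantic analysis described above rather than substring matching.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the dictionary DB 252: keyword -> function type.
FUNCTION_KEYWORDS: Dict[str, str] = {
    "search": "poi_search",
    "air conditioner": "vehicle_device_control",
    "window": "vehicle_device_control",
}


def get_function_necessary_info(utterance: str) -> Optional[str]:
    """Return the function type required by the utterance, or None if none is found."""
    text = utterance.lower()
    for keyword, function_type in FUNCTION_KEYWORDS.items():
        if keyword in text:
            return function_type
    return None


def is_standalone_service_request(utterance: str) -> bool:
    """Step S116: established independently iff function necessary information was acquired."""
    return get_function_necessary_info(utterance) is not None
```

With this table, "Search for gas stations around Akasaka" yields a function type and is treated as a standalone request, whereas "around Akasaka" yields none and is passed on to the complementing of steps S118 and S120.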

認識された発話内容が、単独でサービス要求として成立するものではない場合、自然言語処理部２２１は、エージェント機能部１３０により保持されていた操作文脈情報についてクリアすることなく、保持された状態が維持されるようにする。そのうえで、自然言語処理部２２１（エージェント制御部の一例）は、保持が維持された操作文脈情報を参照する（ステップＳ１１８）。
次に、自然言語処理部２２１は、ステップＳ１１８により参照した操作文脈情報が示す操作内容の履歴により、今回のステップＳ１１２により認識された発話内容の意味を補完する（ステップＳ１２０）。具体的に、自然言語処理部２２１は、今回のステップＳ１１２により意味が認識された発話内容を、これまでに１のサービスに応じてタッチパネルに対して行われた操作（手動操作、音声操作）に続く音声操作としての発話内容として扱う。
ステップＳ１２０の補完にあたり、自然言語処理部２２１は、例えば辞書ＤＢ２５２を利用して、今回認識された発話内容の意味が、これまでのタッチパネルに対する操作文脈における次の操作となるものであるか否かについて判定する。次の操作内容としてつながるものであると判定した場合、自然言語処理部２２１は、今回認識された発話内容の意味についての補完を行う。一方、次の操作内容としてつながるものではないと判定した場合、同図における処理についての図示は省略するが、自然言語処理部２２１は、今回の発話内容に対する応答が不可であるとして、エラーに応じた処理を実行してよい。 When the recognized utterance content is not independently established as a service request, the natural language processing unit 221 maintains the retained state without clearing the operation context information held by the agent function unit 130. To be done. Then, the natural language processing unit 221 (an example of the agent control unit) refers to the operation context information whose retention is maintained (step S118).
Next, the natural language processing unit 221 complements the meaning of the utterance content recognized in the current step S112 with the history of operation contents indicated by the operation context information referred to in step S118 (step S120). Specifically, the natural language processing unit 221 treats the utterance content whose meaning was recognized in the current step S112 as the utterance of a voice operation that follows the operations (manual or voice) performed so far on the touch panel for one service.
In the complementing of step S120, the natural language processing unit 221 uses, for example, the dictionary DB 252 to determine whether the meaning of the currently recognized utterance content corresponds to the next operation in the context of the operations performed so far on the touch panel. If it determines that the utterance connects as the next operation, the natural language processing unit 221 complements the meaning of the currently recognized utterance content. If it determines that the utterance does not connect as the next operation, the natural language processing unit 221 may treat the utterance as one that cannot be responded to and execute error handling; this branch is not shown in the figure.
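As a non-limiting illustration, the complementing of steps S118 and S120 might be sketched for the POI search example as follows. The step sequence `POI_SEARCH_STEPS`, the table of accepted values, and the function `complement_utterance` are assumptions made for this sketch; the embodiment does not prescribe how the connection to the next operation is judged beyond the use of the dictionary DB 252.

```python
from typing import Dict, List, Optional, Set

# Hypothetical operation procedure for the POI search service.
POI_SEARCH_STEPS: List[str] = ["start", "category", "filter", "select"]

# Hypothetical values that connect as each step (stand-in for the dictionary DB 252).
ACCEPTED_VALUES: Dict[str, Set[str]] = {
    "category": {"gas station", "restaurant"},
    "filter": {"around akasaka", "nearby"},
}


def complement_utterance(
    history: List[Dict[str, str]], utterance: str
) -> Optional[Dict[str, str]]:
    """Interpret a short utterance as the next operation after `history`.

    Returns the complemented operation, or None when the utterance does not
    connect as the next operation (the caller would then handle an error).
    """
    done = [op["step"] for op in history]
    # The held context must be a non-empty, unfinished prefix of the procedure.
    if not done or done != POI_SEARCH_STEPS[: len(done)]:
        return None
    if len(done) == len(POI_SEARCH_STEPS):
        return None
    next_step = POI_SEARCH_STEPS[len(done)]
    value = utterance.strip().lower()
    if value not in ACCEPTED_VALUES.get(next_step, set()):
        return None
    return {"service": "poi_search", "step": next_step, "value": value}
```

In this sketch, once the POI search has been started and the "gas station" category selected, the short utterance "around Akasaka" is complemented into a narrowing operation for the same service, while the same utterance with no held context is rejected.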

The agent function unit 130 of the agent device 100 executes control (response control) for responding to the utterance content of the voice received in the current step S108 (step S122). At this time, the agent function unit 130 (an example of the response display control unit) receives a command that the natural language processing unit 221 generated based on the meaning recognized in step S112 and the result of the complementing in step S120, and controls the devices of the vehicle M according to the received command. The agent function unit 130 also executes display control according to the command generated in step S120 so that the image displayed on the touch panel reflects the response to the current voice operation.
Further, the agent function unit 130 receives response content (dialogue content), such as voice, corresponding to the command generated in step S120 from the response content generation unit 224 of the agent server 200, and outputs the received response content.

The agent function unit 130 also executes operation context information control (step S124). In step S124, it updates the operation context information already held by adding to the history the voice operation performed by the utterance content recognized this time.
When the process of step S124 ends, the process returns to step S100.

When the utterance content is determined to stand on its own as a service request, the agent function unit 130 executes response control by interrupt processing as the response to the utterance content of the voice received in the current step S108 (step S126). Because this is interrupt processing, the agent function unit 130 keeps the operation context information corresponding to the touch panel operations so far without clearing it. As a result, after the process of step S126, the occupant can resume subsequent operations (manual or voice) on the touch panel, which displays the same image as before the voice operation that triggered the interrupt processing. When operation on the touch panel resumes after step S126, the processes of steps S116 to S122 can be executed. That is, when the utterance content of a voice operation does not stand on its own as a service request, the agent function unit 130 can take over the earlier operation context for the touch panel and execute response control appropriately.
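
The branching described so far can be sketched as a small routing function. This is only an illustrative reading of FIG. 4: the class `DialogueState`, the function `route`, and the branch labels are hypothetical names, and the `standalone` argument stands in for the result of the utterance content determination.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    """Hypothetical container for the state kept by the agent function unit."""
    continuing: bool = False                      # dialogue state continuation flag
    context: list = field(default_factory=list)   # operation context information


def route(state: DialogueState, utterance: str, standalone: bool) -> str:
    """Pick the branch of FIG. 4 for one recognized utterance."""
    if not state.continuing:
        # Flag off: the utterance starts a new service (step S128).
        return "new_service"
    if standalone:
        # Step S126: interrupt processing; the context is deliberately untouched.
        return "interrupt"
    # Steps S118 to S124: treat the utterance as the next operation and record it.
    state.context.append(("voice", utterance))
    return "continue_context"
```

The point of the sketch is the middle branch: a self-contained request is answered without modifying `state.context`, so a later short utterance can still be completed from the earlier touch panel operations.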

When the dialogue state continuation flag is off, the voice operation received in the current step S108 starts the operations for one new service. The agent function unit 130 therefore executes response control for the current voice operation (step S128). In the response control of step S128, the agent function unit 130 executes control according to the command that the natural language processing unit 221 generated based on the meaning recognized in the current step S112. If the command corresponds to a predetermined operation on the touch panel, the agent function unit 130 controls the touch panel so that it displays an image reflecting the response to the current voice operation.

Next, the agent function unit 130 determines whether the response control of the current step S128 responded to an operation on the touch panel (step S130). If it did, the agent function unit 130 turns the dialogue state continuation flag on as dialogue state continuation flag control (step S132).
As operation context information control, the agent function unit 130 also generates operation context information that records, as a history, the operation content recognized by the natural language processing unit 221 for the response control of the current step S128 (step S134). The agent function unit 130 holds the generated operation context information. After the process of step S134, or when it is determined that the response control of step S128 did not respond to an operation on the touch panel, the process returns to step S100.

When it is determined that no voice has been received, neither a voice operation nor a manual operation on the touch panel has been performed. In this case, the agent function unit 130 determines whether a certain time has elapsed since the last operation (step S136). The last operation here is either a manual operation or a voice operation.
If the certain time has not elapsed since the last operation, the process returns to step S100.

When the certain time has elapsed since the last operation, the agent function unit 130 turns the dialogue state continuation flag off, if it was on, as dialogue state continuation flag control (step S138). The agent function unit 130 also clears the operation context information if any is currently held (step S140). Through steps S138 and S140, when a certain time elapses without any operation on the touch panel displaying the image for a given service, a timeout occurs and the touch panel enters, for example, a state of waiting for an operation that starts a service.
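
The timeout handling of steps S136 to S140 could be sketched as follows. The class `ContextHolder`, its method names, and the 30-second value are assumptions made for illustration; the text only speaks of "a certain time". The clock is injected so the behavior can be exercised without waiting.

```python
import time
from typing import Callable, Optional

TIMEOUT_SECONDS = 30.0  # assumed value; the specification gives no figure


class ContextHolder:
    """Hypothetical holder for the continuation flag and operation context."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self.continuing = False
        self.context: Optional[list] = None
        self.last_operation_at = clock()

    def record_operation(self, operation: str) -> None:
        """Note a manual or voice operation and restart the idle timer."""
        self.continuing = True
        self.context = (self.context or []) + [operation]
        self.last_operation_at = self._clock()

    def tick(self) -> bool:
        """Run the idle check (step S136); return True if the timeout fired."""
        if self._clock() - self.last_operation_at < TIMEOUT_SECONDS:
            return False
        self.continuing = False   # step S138: flag off
        self.context = None       # step S140: clear the context
        return True
```

A monotonic clock is used as the default because wall-clock adjustments should not shorten or extend the idle period.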

[Specific example of touch panel operation in this embodiment]
A specific example of the operation of the agent system 1 in response to an occupant's operation procedure on the touch panel will be described with reference to the sequence diagram of FIG. 5. The following description takes as an example a case where the occupant has the navigation device 40 execute a POI search to find a gas station. The figure shows the operation procedure and the operating procedure of the agent system 1 in response to it.
First, the occupant instructs the start of the POI search by a manual operation on the touch panel (step S200).
In the agent device 100, the management unit 110 causes the navigation device 40 to start the POI search in response to the manual operation of step S200. At the start of the POI search, the navigation device 40 displays a category selection screen on the touch panel (step S202). Step S202 is realized as follows. When the touch panel accepts the manual operation instructing the start of the POI search, the management unit 110 executes the process of step S102 of FIG. 4 to activate the POI search function of the navigation device 40. The navigation device 40, with its POI search function activated, displays the category selection screen on the touch panel.

The agent function unit 130 generates operation context information in response to the display of the category selection screen in step S202 (step S204). The operation of step S204 results from the process of step S106 in FIG. 4.

The category selection screen displayed in step S202 is a screen on which the category to be searched is selected from among the candidate categories that the POI search supports. In this case, the category the occupant wants to search is gas stations. The occupant therefore selects gas stations as the search category by a manual operation on the category selection screen displayed on the touch panel (step S206).
In response to the manual operation of step S206, the management unit 110 executes the process of step S102 of FIG. 4 to instruct the navigation device 40 to search for gas stations. In response to the instruction, the navigation device 40 executes a POI search for gas stations within, for example, a certain range around the current position (step S208). The navigation device 40 displays on the touch panel a search result presentation screen showing the results of the gas station search (step S210).

In response to the display of the search result presentation screen in step S210, the agent function unit 130 updates the operation context information by executing the process of step S106 of FIG. 4 (step S212).

The figure shows an example of the content of the operation context information D1 after the update in step S212. The operation context information D1 indicates that the operations for the POI search service have so far been performed in the following order: an operation instructing activation of the POI search, then an operation selecting gas stations as the category.
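
One possible in-memory shape for the operation context information D1 is sketched below. The class names, field names, and operation labels are hypothetical; the specification states only that D1 records the service and the ordered history of operations, not how it is encoded.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class OperationEntry:
    source: str                     # "manual" or "voice"
    operation: str                  # hypothetical operation label
    argument: Optional[str] = None  # e.g. the selected category


@dataclass
class OperationContext:
    service: str
    history: list = field(default_factory=list)

    def add(self, source: str, operation: str, argument: Optional[str] = None) -> None:
        """Append one operation to the ordered history."""
        self.history.append(OperationEntry(source, operation, argument))

    def summary(self) -> str:
        """Render the history in order, for logging or debugging."""
        steps = [
            e.operation if e.argument is None else f"{e.operation}({e.argument})"
            for e in self.history
        ]
        return f"{self.service}: " + " -> ".join(steps)


# D1 after step S212: POI search started, then the gas station category selected.
d1 = OperationContext(service="poi_search")
d1.add("manual", "start_search")
d1.add("manual", "select_category", "gas_station")
```

Recording the `source` of each entry reflects the text's point that the history may mix manual and voice operations.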

With the search result presentation screen showing the gas station search results displayed, the occupant wanted to narrow the presented gas stations down to those around Akasaka. The occupant decided to have this narrowing search performed by voice operation rather than by manual operation as before, and so uttered "around Akasaka" as a voice operation (step S214).
Through the processes of steps S108 to S116 of FIG. 4, the utterance "around Akasaka" is determined not to stand on its own as a service request. In this case, the processes of steps S118 to S122 are then executed.
That is, under the POI search service, the utterance "around Akasaka" is treated as the next voice operation following the operation procedure indicated by the operation context information D1 held by the agent function unit 130. As a result, as the response control of step S122 of FIG. 4, the agent function unit 130 instructs a narrowing search for gas stations relative to a predetermined position in Akasaka, applied to the search result presentation screen displayed on the touch panel. In other words, the agent function unit 130 keeps the search result presentation screen displayed so far and produces the result that would follow from an operation instructing a narrowing search on that screen.
In response to this response control, the navigation device 40 executes the narrowing search (step S216). That is, from the gas stations presented on the search result presentation screen displayed in step S210, the navigation device 40 extracts those within a certain area range, defined for the narrowing search, relative to the predetermined position in Akasaka.
The navigation device 40 displays on the touch panel a narrowed search result presentation screen presenting the results of the narrowing search of step S216 (step S218). Through the response control of step S122 of FIG. 4, the agent system 1 also outputs from the speaker 30 a response voice to the voice operation made by the utterance "around Akasaka" (step S220).
Further, in response to the display of the narrowed search result presentation screen in step S218, the agent function unit 130 updates the operation context information by executing the process of step S124 of FIG. 4 (step S222).
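
The narrowing search of step S216 operates on the results already shown in step S210, not on all POIs. A minimal sketch of that idea follows; the station names, the planar kilometre coordinates, the `REFERENCE_POINTS` table, and the 1 km radius are all invented for illustration, since the specification does not state how the reference position or the area range is defined.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Station:
    name: str
    x_km: float  # planar offsets in km, an assumed simplification
    y_km: float


# Hypothetical table mapping a spoken place name to a reference position.
REFERENCE_POINTS = {"Akasaka": (0.0, 0.0)}
RADIUS_KM = 1.0  # assumed area range for the narrowing search


def narrow_by_area(previous_results: list, place: str) -> list:
    """Keep only stations from the previous result screen near `place`."""
    ref_x, ref_y = REFERENCE_POINTS[place]
    return [
        s for s in previous_results
        if math.hypot(s.x_km - ref_x, s.y_km - ref_y) <= RADIUS_KM
    ]


# Invented sample data standing in for the result screen of step S210.
results_s210 = [
    Station("Station A", 0.3, 0.4),
    Station("Station B", 2.0, 1.0),
    Station("Station C", -0.6, 0.0),
]
results_s218 = narrow_by_area(results_s210, "Akasaka")
```

Because the function takes the previous results as input, the short utterance only has to supply the place name; the category comes from the retained operation context.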

Conventionally, when the meaning of the recognized utterance content did not stand on its own as a service request, it was handled, for example, as an error. For this reason, an occupant who had executed a POI search by the gas station category and then wanted to narrow the search by voice to gas stations around Akasaka had to say something like "search for gas stations around Akasaka". In other words, the occupant had to utter content that stands on its own as a service request, and such an utterance becomes long because it contains many words.
In the present embodiment, by contrast, utterance content that does not stand on its own as a service request is still treated as a voice operation performed under the operation context so far. This allows the occupant to perform voice operations with short utterances.

In the above embodiment, the agent server 200 executes part of the agent functions, such as recognizing the meaning of utterance content for a voice operation and generating response content. However, the agent device 100 provided in the vehicle M may instead be configured to execute the functions of the agent server 200 as well, so that the process shown in FIG. 4 is completed within the vehicle M.

Although modes for carrying out the present invention have been described above using embodiments, the present invention is in no way limited to these embodiments, and various modifications and substitutions can be made without departing from the gist of the present invention.

1 ... agent system, 10 ... microphone, 20 ... display and operation device, 30 ... speaker, 40 ... navigation device, 50 ... in-vehicle communication device, 100 ... agent device, 110 ... management unit, 112 ... acoustic processing unit, 114 ... agent WU determination unit, 116 ... communication control unit, 120 ... output control unit, 122 ... display control unit, 124 ... voice control unit, 130 ... agent function unit, 140 ... in-vehicle communication unit, 150 ... storage unit, 200 ... agent server, 210 ... communication unit, 220 ... voice recognition unit, 221 ... natural language processing unit, 222 ... dialogue management unit, 224 ... network search unit, 224 ... response content generation unit, 250 ... storage unit, 300 ... web server

Claims

An agent system comprising:
a response display control unit that causes a display unit to display an image whose content responds to an operation;
an utterance content interpretation unit that interprets the content of an utterance by a user;
an utterance content determination unit that determines whether the utterance content interpreted by the utterance content interpretation unit stands on its own as a service request; and
an agent control unit that, when the utterance content determination unit determines that the utterance content does not stand on its own as a service request, executes control for providing a service specified based on the content of the utterance and on the content of operation context information indicating the context of operations corresponding to the content of the image displayed on the display unit when the utterance was made.

The agent system according to claim 1, wherein the response display control unit
displays, when a manual operation is performed as the operation, an image whose content corresponds to the manual operation, and displays, when an operation by utterance is performed as the operation, an image whose content corresponds to the content of the utterance.

The agent system according to claim 1 or 2, wherein the agent control unit,
when the utterance content determination unit determines that the utterance content stands on its own as a service request, maintains the content of the operation context information indicating the context of operations corresponding to the content of the image displayed on the display unit when the utterance was made, and performs control so that the service requested by the determined utterance content is provided.

The agent system according to claim 3, wherein the agent control unit,
after maintaining the content of the operation context information and performing control so that the service requested by the determined utterance content is provided, when the utterance content determination unit determines that utterance content interpreted by the utterance content interpretation unit does not stand on its own as a service request, executes control for providing a service specified based on the content of the utterance and on the content of the operation context information indicating the context of operations corresponding to the content of the image displayed on the display unit when the utterance was made.

A control method of an agent system, in which a computer in the agent system:
causes a display unit to display an image whose content responds to an operation;
interprets the content of an utterance by a user;
determines whether the interpreted utterance content stands on its own as a service request; and
when it is determined that the utterance content does not stand on its own as a service request, executes control for providing a service specified based on the content of the utterance and on the content of operation context information indicating the context of operations corresponding to the content of the image displayed on the display unit when the utterance was made.

A program that causes a computer to:
cause a display unit to display an image whose content responds to an operation;
interpret the content of an utterance by a user;
determine whether the interpreted utterance content stands on its own as a service request; and
when it is determined that the utterance content does not stand on its own as a service request, execute control for providing a service specified based on the content of the utterance and on the content of operation context information indicating the context of operations corresponding to the content of the image displayed on the display unit when the utterance was made.