JP2020144285A

JP2020144285A - Agent system, information processing device, control method for mobile body mounted apparatus, and program

Info

Publication number: JP2020144285A
Application number: JP2019041994A
Authority: JP
Inventors: 佐和子古屋; Sawako Furuya
Original assignee: Honda Motor Co Ltd
Current assignee: Honda Motor Co Ltd
Priority date: 2019-03-07
Filing date: 2019-03-07
Publication date: 2020-09-10

Abstract

To provide an agent system, an information processing device, a control method for a mobile body-mounted device, and a program that enable an occupant to control in-vehicle devices with a desired voice. SOLUTION: The mobile device control device includes a voice recognition unit (216A) that recognizes voice collected by a microphone in the vehicle interior, a meaning interpretation unit (218A) that interprets the meaning of the voice, and a mobile body apparatus control unit (134) that controls an in-vehicle device based on the meaning interpreted by the meaning interpretation unit. Using an utterance command dictionary that is stored in a storage unit and associates each utterance command with the content indicated by that command, the meaning interpretation unit recognizes that the meaning of the voice instructs control of the in-vehicle device. When the voice recognized by the voice recognition unit is interpreted as including an instruction to register a new utterance command, the new utterance command is registered in the utterance command dictionary together with the content indicated by the new utterance command. SELECTED DRAWING: Figure 2

Description

The present invention relates to an agent system, an information processing device, a mobile body-mounted device control method, and a program.

Research is under way on human-machine interfaces that provide information to people through voice dialogue. In this connection, two kinds of technique are known: one determines whether to speak to a person, and at what volume and in what tone, based on the situation of the person with whom a robot communicates; the other uses a dictionary of registered vocabulary to recognize speech uttered by an occupant and controls a plurality of in-vehicle devices according to the content of the recognized speech (see, for example, Patent Documents 1 and 2).

Patent Document 1: Japanese Patent No. 4976903
Patent Document 2: Japanese Unexamined Patent Publication No. 2007-286136

An occupant may want to use simple or short words to give instructions to frequently used devices. With the conventional technology, however, it has been difficult to control in-vehicle devices with the voice the occupant desires.

Aspects of the present invention have been made in consideration of such circumstances, and one of their objects is to provide an agent system, an information processing device, a mobile body-mounted device control method, and a program that enable in-vehicle devices to be controlled with the voice an occupant desires.

The agent system, information processing device, mobile body-mounted device control method, and program according to the present invention adopt the following configurations.
(1): An agent system according to one aspect of the present invention includes: a mobile body-mounted device mounted on a mobile body boarded by an occupant; a voice recognition unit that recognizes voice including an utterance command, the utterance command being an instruction to control the mobile body-mounted device and being the occupant's voice picked up by a microphone; and a meaning interpretation unit that interprets the meaning of the voice recognized by the voice recognition unit. When the meaning of the voice is interpreted as including an instruction to register a new utterance command, the meaning interpretation unit registers the new utterance command in a storage unit.

(2): In aspect (1) above, when the meaning interpretation unit interprets the meaning of the voice as including an instruction to delete a registered utterance command, it deletes that utterance command from the storage unit.

(3): In aspect (1) or (2) above, the agent system includes the storage unit in which the utterance command is registered, and the storage unit stores an utterance command dictionary in which the utterance command and the content of the control indicated by the utterance command are registered in association with each other.
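As an illustration of aspects (1) to (3), the utterance command dictionary can be pictured as a mapping from an utterance command to the control content it indicates. The following is a minimal Python sketch under stated assumptions: the names `UtteranceCommandDictionary` and `ControlContent`, the two fields of the control content, and the example phrase are all hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ControlContent:
    """Control content tied to an utterance command (hypothetical fields)."""
    device: str     # which mobile body-mounted device to control
    operation: str  # what to do with that device


class UtteranceCommandDictionary:
    """Maps an utterance command to the control content it indicates."""

    def __init__(self) -> None:
        self._entries: Dict[str, ControlContent] = {}

    def register(self, command: str, content: ControlContent) -> None:
        # Aspect (1): store a new utterance command in the storage unit.
        self._entries[command] = content

    def delete(self, command: str) -> bool:
        # Aspect (2): remove a registered command; report whether it existed.
        return self._entries.pop(command, None) is not None

    def lookup(self, command: str) -> Optional[ControlContent]:
        # Returns None when the command has not been registered.
        return self._entries.get(command)


user_dict = UtteranceCommandDictionary()
user_dict.register("breeze", ControlContent("air conditioner", "turn on"))
print(user_dict.lookup("breeze"))
print(user_dict.delete("breeze"), user_dict.lookup("breeze"))
```

Keeping the command and its control content in one entry means that deleting a command also removes its association, which matches the "registered in association with each other" wording of aspect (3).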

(4): In any of aspects (1) to (3) above, the agent system further includes a mounted device control unit that, using the utterance command dictionary, controls the mobile body-mounted device based on the meaning that the meaning interpretation unit has interpreted from the voice recognized by the voice recognition unit.

(5): In aspect (3) above, when the meaning interpretation unit interprets the voice recognized by the voice recognition unit as including an instruction to register a new utterance command, it refers to a basic utterance command dictionary, in which basic voice commands (basic control instructions for the mobile body-mounted device) are associated with the content of control for those commands, and registers the new utterance command in the utterance command dictionary if the new utterance command is not included in the basic utterance command dictionary.
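The check described in aspect (5) can be sketched as a guard in front of registration. This is only an illustrative Python sketch: the function name `register_if_not_basic`, the plain-string control contents, and the example phrases in `BASIC_COMMANDS` are assumptions made for the example, not contents of the basic utterance command dictionary in the specification.

```python
from typing import Dict

# Hypothetical basic utterance command dictionary (illustrative phrases only).
BASIC_COMMANDS: Dict[str, str] = {
    "turn on the air conditioner": "air conditioner: on",
    "open the window": "power window: open",
}


def register_if_not_basic(user_commands: Dict[str, str],
                          new_command: str, content: str) -> bool:
    """Register new_command unless it is already a basic voice command."""
    if new_command in BASIC_COMMANDS:
        # The phrase already has a fixed meaning, so it is not registered.
        return False
    user_commands[new_command] = content
    return True


user_commands: Dict[str, str] = {}
print(register_if_not_basic(user_commands, "open the window", "audio: on"))
print(register_if_not_basic(user_commands, "fresh air", "power window: open"))
```

The guard keeps a user-defined phrase from shadowing a basic command, so a basic command always maps to the same control.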

(6): In aspect (3) or (5) above, when the meaning interpretation unit interprets the voice recognized by the voice recognition unit as including an instruction to register a new utterance command for the control instruction that the occupant issued to the mobile body-mounted device immediately beforehand, it registers the new utterance command in the utterance command dictionary together with the content of the control that the occupant of the mobile body performed on the mobile body-mounted device immediately beforehand.
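Aspect (6) implies that the system remembers the most recent control so that a later registration instruction can attach a new name to it. A minimal Python sketch of that idea follows; the class `CommandSession`, its method names, and the example control string are hypothetical and are not taken from the specification.

```python
from typing import Dict, Optional


class CommandSession:
    """Tracks the most recent control so it can be given a new command name."""

    def __init__(self) -> None:
        self.last_control: Optional[str] = None
        self.user_commands: Dict[str, str] = {}

    def execute(self, control_content: str) -> None:
        # A real system would drive the device here; this sketch only records it.
        self.last_control = control_content

    def register_for_last_control(self, new_command: str) -> bool:
        # Nothing to bind the new command to if no control has been issued yet.
        if self.last_control is None:
            return False
        self.user_commands[new_command] = self.last_control
        return True


session = CommandSession()
session.execute("air conditioner: set 24 degrees")
session.register_for_last_control("comfy")
print(session.user_commands)
```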

(7): In any of aspects (3) to (6) above, the agent system further includes: a speaker position specifying unit that specifies, among the occupants of the mobile body, the position of the speaker of the voice that was picked up by the microphone and includes the utterance command; and an operation authority determination unit that determines whether the speaker of the voice including the utterance command has operation authority for the mobile body-mounted device, based on the position information of the speaker specified by the speaker position specifying unit, operation authority position information indicating the positions of occupants who have operation authority for the mobile body-mounted devices mounted on the mobile body, and mobile body-mounted device information included in the meaning of the utterance command interpreted by the meaning interpretation unit. When the meaning interpretation unit interprets the voice recognized by the voice recognition unit as including an instruction to register a new utterance command, and the operation authority determination unit determines that the speaker has operation authority for the mobile body-mounted device, the meaning interpretation unit registers the new utterance command in the utterance command dictionary together with the content of the control indicated by the new utterance command.

(8): In aspect (4) above, the agent system further includes: a speaker position specifying unit that specifies, among the occupants of the mobile body, the position of the speaker of the voice that was picked up by the microphone and includes the utterance command; and an operation authority determination unit that determines whether the speaker of the voice including the utterance command has operation authority for the mobile body-mounted device, based on the position information of the speaker specified by the speaker position specifying unit, operation authority position information indicating the positions of occupants who have operation authority for the mobile body-mounted devices mounted on the mobile body, and mobile body-mounted device information included in the meaning of the utterance command interpreted by the meaning interpretation unit. The mounted device control unit controls the mobile body-mounted device when the meaning interpretation unit recognizes that the meaning of the voice instructs control of the mobile body-mounted device and the operation authority determination unit determines that the speaker has operation authority for the mobile body-mounted device.

(9): An information processing device according to one aspect of the present invention includes: an acquisition unit that acquires voice including an utterance command, the utterance command being an instruction to control a mobile body-mounted device mounted on a mobile body and being the voice of an occupant on board the mobile body; a storage unit that stores an utterance command dictionary in which the utterance command is associated with control content that includes information on the mobile body-mounted device to be controlled as indicated by the utterance command; a voice recognition unit that recognizes the voice; a meaning interpretation unit that interprets the meaning of the voice recognized by the voice recognition unit; and a generation unit that generates information corresponding to the semantic content of the voice interpreted by the meaning interpretation unit.

(10): In aspect (9) above, the information processing device further includes a mounted device control unit that controls the mobile body-mounted device based on the information, generated by the generation unit, that corresponds to the semantic content of the voice.

(11): In a mobile body-mounted device control method according to one aspect of the present invention, one or more computers in a system including a storage unit that stores an utterance command dictionary, in which an utterance command is associated with control content that includes information on the mobile body-mounted device to be controlled as indicated by the utterance command, perform: a step of recognizing voice including an utterance command uttered by an occupant on board a mobile body; a step of interpreting the meaning of the recognized voice; a step of referring to the utterance command dictionary to acquire information on the mobile body-mounted device to be controlled and the content of the control; a step of controlling the mobile body-mounted device to be controlled based on the interpreted meaning of the voice; and a step of, when the recognized voice is interpreted as including an instruction to register a new utterance command, registering the new utterance command in the utterance command dictionary together with the content of the control indicated by the new utterance command.

(12): In aspect (11) above, the mobile body-mounted device control method further includes, after the step of interpreting the meaning of the voice, a step of replacing the interpretation of the meaning of the voice generated in that step with control content in standard text form by referring to the utterance command dictionary, when the utterance command dictionary contains control content corresponding to the meaning of the voice.
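The replacement step of aspect (12) amounts to a lookup with a fallback: if the dictionary has control content for the interpreted meaning, that standard text is used, and otherwise the interpretation is passed through unchanged. The Python sketch below illustrates this under assumptions; the function name `to_standard_control` and the string form of the control content are hypothetical.

```python
from typing import Dict


def to_standard_control(meaning: str, command_dictionary: Dict[str, str]) -> str:
    """Replace an interpreted meaning with its standard control text, if any."""
    # dict.get returns the second argument when the key is absent, so an
    # unregistered meaning is returned as it was interpreted.
    return command_dictionary.get(meaning, meaning)


commands = {"comfy": "air conditioner: set 24 degrees"}
print(to_standard_control("comfy", commands))
print(to_standard_control("open the window", commands))
```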

(13): A program according to one aspect of the present invention is installed on one or more computers in a system including a storage unit that stores an utterance command dictionary, in which an utterance command is associated with control content that includes information on the mobile body-mounted device to be controlled as indicated by the utterance command, and causes the computer or computers to execute: a process of recognizing voice including an utterance command uttered by an occupant on board a mobile body; a process of interpreting the meaning of the recognized voice; a process of referring to the utterance command dictionary to acquire information on the mobile body-mounted device to be controlled and the content of the control; a process of controlling the mobile body-mounted device to be controlled based on the interpreted meaning of the voice; and a process of, when the recognized voice is interpreted as including an instruction to register a new utterance command, registering the new utterance command in the utterance command dictionary together with the content of the control indicated by the new utterance command.

According to aspects (1) to (13), in-vehicle devices can be controlled with the voice the occupant desires.

A diagram showing an example of the configuration of the agent system 1 using the in-vehicle device control device according to the first embodiment.
A diagram showing an example of the configuration of the agent device 100 according to the first embodiment.
A diagram showing an example of the vehicle interior as seen from the driver's seat.
A diagram showing an example of the vehicle interior with the vehicle M viewed from above.
A diagram showing an example of the contents of the operation authority position information 154.
A diagram showing an example of the configuration of the server device 200 according to the first embodiment.
A diagram showing an example of the contents of the basic utterance command dictionary 232.
A diagram showing an example of the contents of the user utterance command dictionary 234.
A flowchart showing the flow of a series of processes of the agent device 100 according to the first embodiment.
A flowchart showing the flow of an example of processing of the server device 200 according to the first embodiment.
A diagram showing an example of the agent device 100A according to the second embodiment.
A flowchart showing the flow of a series of processes of the agent device 100A according to the second embodiment.
A flowchart showing the flow of a series of processes of the agent device 100A according to the second embodiment.

Hereinafter, embodiments of the agent system, the information processing device, the mobile body-mounted device control method, and the program of the present invention will be described with reference to the drawings.

<Embodiment>
[System configuration]
FIG. 1 is a diagram showing an example of the configuration of an agent system 1 that uses the in-vehicle device control device according to the first embodiment. The agent system 1 according to the first embodiment includes, for example, an agent device 100 mounted on a vehicle (hereinafter, vehicle M) and a server device 200. The vehicle M is, for example, a two-wheeled, three-wheeled, or four-wheeled vehicle. The drive source of such a vehicle may be an internal combustion engine such as a diesel engine or a gasoline engine, an electric motor, or a combination of these. The electric motor operates on power generated by a generator coupled to the internal combustion engine, or on power discharged from a secondary battery or a fuel cell.

The agent device 100 and the server device 200 are communicably connected via a network NW. The network NW includes a LAN (Local Area Network), a WAN (Wide Area Network), and the like. The network NW may include, for example, a network using wireless communication such as Wi-Fi or Bluetooth (registered trademark; this notation is omitted hereinafter). The agent system 1 may be composed of a plurality of agent devices 100 and a plurality of server devices 200.

The agent device 100 uses an agent function to acquire voice from an occupant of the vehicle M and transmits the acquired voice to the server device 200. Based on data obtained from the server device (for example, agent setting data), the agent device 100 also converses with the occupant, provides information such as images and video, and controls the in-vehicle devices VE and other devices. The vehicle M carries, for example, a plurality of in-vehicle devices VE. The in-vehicle devices VE are, for example, devices related to automated driving and advanced driving assistance (for example, devices related to ACC (Adaptive Cruise Control), VSA (Vehicle Stability Assist), and the like), an air conditioner, power windows, audio equipment, and car navigation.

The server device 200 communicates with the agent device 100 mounted on the vehicle M and acquires various data from the agent device 100. Based on the acquired data, the server device 200 generates agent setting data for inquiries made by voice or other means, and provides the generated agent setting data to the agent device 100. The function of the server device 200 according to the first embodiment is included in the agent function. The function of the server device 200 also updates the agent function in the agent device 100 to a more accurate one.

[Agent device configuration]
FIG. 2 is a diagram showing an example of the configuration of the agent device 100 according to the first embodiment. The agent device 100 according to the first embodiment includes, for example, an agent-side communication unit 102, a microphone 106, a speaker 108, a display unit 110, an agent-side control unit 120, and an agent-side storage unit 150. These devices may be connected to one another by a multiplex communication line such as a CAN (Controller Area Network) communication line, a serial communication line, a wireless communication network, or the like. The configuration of the agent device 100 shown in FIG. 2 is merely an example; part of the configuration may be omitted, and other components may be added.

The agent-side communication unit 102 includes a communication interface such as a NIC (Network Interface Controller). The agent-side communication unit 102 communicates with the server device 200 and other devices via the network NW.

The microphone 106 is a voice input device that picks up sound in the vehicle interior by converting it into an electric signal. The microphone 106 outputs the data of the collected voice (hereinafter, voice data) to the agent-side control unit 120. For example, the microphone 106 is installed near the front of an occupant seated in the vehicle interior, such as near a mat lamp, the steering wheel, the instrument panel, or a seat. A plurality of microphones 106 may be installed in the vehicle interior.

The speaker 108 is installed, for example, near a seat in the vehicle interior or near the display unit 110. The speaker 108 outputs sound based on the information output by the agent-side control unit 120.

The display unit 110 includes a display device such as an LCD (Liquid Crystal Display) or an organic EL (Electroluminescence) display. The display unit 110 displays an image based on the information output by the agent-side control unit 120.

FIG. 3 is a diagram showing an example of the vehicle interior as seen from the driver's seat. In the illustrated example, microphones 106A to 106C, speakers 108A to 108C, and display units 110A to 110C are installed in the vehicle interior. The microphone 106A is provided, for example, on the steering wheel and mainly picks up the voice of the driver. The microphone 106B is provided, for example, on the instrument panel (dashboard or garnish) IP in front of the passenger seat and mainly picks up the voice of the occupant in the passenger seat. The microphone 106C is installed, for example, near the center of the instrument panel (between the driver's seat and the passenger seat).

The speaker 108A is installed, for example, in the lower part of the driver's-side door, and the speaker 108B is installed, for example, in the lower part of the passenger-side door. The speaker 108C is installed, for example, near the display unit 110C, that is, near the center of the instrument panel IP.

The display unit 110A is, for example, a HUD (Head-Up Display) device that displays a virtual image ahead of the driver's line of sight when the driver looks outside the vehicle. The HUD device is, for example, a device that lets an occupant see a virtual image by projecting light onto the front windshield of the vehicle M or onto a transparent, light-transmitting member called a combiner. The occupant is mainly the driver, but may be an occupant other than the driver.

The display unit 110B is provided on the instrument panel IP near the front of the driver's seat (the seat closest to the steering wheel), at a position where the occupant can see it through the gaps in the steering wheel or over the steering wheel. The display unit 110B is, for example, an LCD or an organic EL display device. The display unit 110B displays images of, for example, the speed of the vehicle M, the engine speed, the remaining fuel, the radiator water temperature, the mileage, and other information.

The display unit 110C is installed near the center of the instrument panel IP. Like the display unit 110B, the display unit 110C is, for example, an LCD or an organic EL display device. The display unit 110C displays content such as television programs and movies.

The vehicle M may additionally be provided with microphones and speakers near the rear seats. FIG. 4 is a diagram showing an example of the vehicle interior with the vehicle M viewed from above. In addition to the microphones and speakers illustrated in FIG. 3, microphones 106D and 106E and speakers 108D and 108E may be installed in the vehicle interior.

The microphone 106D is provided, for example, near the left rear seat ST3 located behind the passenger seat ST2 (for example, on the rear surface of the passenger seat ST2) and mainly picks up the voice of the occupant seated in the left rear seat ST3. The microphone 106E is provided, for example, near the right rear seat ST4 located behind the driver's seat ST1 (for example, on the rear surface of the driver's seat ST1) and mainly picks up the voice of the occupant seated in the right rear seat ST4.

The speaker 108D is installed, for example, in the lower part of the door on the left rear seat ST3 side, and the speaker 108E is installed, for example, in the lower part of the door on the right rear seat ST4 side. In the following description, when the driver's seat ST1, the passenger seat ST2, the left rear seat ST3, and the right rear seat ST4 are not distinguished from one another, they are simply referred to as a seat ST.

The vehicle M illustrated in FIG. 1 has been described, as illustrated in FIG. 3 or FIG. 4, as a vehicle provided with a steering wheel operable by a driver who is an occupant, but it is not limited to this. For example, the vehicle M may be a vehicle without a roof, that is, without a vehicle interior (or without a clear partition for one). The example of FIG. 3 or FIG. 4 also assumes that the driver's seat, where the driver who operates the vehicle M sits, and the passenger seat and rear seats, where occupants who do not drive sit, are in a single compartment, but this is not a limitation. Likewise, the example of FIG. 3 or FIG. 4 assumes that the vehicle M is provided with a steering wheel, but this is not a limitation either. For example, the vehicle M may be an automated driving vehicle that has no driving operation device such as a steering wheel. An automated driving vehicle is, for example, a vehicle that executes driving control by controlling one or both of steering and acceleration/deceleration without depending on operation by an occupant.

Returning to the description of FIG. 2, the agent-side control unit 120 includes, for example, an acquisition unit 122, a voice synthesis unit 124, an output control unit 126, a communication control unit 128, a speaker position specifying unit 130, an operation authority determination unit 132, and an in-vehicle device control unit 134. These components are realized, for example, by a processor such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit) executing a program (software). Some or all of these components may instead be realized by hardware (including circuitry) such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field-Programmable Gate Array), or by software and hardware working together. The program may be stored in advance in the agent-side storage unit 150 (a storage device including a non-transitory storage medium), or it may be stored on a removable storage medium (a non-transitory storage medium) such as a DVD or a CD-ROM and installed in the agent-side storage unit 150 when the storage medium is mounted in a drive device.

The agent-side storage unit 150 is realized by an HDD, a flash memory, an EEPROM (Electrically Erasable Programmable Read Only Memory), a ROM (Read Only Memory), a RAM (Random Access Memory), or the like. The agent-side storage unit 150 stores, for example, programs referenced by the processor, in-vehicle device information 152, and operation authority position information 154. The in-vehicle device information 152 is information indicating (a list of) the in-vehicle devices VE mounted on the vehicle M.

FIG. 5 is a diagram showing an example of the contents of the operation authority position information 154. The operation authority position information 154 is, for example, information that associates, for each occupant position, the position of an occupant in the vehicle M (in this example, the occupant's seat) with the in-vehicle devices VE that an occupant at that position is authorized to operate. The operation authority position information 154 indicates, for example, that occupants at any position have operation authority over in-vehicle devices VE that do not affect the behavior or state of the vehicle M (for example, the audio system and the air conditioner), whereas only an occupant positioned near the device or the occupant in the driver's seat has operation authority over in-vehicle devices VE that do affect the behavior or state of the vehicle M (for example, window opening and closing, door locks, driving support devices, turn signals, and headlights).

Returning to FIG. 2, the acquisition unit 122 acquires voice data from the microphone 106 and also acquires other information.

When the agent setting data that the agent-side communication unit 102 has received from the server device 200 includes voice control content, the voice synthesis unit 124 generates artificial synthesized speech (hereinafter referred to as agent voice) based on the voice data specified as the voice control in response to the spoken instruction (that is, the voice instruction).

When the voice synthesis unit 124 generates an agent voice, the output control unit 126 causes the speaker 108 to output that agent voice. When the agent setting data includes image control content, the output control unit 126 causes the display unit 110 to display the image data specified as the image control. The output control unit 126 may also cause the display unit 110 to display an image of the recognition result of the voice data (text data such as a phrase).

The communication control unit 128 transmits the voice data acquired by the acquisition unit 122 to the server device 200 via the agent-side communication unit 102.

The speaker position identification unit 130 identifies the position of the occupant of the vehicle M who uttered the voice picked up by the microphone 106. For example, of the microphones 106A to 106E, the speaker position identification unit 130 identifies the seat ST installed near the microphone 106 that picked up a given utterance at the highest volume as the position of the occupant who spoke. The speaker position identification unit 130 may instead use an image of the occupants captured by an in-vehicle camera (not shown) and identify the position of the occupant whose mouth is moving at the time the microphone 106 picks up the voice as the position of the occupant who spoke.
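As an illustration of the loudest-microphone rule described above, the following Python sketch compares the signal level at each microphone and returns the seat next to the loudest one. The microphone-to-seat table, the seat names, and the use of RMS level as the measure of volume are assumptions made for this example; the source states only that the seat ST near the microphone 106 with the highest volume is chosen.

```python
import math

# Hypothetical mapping from each microphone to its nearest seat (not from the source).
MIC_TO_SEAT = {
    "106A": "driver",
    "106B": "passenger",
    "106C": "rear_right",
    "106D": "rear_center",
    "106E": "rear_left",
}


def rms(samples):
    """Root-mean-square level of one microphone's samples for a single utterance."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def locate_speaker(samples_by_mic):
    """Return the seat nearest to the microphone that picked up the loudest signal."""
    loudest_mic = max(samples_by_mic, key=lambda mic: rms(samples_by_mic[mic]))
    return MIC_TO_SEAT[loudest_mic]
```

For example, if microphone 106B records a clearly stronger signal than the others, `locate_speaker` returns the hypothetical seat name `"passenger"`.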

When, for example, the agent setting data includes information indicating control of an in-vehicle device VE, the operation authority determination unit 132 determines, based on the operation authority position information 154, whether the occupant at the position identified by the speaker position identification unit 130 has operation authority over the in-vehicle device VE targeted by the instruction. First, the operation authority determination unit 132 searches the operation authority position information 154 using the occupant position identified by the speaker position identification unit 130 as the search key. Then, if the in-vehicle devices VE for which operation authority is associated with the identified occupant position include the target in-vehicle device VE indicated in the agent setting data, the operation authority determination unit 132 determines that the occupant who spoke has operation authority. The operation authority determination unit 132 transmits information indicating that the occupant who spoke has operation authority to the server device 200 via the agent-side communication unit 102.
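The lookup described above can be sketched as a table keyed by seat. This is a minimal sketch, assuming a dictionary-of-sets layout for the operation authority position information 154; the seat names and device names are hypothetical and only mirror the kinds of devices the text mentions.

```python
# Hypothetical stand-in for the operation authority position information 154:
# each seat maps to the set of in-vehicle devices its occupant may operate.
OPERATION_AUTHORITY = {
    "driver": {"audio", "air_conditioner", "driver_window", "door_lock", "headlights"},
    "passenger": {"audio", "air_conditioner", "passenger_window"},
    "rear_left": {"audio", "air_conditioner", "rear_left_window"},
}


def has_operation_authority(seat, device):
    """Use the seat as the search key and check whether the target device is listed."""
    return device in OPERATION_AUTHORITY.get(seat, set())
```

An unknown seat yields no authority at all, which is a conservative choice made for this sketch rather than something the source specifies.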

In addition, when, for example, the agent setting data includes control content for registering a new utterance command that instructs control of an in-vehicle device VE, the operation authority determination unit 132 determines, based on the operation authority position information 154, whether the occupant at the position identified by the speaker position identification unit 130 has operation authority over the target in-vehicle device VE. The processing by which the operation authority determination unit 132 determines the occupant's operation authority when a new utterance command is registered is the same as the processing, described above, for determining the occupant's operation authority when an in-vehicle device VE is controlled, so its description is omitted.

The in-vehicle device control unit 134 controls the operation of the in-vehicle device VE when the agent setting data includes control content for the in-vehicle device VE and the determination result of the operation authority determination unit 132 indicates that the occupant who uttered the instruction has operation authority over the target in-vehicle device VE.

[Configuration of the server device]
FIG. 6 is a diagram showing an example of the configuration of the server device 200 according to the first embodiment. The server device 200 according to the first embodiment includes, for example, a server-side communication unit 202, a server-side control unit 210, and a server-side storage unit 230.

The server-side communication unit 202 includes a communication interface such as a NIC. The server-side communication unit 202 communicates, via the network NW, with the agent device 100 and other equipment mounted on each vehicle M.

The server-side control unit 210 includes, for example, an acquisition unit 212, an utterance section extraction unit 214, a voice recognition unit 216, a meaning interpretation unit 218, an agent setting data generation unit 222, and a communication control unit 224. These components are realized, for example, by a processor such as a CPU or GPU executing a program (software). Some or all of these components may instead be realized by hardware (a circuit unit, including circuitry) such as an LSI, an ASIC, or an FPGA, or by software and hardware in cooperation. The program may be stored in advance in the server-side storage unit 230 (a storage device including a non-transitory storage medium), or it may be stored on a removable storage medium (a non-transitory storage medium) such as a DVD or CD-ROM and installed in the server-side storage unit 230 when the storage medium is mounted in a drive device.

The server-side storage unit 230 is realized by an HDD, a flash memory, an EEPROM, a ROM, a RAM, or the like. In addition to programs referenced by the processor, the server-side storage unit 230 stores, for example, a basic utterance command dictionary 232 and a user utterance command dictionary 234. The basic utterance command dictionary 232 contains, for example, utterance command sentences, phrases, and the like that include pre-registered voice commands (hereinafter, voice commands), and the user utterance command dictionary 234 contains, for example, utterance command sentences, phrases, and the like registered by occupants of the vehicle M (hereinafter, user utterance commands).

FIG. 7 is a diagram showing an example of the contents of the basic utterance command dictionary 232. In the basic utterance command dictionary 232, for example, a basic voice command, which is a sentence or phrase that expresses in a standard way an instruction including a pre-registered voice command, the in-vehicle device VE to be controlled whose operation is instructed by the basic voice command, and the control content to be executed by the agent-side control unit 120 are associated with one another. A basic voice command includes, for example, the name of the in-vehicle device VE to be controlled and the instruction or command for that device (what it should be made to do). The control content holds, for example, the control of the operation of the in-vehicle device VE (the operation to be performed). For example, in the basic utterance command dictionary 232, three pre-registered basic voice commands, "air conditioner ON", "air conditioner start", and "air conditioner operation", are associated with the in-vehicle device control content "start the air conditioner (turn on the air conditioner)".

FIG. 8 is a diagram showing an example of the contents of the user utterance command dictionary 234. In the user utterance command dictionary 234, for example, a user utterance command registered by an occupant, the in-vehicle device VE to be controlled whose operation is instructed by the user utterance command, and the control content to be executed by the agent-side control unit 120 are associated with one another. A user utterance command includes, for example, words that indicate an instruction to an in-vehicle device VE (for example, a pronoun-based instruction such as "do that"). As the control content, for example, the operation control for the in-vehicle device VE (the operation to be performed) is registered, including the name of that in-vehicle device VE. For example, in the user utterance command dictionary 234, the user utterance command "do the usual" is associated with the in-vehicle device control content "open the driver's seat window by 20% with the driver's seat window lifting device (in-vehicle device VE)".

In the basic utterance command dictionary 232 and the user utterance command dictionary 234, the control content may include, in addition to in-vehicle device control, voice control in which the output control unit 126 causes the speaker 108 to output voice and display control in which the output control unit 126 causes the display unit 110 to display an image.
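One way to picture the two dictionaries is as tables that map a phrase to a target device and its control content. The sketch below is illustrative only: the `ControlEntry` type, the field names, and the identifier strings are assumptions, and only the example phrases come from the description of FIG. 7 and FIG. 8.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ControlEntry:
    device: str   # in-vehicle device VE to be controlled (hypothetical identifier)
    control: str  # control content to be executed (hypothetical identifier)


# Stand-in for the basic utterance command dictionary 232 (pre-registered phrases).
BASIC_COMMANDS = {
    "air conditioner ON": ControlEntry("air_conditioner", "power_on"),
    "air conditioner start": ControlEntry("air_conditioner", "power_on"),
    "air conditioner operation": ControlEntry("air_conditioner", "power_on"),
}

# Stand-in for the user utterance command dictionary 234 (phrases registered by occupants).
USER_COMMANDS = {
    "do the usual": ControlEntry("driver_window", "open_20_percent"),
}


def look_up(utterance: str) -> Optional[ControlEntry]:
    """Return the entry for an utterance, checking the basic dictionary first."""
    entry = BASIC_COMMANDS.get(utterance)
    if entry is None:
        entry = USER_COMMANDS.get(utterance)
    return entry
```

Several phrases can share one control entry, which matches the three air conditioner phrases that all map to the same control content.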

Returning to FIG. 6, the acquisition unit 212 acquires voice data from the agent device 100 via the server-side communication unit 202.

The utterance section extraction unit 214 extracts, from the voice data acquired by the acquisition unit 122, the period during which an occupant is speaking (hereinafter referred to as an utterance section). For example, the utterance section extraction unit 214 may use the zero-crossing method to extract utterance sections based on the amplitude of the voice signal contained in the voice data. The utterance section extraction unit 214 may also extract utterance sections from the voice data based on a Gaussian mixture model (GMM), or by performing template matching against a database of templates of the voice signals characteristic of utterance sections.
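The source does not spell out how the zero-crossing method is applied. As one plausible reading, the sketch below marks a frame as speech when its peak amplitude reaches a threshold and the signal changes sign at least a minimum number of times, then merges consecutive speech frames into sections. The frame length, thresholds, and function names are assumptions chosen for this example.

```python
def zero_crossings(frame):
    """Count sign changes between consecutive samples in a frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))


def extract_utterance_sections(samples, frame_len=4, amp_threshold=0.1, min_crossings=1):
    """Return (start, end) sample index pairs for runs of frames judged to be speech."""
    sections = []
    start = None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        voiced = (
            max(abs(s) for s in frame) >= amp_threshold
            and zero_crossings(frame) >= min_crossings
        )
        if voiced and start is None:
            start = i                    # a speech run begins at this frame
        elif not voiced and start is not None:
            sections.append((start, i))  # the speech run ended before this frame
            start = None
    if start is not None:
        sections.append((start, len(samples)))
    return sections
```

A practical system would work on much longer frames of sampled audio and would likely smooth the decision over neighboring frames; the tiny frame length here only keeps the example easy to trace by hand.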

The voice recognition unit 216 recognizes the voice data for each utterance section extracted by the utterance section extraction unit 214 and converts the recognized voice data into text, thereby generating text data containing the utterance content. For example, the voice recognition unit 216 separates the voice signal of the utterance section into a plurality of frequency bands, such as low and high frequencies, and generates a spectrogram by applying a Fourier transform to each of the separated voice signals. The voice recognition unit 216 obtains a character string from the spectrogram by inputting the generated spectrogram into a recurrent neural network. The recurrent neural network may be trained in advance using, for example, training data in which a known character string corresponding to a training voice is associated, as a teacher label, with the spectrogram generated from that training voice. The voice recognition unit 216 then outputs the character string data obtained from the recurrent neural network as text data.

The meaning interpretation unit 218 parses the natural-language text data recognized by the voice recognition unit 216, divides the text data into morphemes, and interprets the meaning of the wording contained in the text data from the morphemes. Using, for example, the basic utterance command dictionary 232 and the user utterance command dictionary 234 stored in the server-side storage unit 230, the meaning interpretation unit 218 interprets that the meaning of the text data recognized by the voice recognition unit 216 instructs control of an in-vehicle device VE. Specifically, it identifies from the recognized text data (utterance content) at least the name of the in-vehicle device VE targeted by the instruction and the in-vehicle device control content.

When the utterance content whose meaning has been interpreted by the meaning interpretation unit 218 is interpreted as instructing control of an in-vehicle device VE (that is, when a basic voice command or a user utterance command is recognized as being contained in the utterance content), the agent setting data generation unit 222 refers to the basic utterance command dictionary 232 and the user utterance command dictionary 234 and acquires the control content associated with the matching basic voice command or user utterance command. The agent setting data generation unit 222 generates agent setting data for executing the processing corresponding to the acquired control content (for example, at least one of in-vehicle device control, voice control, and display control). When the interpretation result is a meaning such as "ON air conditioner" or "air conditioner operation", the agent setting data generation unit 222 replaces that meaning with standard character information such as "start the air conditioner" or standard command information such as "TURN AC ON". This makes it easier to acquire the control content that matches the request even when the wording of the request varies.
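The replacement of varied wordings with one standard command can be sketched as a two-step table lookup. In this minimal sketch the table contents, the dictionary layout of the agent setting data, and the device and control identifiers are all assumptions; only the example wordings and the standard command string follow the text above.

```python
# Hypothetical table mapping interpreted wordings to a single standard command.
STANDARD_COMMANDS = {
    "on air conditioner": "TURN AC ON",
    "air conditioner on": "TURN AC ON",
    "air conditioner start": "TURN AC ON",
    "air conditioner operation": "TURN AC ON",
}

# Hypothetical control content attached to each standard command.
CONTROL_CONTENT = {
    "TURN AC ON": {"device": "air_conditioner", "control": "power_on"},
}


def normalize(interpreted):
    """Map an interpreted wording to its standard command, ignoring case and extra spaces."""
    key = " ".join(interpreted.lower().split())
    return STANDARD_COMMANDS.get(key)


def build_agent_setting_data(interpreted):
    """Return a dict standing in for agent setting data, or None if nothing matches."""
    standard = normalize(interpreted)
    if standard is None:
        return None
    return {"command": standard, **CONTROL_CONTENT[standard]}
```

Normalizing first means the control content needs to be defined only once per standard command, however many wordings map to it.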

The communication control unit 224 transmits the agent setting data generated by the agent setting data generation unit 222 to the vehicle M via the server-side communication unit 202. As a result, in the vehicle M, the agent-side control unit 120 executes the control corresponding to the agent setting data.

[Registration of a new utterance command in the user utterance command dictionary 234]
The meaning interpretation unit 218 also interprets that the meaning of the text data recognized by the voice recognition unit 216 includes an instruction to register a new utterance command. An "instruction to register a new utterance command" is expressed, for example, in words that include at least the new utterance command itself and the instruction to the in-vehicle device VE that the new utterance command stands for, such as: Register opening the driver's seat window (in-vehicle device VE) by 20% as "do the usual". In this case, the name of the in-vehicle device VE is "driver's seat window", and "do the usual" is the new utterance command in this example. When the meaning interpretation unit 218 has interpreted the utterance as including an instruction to register a new utterance command, the agent setting data generation unit 222 generates agent setting data that includes the interpreted new utterance command, information indicating the in-vehicle device VE to be controlled by the new utterance command (for example, the name of the in-vehicle device VE), and control content for registering the new utterance command, and transmits it to the agent device 100 via the server-side communication unit 202.

As described above, when the operation authority determination unit 132 receives agent setting data that includes control content for registering a new utterance command, it determines whether the occupant who instructed the registration of the new utterance command has operation authority over the in-vehicle device VE to be controlled by the user utterance command, and transmits information indicating the determination result to the server device 200 via the agent-side communication unit 102.

When the information received by the server-side communication unit 202 indicates that the occupant who spoke has operation authority over the in-vehicle device VE, the meaning interpretation unit 218 associates the interpreted new utterance command with the control content of the in-vehicle device VE instructed by that new utterance command and registers them in the user utterance command dictionary 234.

[Exception to the registration of a new utterance command]
The meaning interpretation unit 218 searches the basic utterance command dictionary 232 using the new utterance command as the search key, and if the new utterance command is already registered as a pre-registered basic voice command, it does not execute the processing for registering the new utterance command in the user utterance command dictionary 234. In this case, the meaning interpretation unit 218 does not cause the agent setting data generation unit 222 to generate the agent setting data for the processing of registering the new utterance command.
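The registration rules described so far (reject a phrase that is already a basic voice command, require operation authority, then store the phrase with its control content) can be summarized in a short sketch. The function name, the return strings, and the dictionary layout are assumptions for illustration. In the described system the authority decision is made on the agent device and reported to the server, whereas here it is simply passed in as a boolean.

```python
def register_user_command(phrase, device, control, has_authority,
                          basic_commands, user_commands):
    """Try to register a new utterance command and report what happened."""
    # Exception: a phrase already registered as a basic voice command is never re-registered.
    if phrase in basic_commands:
        return "rejected: already a basic voice command"
    # Registration proceeds only if the speaker may operate the target device.
    if not has_authority:
        return "rejected: no operation authority"
    user_commands[phrase] = {"device": device, "control": control}
    return "registered"
```

Checking the basic dictionary first mirrors the text, in which no agent setting data for registration is generated at all when the phrase collides with a basic voice command.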

[Registering a user utterance command for the immediately preceding operation]
The meaning interpretation unit 218 also interprets that the meaning of the text data recognized by the voice recognition unit 216 includes an instruction to register, as a new utterance command, the instruction (or control) that the occupant has just given to an in-vehicle device VE. An "instruction to register the immediately preceding instruction as a new utterance command" is, for example, wording that includes the new utterance command itself, such as: Register the control I just did as "do the usual". The control just performed is, for example, control performed on an in-vehicle device VE in response to an instruction the occupant gave immediately before, or control the occupant performed on an in-vehicle device VE by operating it directly. When the meaning interpretation unit 218 has interpreted the utterance as including an instruction to register the immediately preceding instruction as a new utterance command, the agent setting data generation unit 222 generates agent setting data that includes the interpreted new utterance command and control content for registering the immediately preceding instruction as a new utterance command, and transmits it to the agent device 100 via the server-side communication unit 202.

In this case, the operation authority determination unit 132 receives the agent setting data that includes the control content for registering the immediately preceding instruction as a new utterance command and, based on history information indicating the history of control performed on the in-vehicle devices VE, identifies the in-vehicle device VE that was the target of the control performed immediately before. The history information is accumulated (stored) in, for example, the agent-side storage unit 150. The operation authority determination unit 132 determines whether the occupant who instructed the registration of the new utterance command has operation authority over the identified in-vehicle device VE, and transmits information indicating the determination result and information indicating the control content performed immediately before to the server device 200 via the agent-side communication unit 102.

When the determination result in the information received by the server-side communication unit 202 indicates that the occupant who spoke has operation authority over the in-vehicle device VE, the meaning interpretation unit 218 associates the interpreted new utterance command with the received control content performed immediately before and registers them in the user utterance command dictionary 234.
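Registering the most recent control under a new phrase can be sketched as follows. This is a simplified single-process sketch: in the described system the history lookup and authority check run on the agent device and the registration runs on the server, and the list-of-tuples history format, the names, and the identifiers used here are assumptions.

```python
def register_last_control(phrase, seat, control_history, authority_by_seat, user_commands):
    """Register the most recent (device, control) pair under a new phrase if permitted."""
    if not control_history:
        return False  # nothing has been controlled yet, so there is nothing to register
    device, control = control_history[-1]  # the control performed immediately before
    if device not in authority_by_seat.get(seat, set()):
        return False  # the speaker has no operation authority over that device
    user_commands[phrase] = {"device": device, "control": control}
    return True
```

For example, with a history ending in `("driver_window", "open_20_percent")`, a request from the driver's seat to register "do the usual" stores that pair, while the same request from a seat without authority over the driver's window is refused.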

[Processing flow]
Next, the flow of processing of the agent system 1 according to the first embodiment is described with reference to flowcharts. In the following, the processing of the agent device 100 and the processing of the server device 200 are described separately. The processing flows shown below may be executed repeatedly at a predetermined timing. The predetermined timing is, for example, the time at which a specific word that activates the agent device (for example, a wake-up word) is extracted from the voice data, or the time at which selection of the switch that activates the agent device 100, among the various switches mounted on the vehicle M, is accepted.

FIG. 9 is a flowchart showing the flow of a series of processes of the agent device 100 according to the first embodiment. First, the acquisition unit 122 of the agent-side control unit 120 determines whether voice data of an occupant has been collected by the microphone 106 (step S100). The acquisition unit 122 waits until voice data of an occupant is collected. Next, the communication control unit 128 transmits the voice data to the server device 200 via the agent-side communication unit 102 (step S102).

Next, the communication control unit 128 receives agent setting data from the server device 200 (step S104). Next, the operation authority determination unit 132 identifies the in-vehicle device VE included in the received agent setting data (step S106). The speaker position identification unit 130 identifies the position of the occupant who uttered the voice collected in step S100 (step S108). The operation authority determination unit 132 determines whether the received agent setting data includes control content for registering a new utterance command that instructs control of an in-vehicle device VE (step S110). If the agent setting data does not include control content for registering a new utterance command, the operation authority determination unit 132 treats the agent setting data as containing some other instruction (for example, voice control, display control, or in-vehicle device control) and advances the processing to step S118.

If the agent setting data includes control content for registering a new utterance command, the operation authority determination unit 132 determines, based on the operation authority position information 154, whether the occupant whose position was identified by the speaker position identification unit 130 has operation authority over the identified in-vehicle device VE (step S112). If the operation authority determination unit 132 determines that the occupant does not have operation authority over the in-vehicle device VE, it transmits information indicating that the occupant does not have operation authority over the in-vehicle device VE to the server device 200 via the agent-side communication unit 102 (step S114). If the operation authority determination unit 132 determines that the occupant has operation authority over the in-vehicle device VE, it transmits information indicating that the occupant has operation authority over the in-vehicle device VE to the server device 200 via the agent-side communication unit 102 (step S116).

If the agent setting data includes control content for the in-vehicle device VE, the operation authority determination unit 132 determines, based on the operation authority position information 154, whether the occupant whose position was identified by the speaker position specifying unit 130 has operation authority for the identified in-vehicle device VE (step S118). If the operation authority determination unit 132 determines that the occupant does not have operation authority for the in-vehicle device VE, the in-vehicle device control unit 134 ends the process without controlling the in-vehicle device VE (step S120). If the operation authority determination unit 132 determines that the occupant has operation authority for the in-vehicle device VE, the in-vehicle device control unit 134 controls the in-vehicle device VE based on the control content included in the agent setting data (step S122). The processing of this flowchart then ends.
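
The agent-side branching of steps S110 to S122 can be summarized in a short Python sketch. It is a simplified illustration under stated assumptions, not the specification's implementation: the `AgentSettingData` fields, the `execute` and `report` callbacks, and the returned status strings are hypothetical, and voice and display control are folded into the ordinary control branch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AgentSettingData:
    """Hypothetical shape of the agent setting data received in step S104."""
    device: str                              # in-vehicle device VE named in the data
    control: Optional[str] = None            # control content to execute, if any
    register_command: Optional[str] = None   # new utterance command to register, if any


def handle_agent_setting_data(
    data: AgentSettingData,
    has_authority: bool,
    execute: Callable[[str, str], None],
    report: Callable[[bool], None],
) -> str:
    """Dispatch the received data along the two branches of the flowchart."""
    if data.register_command is not None:     # S110: is this a registration request?
        report(has_authority)                 # S114 / S116: send the result to the server
        return "reported"
    if not has_authority:                     # S118 -> S120: no authority, no control
        return "rejected"
    execute(data.device, data.control or "")  # S122: carry out the control
    return "executed"
```

Passing the authority result in as a plain boolean keeps the dispatch logic separate from how the speaker position is obtained.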

FIG. 10 is a flowchart showing an example of the processing flow of the server device 200 according to the first embodiment. First, the server-side communication unit 202 acquires voice data, first image data, and second image data from the agent device 100 (step S200). Next, the utterance section extraction unit 214 extracts the utterance section included in the voice data (step S202). Next, the voice recognition unit 216 performs voice recognition on the voice data in the extracted utterance section (step S203). Next, the meaning interpretation unit 218 interprets the meaning of the voice data in the extracted utterance section (step S204). The meaning interpretation unit 218 determines whether the interpreted meaning of the voice indicates an instruction to register a new utterance command (step S206). If the meaning interpretation unit 218 determines that the interpreted meaning does not indicate an instruction to register a new utterance command (that is, it indicates voice control, display control, or in-vehicle device control), the agent setting data generation unit 222 generates agent setting data based on the meaning of the utterance as a whole (step S208). The communication control unit 224 of the server-side control unit 210 transmits the agent setting data to the agent device 100 via the server-side communication unit 202 (step S210).

If the meaning interpretation unit 218 determines that the interpreted meaning of the voice indicates an instruction to register a new utterance command, the agent setting data generation unit 222 generates agent setting data that includes the interpreted new utterance command, information indicating the in-vehicle device VE to be controlled by the new utterance command, and control content for registering the new utterance command (step S212). The communication control unit 224 of the server-side control unit 210 transmits the agent setting data to the agent device 100 via the server-side communication unit 202 (step S214). After the server device 200 has transmitted the agent setting data in step S214, the meaning interpretation unit 218 determines whether the information generated and transmitted in step S114 or step S116 described above indicates that the occupant has operation authority for the in-vehicle device VE (step S216).

If the received information does not indicate that the occupant has operation authority for the in-vehicle device VE, the meaning interpretation unit 218 ends the process without registering the new utterance command in the user utterance command dictionary 234. If the received information indicates that the occupant has operation authority for the in-vehicle device VE, the meaning interpretation unit 218 determines whether the interpreted new utterance command is included in the basic utterance command dictionary 232 (step S218). If the new utterance command is included in the basic utterance command dictionary 232, it cannot be registered as a user utterance command, so the meaning interpretation unit 218 ends the process. If the meaning interpretation unit 218 determines that the occupant has operation authority for the in-vehicle device VE and that the new utterance command is not included in the basic utterance command dictionary 232, it registers the new utterance command in the user utterance command dictionary 234 in association with the control content of the in-vehicle device VE that the new utterance command instructs (step S220). The processing of this flowchart then ends.
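
The registration decision of steps S216 to S220 reduces to two guards followed by a dictionary insertion. The Python sketch below illustrates that logic under simplifying assumptions: both dictionaries are modeled as plain `dict` objects that map an utterance command to its control content, and the function name `try_register` is hypothetical.

```python
from typing import Dict


def try_register(
    new_command: str,
    control: str,
    has_authority: bool,
    basic_dictionary: Dict[str, str],
    user_dictionary: Dict[str, str],
) -> bool:
    """Register new_command in the user dictionary if both guards pass.

    Returns True when the command was registered, False otherwise.
    """
    if not has_authority:                    # S216: result reported by the agent device
        return False
    if new_command in basic_dictionary:      # S218: basic commands cannot be redefined
        return False
    user_dictionary[new_command] = control   # S220: associate command with control content
    return True
```

For example, with a basic dictionary containing "open the window", an authorized occupant could register "let some air in" for the same control, while an attempt to register "open the window" itself would be refused.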

According to the agent system 1 of the first embodiment described above, the occupant can control the in-vehicle device VE with a voice command of the occupant's choosing, and can therefore operate the in-vehicle device VE more easily.

<Second Embodiment>
In the first embodiment described above, the agent device 100 mounted on the vehicle M and the server device 200 were described as separate devices, but the configuration is not limited to this. For example, the components of the server device 200 relating to the agent function may be included among the components of the agent device 100. In this case, the server device 200 may function as a virtual machine realized virtually by the agent-side control unit 120 of the agent device 100. An agent device 100A that includes the components of the server device 200 is described below as the second embodiment. In this case, the agent device 100A is an example of the "agent system". In the second embodiment, components that are the same as in the first embodiment are given the same reference numerals, and their detailed description is omitted here.

FIG. 11 is a diagram showing an example of the agent device 100A according to the second embodiment. The agent device 100A includes, for example, an agent-side communication unit 102, a microphone 106, a speaker 108, a display unit 110, an agent-side control unit 120A, and an agent-side storage unit 150A. The agent-side control unit 120A includes, for example, an acquisition unit 122, a voice synthesis unit 124, an output control unit 126, a communication control unit 128, a speaker position specifying unit 130, an operation authority determination unit 132, an in-vehicle device control unit 134, an acquisition unit 212A, an utterance section extraction unit 214A, a voice recognition unit 216A, a meaning interpretation unit 218A, and an agent setting data generation unit 222A.

The agent-side storage unit 150A stores, for example, in-vehicle device information 152, operation authority position information 154, a basic utterance command dictionary 232A, and a user utterance command dictionary 234A, in addition to the programs referenced by the processor. The basic utterance command dictionary 232A may be updated with the latest information acquired from the server device 200.
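
One way to picture the two dictionaries held on the agent side, and the optional refresh of the basic dictionary from the server, is the Python sketch below. It is an assumption-laden illustration: the class name `AgentSideStorage`, its method names, the string encoding of control content, and the lookup order (basic dictionary first) are all hypothetical and are not specified in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AgentSideStorage:
    """Hypothetical stand-in for the agent-side storage unit 150A."""
    basic_dictionary: Dict[str, str] = field(default_factory=dict)  # cf. 232A
    user_dictionary: Dict[str, str] = field(default_factory=dict)   # cf. 234A

    def update_basic_dictionary(self, latest: Dict[str, str]) -> None:
        """Replace the local basic dictionary with a copy of the server's latest version."""
        self.basic_dictionary = dict(latest)

    def lookup(self, utterance: str) -> Optional[str]:
        """Return the control content for an utterance, or None if it is unknown.

        Checking the basic dictionary first is an assumption of this sketch.
        """
        if utterance in self.basic_dictionary:
            return self.basic_dictionary[utterance]
        return self.user_dictionary.get(utterance)
```

Copying the server's dictionary on update keeps later changes to the caller's object from leaking into the stored state.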

[Processing flow]
FIGS. 12 and 13 are flowcharts showing a series of processes performed by the agent device 100A according to the second embodiment. As with the processing flow of the first embodiment, the flow described below may be executed repeatedly at predetermined timings. First, the acquisition unit 122 of the agent-side control unit 120 determines whether voice data of the occupant has been collected by the microphone 106 (step S300). Next, the utterance section extraction unit 214 extracts the utterance section included in the voice data (step S302). Next, the meaning interpretation unit 218 interprets the meaning of the voice data in the extracted utterance section (step S304).

The meaning interpretation unit 218 determines whether the interpreted meaning of the voice indicates an instruction to register a new utterance command (step S306). If the meaning interpretation unit 218 determines that the interpreted meaning does not indicate an instruction to register a new utterance command (that is, it indicates voice control, display control, or in-vehicle device control), the agent setting data generation unit 222 generates agent setting data based on the meaning of the utterance as a whole (step S308). If the meaning interpretation unit 218 determines that the interpreted meaning indicates an instruction to register a new utterance command, the agent setting data generation unit 222 generates agent setting data that includes the interpreted new utterance command, information indicating the in-vehicle device VE to be controlled by the new utterance command, and control content for registering the new utterance command (step S310). The operation authority determination unit 132 identifies the in-vehicle device VE included in the agent setting data generated in step S308 or step S310 (step S312). The speaker position specifying unit 130 identifies the position of the occupant who uttered the voice collected in step S300 (step S314).

The operation authority determination unit 132 determines whether the agent setting data includes control content for registering a new utterance command that instructs control of the in-vehicle device VE (step S316). If the agent setting data does not include control content for registering a user utterance command but does include control content for the in-vehicle device VE, the operation authority determination unit 132 determines, based on the operation authority position information 154, whether the occupant whose position was identified by the speaker position specifying unit 130 has operation authority for the identified in-vehicle device VE (step S318). If the operation authority determination unit 132 determines that the occupant does not have operation authority for the in-vehicle device VE, the in-vehicle device control unit 134 ends the process without controlling the in-vehicle device VE (step S320). If the operation authority determination unit 132 determines that the occupant has operation authority for the in-vehicle device VE, the in-vehicle device control unit 134 controls the in-vehicle device VE based on the control content included in the agent setting data (step S322).

If the agent setting data includes control content for registering a new utterance command, the operation authority determination unit 132 determines, based on the operation authority position information 154, whether the occupant whose position was identified by the speaker position specifying unit 130 has operation authority for the identified in-vehicle device VE (step S324). If the operation authority determination unit 132 determines that the occupant does not have operation authority for the in-vehicle device VE, the meaning interpretation unit 218 ends the process without registering the new utterance command in the user utterance command dictionary 234. If the operation authority determination unit 132 determines that the occupant has operation authority for the in-vehicle device VE, the meaning interpretation unit 218 determines whether the interpreted new utterance command is included in the basic utterance command dictionary 232 (step S326). If the new utterance command is included in the basic utterance command dictionary 232, it cannot be registered as a user utterance command, so the meaning interpretation unit 218 ends the process. If the meaning interpretation unit 218 determines that the occupant has operation authority for the in-vehicle device VE and that the new utterance command is not included in the basic utterance command dictionary 232, it registers the new utterance command in the user utterance command dictionary 234 in association with the control content of the in-vehicle device VE that the new utterance command instructs (step S328).
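
Because the second embodiment runs the whole decision on board, steps S316 to S328 can be collapsed into a single function with no network round trip. The Python sketch below illustrates this under stated assumptions: the function name, the argument layout, the seat and device names, and the outcome strings are hypothetical, and the actual device call of step S322 is omitted.

```python
from typing import Dict, Optional, Set, Tuple


def process_utterance(
    device: str,
    control: str,
    new_command: Optional[str],
    speaker_seat: str,
    authority_by_device: Dict[str, Set[str]],
    basic_dictionary: Dict[str, Tuple[str, str]],
    user_dictionary: Dict[str, Tuple[str, str]],
) -> str:
    """Return the outcome of one pass through the on-board flow."""
    # Authority is decided from the speaker's seat (cf. steps S318 and S324).
    allowed = speaker_seat in authority_by_device.get(device, set())
    if new_command is None:                  # S316: ordinary control request
        # S320 / S322; the device call itself is left out of this sketch.
        return "control_executed" if allowed else "control_rejected"
    if not allowed:                          # S324: speaker lacks authority
        return "registration_rejected"
    if new_command in basic_dictionary:      # S326: clashes with a basic command
        return "registration_rejected"
    user_dictionary[new_command] = (device, control)  # S328
    return "registered"
```

Storing the device together with the control content mirrors the association between an utterance command and the in-vehicle device it controls.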

According to the agent device 100A of the second embodiment described above, the same effects as the first embodiment are obtained. In addition, the agent device 100A need not communicate with the server device 200 over the network NW each time it acquires voice from an occupant, so it can recognize the utterance content more quickly. Moreover, even when the vehicle M cannot communicate with the server device 200, the occupant can control the in-vehicle device VE with a voice command of the occupant's choosing and can therefore operate the in-vehicle device VE more easily.

Modes for carrying out the present invention have been described above using embodiments, but the present invention is in no way limited to these embodiments, and various modifications and substitutions can be made without departing from the gist of the present invention.

For example, the embodiments above describe the mobile body as a vehicle, but the mobile body is not limited to this. The mobile body may be another kind of mobile body, such as a ship or a flying object. In that case, the agent device 100 may be installed, for example, in the cabins of a plurality of pleasure boats or sightseeing aircraft. Furthermore, if such a mobile body has a helmsman who operates it, the agent device 100 can attend to the other occupants (passengers), who do not operate the mobile body, through voice dialogue, so cabin attendants can concentrate on other services for the passengers. The agent device 100 may also be installed in a taxi, a bus, or the like. In that case, the agent device 100 can attend to passengers through voice dialogue, so the drivers of those vehicles can concentrate on driving.

1 ... agent system, 100 ... agent device, 100A ... agent device, 102 ... agent-side communication unit, 106, 106A, 106B, 106C, 106D, 106E ... microphone, 108, 108A, 108B, 108C, 108D, 108E ... speaker, 110, 110A, 110B, 110C ... display unit, 120, 120A ... agent-side control unit, 122 ... acquisition unit, 124 ... voice synthesis unit, 126 ... output control unit, 128 ... communication control unit, 130 ... speaker position specifying unit, 132 ... operation authority determination unit, 134 ... in-vehicle device control unit, 150, 150A ... agent-side storage unit, 152 ... in-vehicle device information, 154 ... operation authority position information, 200 ... server device, 202 ... server-side communication unit, 210 ... server-side control unit, 212, 212A ... acquisition unit, 214, 214A ... utterance section extraction unit, 216, 216A ... voice recognition unit, 218, 218A ... meaning interpretation unit, 222, 222A ... agent setting data generation unit, 224 ... communication control unit, 230 ... server-side storage unit, 232, 232A ... basic utterance command dictionary, 234, 234A ... user utterance command dictionary, M ... vehicle, VE ... in-vehicle device

Claims

An agent system comprising:
a mobile body-mounted device mounted on a mobile body that an occupant boards;
a voice recognition unit that recognizes a voice of the occupant collected by a microphone, the voice including an utterance command that is a command for controlling the mobile body-mounted device; and
a meaning interpretation unit that interprets the meaning of the voice recognized by the voice recognition unit,
wherein, when the meaning interpretation unit interprets the meaning of the voice as including an instruction to register a new utterance command, the meaning interpretation unit registers the new utterance command in a storage unit.

The agent system according to claim 1,
wherein, when the meaning interpretation unit interprets the meaning of the voice as including an instruction to delete a registered utterance command, the meaning interpretation unit deletes that utterance command from the storage unit.

The agent system according to claim 1 or 2, further comprising the storage unit in which the utterance command is registered,
wherein the storage unit stores an utterance command dictionary in which the utterance command and the content of the control indicated by the utterance command are registered in association with each other.

The agent system according to claim 3, further comprising an on-board device control unit that controls the mobile body-mounted device based on the meaning that the meaning interpretation unit, using the utterance command dictionary, interprets from the voice recognized by the voice recognition unit.

The agent system according to claim 3,
wherein, when the meaning interpretation unit interprets the voice recognized by the voice recognition unit as including an instruction to register a new utterance command, the meaning interpretation unit refers to a basic utterance command dictionary, in which basic utterance commands that are basic control commands for the mobile body-mounted device are associated with the content of the control for those basic utterance commands, and registers the new utterance command in the utterance command dictionary if the new utterance command is not included in the basic utterance command dictionary.

The agent system according to claim 3 or 5,
wherein, when the meaning interpretation unit interprets the voice recognized by the voice recognition unit as including an instruction to register a new utterance command relating to a control instruction that the occupant gave to the mobile body-mounted device immediately beforehand, the meaning interpretation unit registers the new utterance command in the utterance command dictionary together with the content of the control that the occupant of the mobile body performed on the mobile body-mounted device immediately beforehand.

The agent system according to any one of claims 3 to 6, further comprising:
a speaker position specifying unit that identifies, among the occupants of the mobile body, the position of the speaker of the voice that includes the utterance command and was collected by the microphone; and
an operation authority determination unit that determines whether the speaker of the voice including the utterance command has operation authority for the mobile body-mounted device, based on the position information of the speaker identified by the speaker position specifying unit, operation authority position information indicating the position of an occupant who has operation authority for the mobile body-mounted device mounted on the mobile body, the meaning of the utterance command interpreted by the meaning interpretation unit, and mobile body-mounted device information,
wherein, when the meaning interpretation unit interprets the voice recognized by the voice recognition unit as including an instruction to register a new utterance command and the operation authority determination unit determines that the speaker of the voice including the utterance command has operation authority for the mobile body-mounted device, the meaning interpretation unit registers the new utterance command in the utterance command dictionary together with the control content indicated by the new utterance command.

The agent system according to claim 4, further comprising:
a speaker position specifying unit that identifies, among the occupants of the mobile body, the position of the speaker of the voice that includes the utterance command and was collected by the microphone; and
an operation authority determination unit that determines whether the speaker of the voice including the utterance command has operation authority for the mobile body-mounted device, based on the position information of the speaker identified by the speaker position specifying unit, operation authority position information indicating the position of an occupant who has operation authority for the mobile body-mounted device mounted on the mobile body, the meaning of the utterance command interpreted by the meaning interpretation unit, and mobile body-mounted device information,
wherein the on-board device control unit controls the mobile body-mounted device when the meaning interpretation unit recognizes that the meaning of the voice instructs control of the mobile body-mounted device and the operation authority determination unit determines that the speaker of the voice including the utterance command has operation authority for the mobile body-mounted device.

An information processing device comprising:
an acquisition unit that acquires a voice of an occupant on a mobile body, the voice including an utterance command that is a command for controlling a mobile body-mounted device mounted on the mobile body;
a storage unit that stores an utterance command dictionary in which the utterance command is associated with control content that includes information on the mobile body-mounted device to be controlled as indicated by the utterance command;
a voice recognition unit that recognizes the voice;
a meaning interpretation unit that interprets the meaning of the voice recognized by the voice recognition unit; and
a generation unit that generates information corresponding to the meaning of the voice interpreted by the meaning interpretation unit.

The information processing device according to claim 9, further comprising an on-board device control unit that controls the mobile body-mounted device based on the information, generated by the generation unit, that corresponds to the meaning of the voice.

A mobile body-mounted device control method in which a single computer or a plurality of computers, in a system including a storage unit that stores an utterance command dictionary in which an utterance command is associated with control content that includes information on the mobile body-mounted device to be controlled as indicated by the utterance command, performs:
a step of recognizing a voice that includes the utterance command and is uttered by an occupant on a mobile body;
a step of interpreting the meaning of the recognized voice;
a step of referring to the utterance command dictionary and acquiring the information on the mobile body-mounted device to be controlled and the control content;
a step of controlling the mobile body-mounted device to be controlled, based on the interpreted meaning of the voice; and
a step of, when the recognized voice is interpreted as including an instruction to register a new utterance command, registering the new utterance command in the utterance command dictionary together with the control content indicated by the new utterance command.

The mobile body-mounted device control method according to claim 11, further comprising,
after the step of interpreting the meaning of the voice,
a step of, when the utterance command dictionary contains control content corresponding to the meaning of the voice, referring to the utterance command dictionary and replacing the interpretation of the meaning of the voice produced by the interpreting step with the content of the control expressed as standard character information.

A program that is installed on a single computer or a plurality of computers in a system having a storage unit that stores an utterance command dictionary in which an utterance command is associated with control content that includes information on the mobile body-mounted device to be controlled as indicated by the utterance command, and that causes the computer to execute:
a process of recognizing a voice that includes the utterance command and is uttered by an occupant on a mobile body;
a process of interpreting the meaning of the recognized voice;
a process of referring to the utterance command dictionary and acquiring the information on the mobile body-mounted device to be controlled and the control content;
a process of controlling the mobile body-mounted device to be controlled, based on the interpreted meaning of the voice; and
a process of, when the recognized voice is interpreted as including an instruction to register a new utterance command, registering the new utterance command in the utterance command dictionary together with the control content indicated by the new utterance command.