JP2022148823A

JP2022148823A - Agent device

Info

Publication number: JP2022148823A
Application number: JP2021050649A
Authority: JP
Inventors: 大樹粟野; Hiroki Awano
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2021-03-24
Filing date: 2021-03-24
Publication date: 2022-10-06

Abstract

PROBLEM TO BE SOLVED: To provide an agent device configured to efficiently specify information of an object to be presented, from among a plurality of targets located around a vehicle. SOLUTION: In an agent system configured by connecting a vehicle-mounted device, an agent server, and an information providing server over a network, a CPU 30A of the agent server includes: a reception unit 200 which receives a request from a user on a vehicle; a target information acquisition unit 230 which acquires target information on a target located around the current position of the vehicle; an in-vehicle information acquisition unit 220 which acquires in-vehicle information, which is at least one of a captured image of the user and voice information on a speech of the user; an estimation unit 240 which estimates a line of sight or a seating position of the user using the acquired in-vehicle information and estimates a region that the user can visually recognize; a specifying unit 250 which specifies target information included in the estimated region, as an object; and a presentation unit 260 which presents the target information relating to the specified object. SELECTED DRAWING: Figure 4

Description

TECHNICAL FIELD The present invention relates to an agent device that provides an occupant of a vehicle with information on objects located around the vehicle.

Patent Literature 1 discloses a voice dialogue system that provides information on an object corresponding to a speaker's utterance.

Patent Literature 1: JP 2018-054790 A

The voice dialogue system of Patent Literature 1 determines, based on the result of speech recognition of a user's utterance, whether the utterance has the content expected as a response and, if it does, presents information corresponding to the utterance. However, when the information to be presented concerns targets (POI: Point Of Interest) such as stores and buildings located around a vehicle, a plurality of candidate targets exist, and the content of the utterance alone does not always allow the information of the object to be presented to be specified efficiently.

SUMMARY OF THE INVENTION It is an object of the present invention to provide an agent device capable of efficiently specifying the information of an object to be presented from among a plurality of targets located around a vehicle.

The agent device according to claim 1 includes: a reception unit that receives a request from a user riding in a vehicle; a target information acquisition unit that acquires target information on targets located around the current location of the vehicle; an in-vehicle information acquisition unit that acquires in-vehicle information, which is at least one of a captured image of the user and voice information uttered by the user; an estimation unit that estimates the user's line of sight or seating position using the acquired in-vehicle information and estimates a region visible to the user; a specifying unit that specifies, as an object, the target information included in the estimated region; and a presentation unit that presents the target information relating to the specified object.

In the agent device according to claim 1, when the reception unit receives, from a user riding in the vehicle, a request for object information on an object, the target information acquisition unit acquires target information on targets located around the current location of the vehicle. The in-vehicle information acquisition unit acquires in-vehicle information, which is at least one of a captured image of the user and voice information uttered by the user, and the estimation unit uses the in-vehicle information to estimate the region visible to the user. The specifying unit specifies the targets included in the visible region, and the presentation unit presents the target information as the object requested by the user.

The region visible to the user is estimated from, for example, the user's line of sight captured by a camera or the position where the user is seated. The user's line of sight is estimated using, for example, a captured image, and the position where the user is seated is estimated using voice information detected by a plurality of microphones. With this agent device, the information of the object to be presented can be efficiently specified from among a plurality of candidates such as stores and buildings located around the vehicle.

ADVANTAGEOUS EFFECTS OF THE INVENTION According to the present invention, the information of an object to be presented can be efficiently specified from among a plurality of targets located around a vehicle.

FIG. 1 is a diagram showing a schematic configuration of the agent system according to this embodiment.
FIG. 2 is a block diagram showing the hardware configuration of the vehicle according to this embodiment.
FIG. 3 is a block diagram showing the hardware configuration of the agent server according to this embodiment.
FIG. 4 is a block diagram showing the functional configuration of the agent server according to this embodiment.
FIG. 5 is a diagram showing the contents of the visible list according to this embodiment.
FIG. 6 is a flowchart showing the flow of processing executed in the agent system according to this embodiment.

An agent system including the agent device of the present invention will be described. As one of the functions of a vehicle, the agent device acts as an agent that presents, to an occupant of the vehicle, information on an object that is located around the point where the vehicle is traveling and in which the occupant has shown interest.

[First embodiment]
As shown in FIG. 1, the agent system 10 of this embodiment includes a vehicle 12, an agent server 30 serving as the agent device, and an information providing server 40. The vehicle 12 includes a vehicle-mounted device 20 and a plurality of ECUs (Electronic Control Units) 22, which are control devices. The vehicle-mounted device 20, the agent server 30, and the information providing server 40 are connected to one another via a network N.

The information providing server 40 is a server that provides information on targets such as stores and buildings (hereinafter referred to as "target information"). In this embodiment, among the multiple pieces of target information, information on an object that is located around the current location where the vehicle 12 is traveling and in which an occupant of the vehicle 12 has shown interest is described as object information. The target information and the object information according to this embodiment each include the name of the target, an image of the target, text, an icon representing the target, and position information of the target.

The vehicle-mounted device 20 has a function of acquiring communication information based on the CAN protocol transmitted from each ECU 22 and transmitting it to the agent server 30.

Examples of the ECUs 22 of this embodiment include a vehicle control ECU, an engine ECU, a brake ECU, a body ECU, a camera ECU, and a multimedia ECU. The vehicle-mounted device 20 and each ECU 22 are connected to one another via an external bus 29.

As shown in FIG. 2, the vehicle-mounted device 20 includes a CPU (Central Processing Unit) 20A, a ROM (Read Only Memory) 20B, a RAM (Random Access Memory) 20C, an in-vehicle communication I/F (Interface) 20D, a wireless communication I/F 20E, and an input/output I/F 20F. The CPU 20A, ROM 20B, RAM 20C, in-vehicle communication I/F 20D, wireless communication I/F 20E, and input/output I/F 20F are communicably connected to one another via an internal bus 20G.

The CPU 20A is a central processing unit that executes various programs and controls each section. That is, the CPU 20A reads a program from the ROM 20B and executes it using the RAM 20C as a work area.

The ROM 20B stores various programs and various data. The ROM 20B of this embodiment stores a control program for controlling the vehicle-mounted device 20.
The RAM 20C temporarily stores programs or data as a work area.

The in-vehicle communication I/F 20D is an interface for connecting to the ECUs 22. This interface communicates using the CAN (Controller Area Network) protocol. A communication standard based on Ethernet (registered trademark) may instead be applied to the in-vehicle communication I/F 20D. The in-vehicle communication I/F 20D is connected to the external bus 29. That is, data exchanged between the vehicle-mounted device 20 and each ECU 22 over the external bus 29 is transmitted and received as communication frames based on the CAN protocol.

The wireless communication I/F 20E is a wireless communication module for communicating with the agent server 30 and the information providing server 40. The wireless communication module uses a communication standard such as 5G, LTE, or Wi-Fi (registered trademark). The wireless communication I/F 20E is connected to the network N.

The input/output I/F 20F is an interface for communicating with the microphones 24, the speaker 25, the monitor 26, the camera 27, and the GPS device 28 mounted on the vehicle 12. The microphones 24, speaker 25, monitor 26, camera 27, and GPS device 28 may instead be connected directly to the internal bus 20G.

The microphones 24 are devices that are provided on the instrument panel, center console, front pillars, dashboard, and the like, and that pick up the voices of the occupants of the vehicle 12. In this embodiment, a plurality of microphones 24 are installed in the vehicle 12.

The speaker 25 is a device that is provided on the instrument panel, center console, front pillar, dashboard, or the like, and that outputs sound. The speaker 25 of this embodiment outputs the audio that presents the object information.

The monitor 26 is a liquid crystal monitor that is provided on the instrument panel, dashboard, or the like of the vehicle 12 and displays various information. The monitor 26 of this embodiment displays the images and text included in the object information.

The camera 27 is an imaging device that is provided at the upper part of the front window or adjacent to the rearview mirror and captures the occupants riding in the vehicle 12. In this embodiment, the camera 27 captures a plurality of images.

The GPS device 28 is a device that measures the current location of the vehicle 12. The GPS device 28 includes an antenna (not shown) that receives signals from GPS satellites. The GPS device 28 may be connected to the vehicle-mounted device 20 via a car navigation system (not shown).

The agent server 30 grasps the control state of the vehicle 12 by acquiring the communication information of the vehicle 12 from the vehicle-mounted device 20. The agent server 30 also presents object information to the occupants of the vehicle 12.

As shown in FIG. 3, the agent server 30 includes a CPU 30A, a ROM 30B, a RAM 30C, a storage 30D, and a communication I/F 30E. The CPU 30A, ROM 30B, RAM 30C, storage 30D, and communication I/F 30E are communicably connected to one another via an internal bus 30G. The functions of the CPU 30A, ROM 30B, RAM 30C, and communication I/F 30E are the same as those of the CPU 20A, ROM 20B, RAM 20C, and wireless communication I/F 20E of the vehicle-mounted device 20 described above. The communication I/F 30E may perform wired communication.

The storage 30D, which serves as a storage unit, is configured by an HDD (Hard Disk Drive) or an SSD (Solid State Drive) and stores various programs and various data. The storage 30D of this embodiment stores a processing program 100 and a visible list 110. The ROM 30B may store the processing program 100 instead.

The processing program 100 is a program for controlling the agent server 30. The visible list 110 is a list that defines in advance, for each seating position, the directions that an occupant seated there can see.

As shown in FIG. 4, in the agent server 30 of this embodiment, the CPU 30A executes the processing program 100 and thereby functions as a reception unit 200, an intention understanding unit 210, an in-vehicle information acquisition unit 220, a target information acquisition unit 230, an estimation unit 240, a specifying unit 250, and a presentation unit 260.

The reception unit 200 has a function of receiving communication information from the vehicle-mounted device 20. Here, the communication information includes information on the voice uttered by an occupant and acquired by the plurality of microphones 24 (hereinafter referred to as "voice information"), a plurality of captured images captured by the camera 27, and the current location of the vehicle 12 acquired using the GPS device 28. The communication information also includes the seating positions of the occupants acquired by a seat belt detection sensor (not shown).

Specifically, the reception unit 200 receives the communication information by receiving, from the vehicle-mounted device 20, the voice information on the voice that an occupant riding in the vehicle 12 uttered toward the microphones 24, the captured images captured at the time of the utterance, and the current location of the vehicle 12 at the time of the utterance.

The intention understanding unit 210 has a function of analyzing and understanding the intention of an occupant's utterance from the voice information. From the utterance received by the reception unit 200, the intention understanding unit 210 determines that the occupant is requesting object information on an object. For example, the intention understanding unit 210 analyzes the occupant's utterance "Can you tell me about what I can see over there?" and determines that the occupant is requesting presentation of object information.

The in-vehicle information acquisition unit 220 has a function of acquiring the captured images and the voice information from the communication information as in-vehicle information.

The target information acquisition unit 230 has a function of acquiring target information on targets located around the vehicle 12. Specifically, the target information acquisition unit 230 transmits the position information of the vehicle 12 to the information providing server 40 and acquires, from the information providing server 40, target information on targets located around the current location of the vehicle 12.

The estimation unit 240 identifies, among the occupants of the vehicle 12, the occupant who spoke to request information on an object (hereinafter referred to as the "speaker"), and estimates the region visible to the speaker. Specifically, the estimation unit 240 detects from the captured images acquired by the in-vehicle information acquisition unit 220 which occupant is speaking, and thereby estimates the speaker. The estimation unit 240 also detects the direction of the speaker's line of sight and estimates the region visible to the speaker.

Here, the region visible to the speaker is the range of the horizontal viewing angle within which the speaker can see, centered on the direction of the speaker's line of sight. The viewing angle may be set to a predetermined value such as 30 degrees, or may be set according to the speed of the vehicle 12.
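
As an illustration only, the sector test implied by this definition can be sketched in Python. The class and function names, the vehicle-frame angle convention (0 degrees straight ahead, clockwise positive), and the reading of the viewing angle as the full width of the sector are assumptions that do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisibleRegion:
    """Horizontal sector centred on the gaze direction (degrees, vehicle frame)."""
    gaze_deg: float        # assumed convention: 0 = straight ahead, clockwise positive
    view_angle_deg: float  # assumed to be the full width of the sector

    def contains(self, direction_deg: float) -> bool:
        # Smallest signed angular difference, mapped into [-180, 180).
        diff = (direction_deg - self.gaze_deg + 180.0) % 360.0 - 180.0
        return abs(diff) <= self.view_angle_deg / 2.0


def estimate_visible_region(gaze_deg: float, view_angle_deg: float = 30.0) -> VisibleRegion:
    """Build the sector from a detected gaze direction; 30 degrees is the example value in the text."""
    return VisibleRegion(gaze_deg=gaze_deg % 360.0, view_angle_deg=view_angle_deg)
```

With a gaze toward the left window (-90 degrees under the assumed convention), a direction of 280 degrees falls inside the 30-degree sector and 250 degrees falls outside. A speed-dependent viewing angle could be passed in as `view_angle_deg`, but the specification gives no formula for it.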

For example, when the speaker says "Can you tell me about what I can see over there?" while looking out of the window on the left side, in the traveling direction, of the rear seat of the vehicle 12, the estimation unit 240 estimates the speaker by detecting, from the plurality of captured images, the lip movements and the like made while speaking. The estimation unit 240 also detects the direction of the line of sight of the speaker looking out of that window, and estimates the visible region centered on the line of sight directed to the left of the traveling direction.

If the estimation unit 240 cannot estimate the speaker from the captured images, it estimates the speaker using the voice information acquired by the in-vehicle information acquisition unit 220. Specifically, using the voice information acquired from the plurality of microphones 24, it detects the speaker's seating position from the loudness of the sound picked up by each microphone 24. The estimation unit 240 then estimates the visible region from the speaker's seating position.
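
The loudness-based fallback can be sketched as follows. This is a minimal illustration that assumes one microphone is associated with each seat and that loudness is measured as an RMS level; the specification states only that the seating position is detected from the loudness picked up by each microphone, so the seat labels and function names are hypothetical.

```python
import math
from typing import Mapping, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one microphone's samples (0.0 for an empty buffer)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_seat(mic_samples: Mapping[str, Sequence[float]]) -> str:
    """Return the seat label whose microphone recorded the loudest signal.

    Assumption: the mapping is keyed by the seat nearest to each microphone.
    """
    if not mic_samples:
        raise ValueError("no microphone data")
    return max(mic_samples, key=lambda seat: rms(mic_samples[seat]))
```

A real system would likely need calibration for microphone placement and cabin noise; the sketch only shows the comparison itself.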

Here, the visible region is defined in advance for each seating position, as in the example of the visible list 110 shown in FIG. 5, and the visible region is estimated from the speaker's seating position. When another occupant is seated next to the speaker, the direction in which that occupant is sitting may be treated as not visible.

For example, when the speaker is seated on the left side, in the traveling direction, of the rear seat of the vehicle 12 and another occupant is seated on the right side, the estimation unit 240 refers to the visible list 110 of FIG. 5 and acquires the visible directions "left, right". Because an occupant is seated to the right of the speaker, the estimation unit 240 then excludes the visible direction "right", determines that the direction visible to the speaker is "left", and estimates the visible region from that direction. The occupant sitting next to the speaker may be detected from the captured images or by using a seat belt detection sensor (not shown).
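
The lookup and exclusion described above can be sketched as a table lookup followed by a set difference. Only the rear-left entry ("left, right") comes from the text; the other entries, the seat labels, and the function name are hypothetical placeholders, since the actual contents of FIG. 5 are not reproduced here.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Visible directions per seating position (a stand-in for the visible list 110).
VISIBLE_LIST: Dict[str, FrozenSet[str]] = {
    "rear_left": frozenset({"left", "right"}),             # matches the example in the text
    "rear_right": frozenset({"left", "right"}),            # assumed by symmetry
    "front_left": frozenset({"front", "left", "right"}),   # hypothetical
    "front_right": frozenset({"front", "left", "right"}),  # hypothetical
}


def visible_directions(seat: str, neighbour_sides: Iterable[str] = ()) -> Set[str]:
    """Look up the seat's directions, then drop any side occupied by a neighbouring occupant."""
    return set(VISIBLE_LIST[seat]) - set(neighbour_sides)
```

For the example in the text, `visible_directions("rear_left", ["right"])` leaves only the direction "left".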

The specifying unit 250 specifies, from the target information, the object included in the estimated visible region. Specifically, the specifying unit 250 compares the visible region with the position information of the targets, determines that a target included in the visible region is the object whose presentation the speaker requested, and specifies that target. When a plurality of targets are specified, the specifying unit 250 presents the features of each target to the user and specifies the target that is the object.
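
One possible way to compare the visible region with the target positions is to compute the compass bearing from the vehicle to each target and keep the targets whose bearing lies inside the sector. The sketch below assumes that positions are latitude and longitude, that the vehicle heading is available, and that the gaze is given relative to the heading; none of these details, nor the names used, are stated in the specification.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Target:
    name: str
    lat: float
    lon: float


def bearing_deg(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Initial great-circle bearing from point 1 to point 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360.0


def targets_in_region(targets: Iterable[Target], vehicle_lat: float, vehicle_lon: float,
                      heading_deg: float, gaze_deg: float,
                      view_angle_deg: float = 30.0) -> List[Target]:
    """Keep the targets whose bearing from the vehicle lies inside the visible sector."""
    centre = (heading_deg + gaze_deg) % 360.0  # gaze is relative to the vehicle heading (assumption)
    half = view_angle_deg / 2.0
    hits = []
    for target in targets:
        bearing = bearing_deg(vehicle_lat, vehicle_lon, target.lat, target.lon)
        diff = (bearing - centre + 180.0) % 360.0 - 180.0
        if abs(diff) <= half:
            hits.append(target)
    return hits
```

When the returned list holds more than one target, the disambiguation dialogue described in the text would follow.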

The presentation unit 260 has a function of presenting object information to the occupants of the vehicle 12. Specifically, the presentation unit 260 presents, as the object information, the target information on the target specified by the specifying unit 250. Here, the object information is presented using the speaker 25 and the monitor 26 mounted on the vehicle 12.

(Control flow)
The flow of processing executed in the agent system 10 of this embodiment will be described using the flowchart of FIG. 6. The processing in the agent server 30 is realized by the CPU 30A functioning as the reception unit 200, the intention understanding unit 210, the in-vehicle information acquisition unit 220, the target information acquisition unit 230, the estimation unit 240, the specifying unit 250, and the presentation unit 260.

In step S100, the CPU 30A acquires communication information including the captured images, the voice information, and the current location of the vehicle 12.

In step S101, the CPU 30A determines from the acquired voice information whether an occupant has uttered a guidance command. If an occupant has uttered a guidance command (step S101: YES), the CPU 30A proceeds to step S102. If no occupant has uttered a guidance command (step S101: NO), the CPU 30A returns to step S100 and acquires communication information again. Here, a guidance command is voice information with which an occupant requests presentation of object information, for example, "Can you tell me about what I can see over there?"

In step S102, the CPU 30A identifies the speaker and the speaker's seating position using the captured images. The CPU 30A also identifies the speaker's seating position using the voice information.

In step S103, the CPU 30A detects the speaker's line of sight from the captured images. The CPU 30A also uses the speaker's seating position to acquire the visible directions from the visible list 110.

In step S104, the CPU 30A acquires target information from the information providing server 40 using the current location of the vehicle 12.

In step S105, the CPU 30A determines whether the speaker's line of sight was detected from the captured images. If the line of sight was detected (step S105: YES), the CPU 30A proceeds to step S106. If the line of sight was not detected (step S105: NO), the CPU 30A proceeds to step S107.

In step S106, the CPU 30A estimates the visible region from the speaker's line of sight.

In step S107, the CPU 30A estimates the visible region from the acquired visible directions.

In step S108, the CPU 30A determines whether there is an occupant next to the speaker. If there is an occupant next to the speaker (step S108: YES), the CPU 30A proceeds to step S109. If there is no occupant next to the speaker (step S108: NO), the CPU 30A proceeds to step S110.

In step S109, the CPU 30A excludes the direction in which that occupant is seated from the estimated visible region.

In step S110, the CPU 30A uses the position information included in the target information to extract, as objects, the targets included in the visible region.

In step S111, the CPU 30A determines whether a plurality of targets have been extracted as objects. If a plurality of targets have been extracted (step S111: YES), the CPU 30A proceeds to step S112. Otherwise (step S111: NO), the CPU 30A proceeds to step S115.

In step S112, the CPU 30A transmits the features of the extracted targets to the vehicle-mounted device 20 and requests a response.

In step S113, the CPU 30A acquires the speaker's response from the vehicle-mounted device 20.

In step S114, the CPU 30A analyzes the acquired response and specifies, from the extracted targets, the object requested by the speaker.

In step S115, the CPU 30A transmits the target information on the specified object to the vehicle-mounted device 20 as the object information, and thereby presents it.
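
The two branch points in this flow, the choice between gaze-based and seat-based estimation (steps S105 to S107) and the disambiguation among several extracted targets (steps S111 to S114), can be sketched as below. The function names, the tuple used to represent a region, and the callback standing in for the dialogue with the speaker are assumptions made for illustration; the handling of an empty candidate list is likewise an assumption, since the flowchart does not describe that case.

```python
from typing import Callable, Optional, Sequence, Set, Tuple, Union

Region = Tuple[str, Union[float, Set[str]]]


def estimate_region(gaze_deg: Optional[float], seat_directions: Set[str]) -> Region:
    """S105-S107: prefer the detected gaze; otherwise fall back to the seat-based directions."""
    if gaze_deg is not None:
        return ("gaze", gaze_deg)          # S105: YES -> S106
    return ("seat", set(seat_directions))  # S105: NO  -> S107


def choose_object(candidates: Sequence[str],
                  ask_speaker: Callable[[Sequence[str]], str]) -> Optional[str]:
    """S111-S114: return the object to present from the extracted targets."""
    if not candidates:
        return None                  # case not described in the text (assumption)
    if len(candidates) == 1:
        return candidates[0]         # S111: NO -> S115
    reply = ask_speaker(candidates)  # S112/S113: present features, obtain the response
    return reply if reply in candidates else None  # S114: match the response to a target
```

In use, `ask_speaker` would wrap the round trip through the vehicle-mounted device 20; here it is any function that returns the name the speaker chose.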

(Summary of the first embodiment)
In the agent server 30 of this embodiment, in response to a request from an occupant of the vehicle 12, the estimation unit 240 estimates the region visible to the speaker from the captured images and the voice information, and the specifying unit 250 specifies the target information included in the visible region as the object information desired by the speaker. The presentation unit 260 then presents the specified object information to the vehicle 12, and the vehicle 12 outputs it using the speaker 25 and the monitor 26.

As described above, according to this embodiment, the information of the object to be presented can be efficiently specified from among a plurality of targets located around the vehicle.

[Remarks]
The various processes that the CPU 20A and the CPU 30A execute by reading software (programs) in the above embodiment may be executed by various processors other than a CPU. Examples of such processors include a PLD (Programmable Logic Device) whose circuit configuration can be changed after manufacture, such as an FPGA (Field-Programmable Gate Array), and a dedicated electric circuit, which is a processor having a circuit configuration designed specifically to execute particular processing, such as an ASIC (Application Specific Integrated Circuit). The processes described above may be executed by one of these various processors, or by a combination of two or more processors of the same or different types (for example, a plurality of FPGAs, or a combination of a CPU and an FPGA). More specifically, the hardware structure of these various processors is an electric circuit in which circuit elements such as semiconductor elements are combined.

In the above embodiment, each program has been described as being stored (installed) in advance in a computer-readable non-transitory recording medium. For example, the control program for the CPU 20A is stored in advance in the ROM 20B, and the processing program 100 for the CPU 30A is stored in advance in the storage 30D. However, the programs are not limited to this form and may be provided recorded on a non-transitory recording medium such as a CD-ROM (Compact Disc Read Only Memory), a DVD-ROM (Digital Versatile Disc Read Only Memory), or a USB (Universal Serial Bus) memory. The programs may also be downloaded from an external device via a network.

200 Reception unit
220 In-vehicle information acquisition unit
230 Target information acquisition unit
240 Estimation unit
250 Specifying unit
260 Presentation unit

Claims

An agent device comprising:
a reception unit that receives a request from a user riding in a vehicle;
a target information acquisition unit that acquires target information on targets located around the current location of the vehicle;
an in-vehicle information acquisition unit that acquires in-vehicle information that is at least one of a captured image of the user and voice information uttered by the user;
an estimation unit that estimates a line of sight or a seating position of the user using the acquired in-vehicle information and estimates a region visible to the user;
a specifying unit that specifies, as an object, the target information included in the estimated region; and
a presentation unit that presents the target information relating to the specified object.