JP7407665B2

JP7407665B2 - Audio output control device and audio output control program

Info

Publication number: JP7407665B2
Application number: JP2020116585A
Authority: JP
Inventors: 博仁真瀬
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2020-07-06
Filing date: 2020-07-06
Publication date: 2024-01-04
Anticipated expiration: 2040-07-06
Also published as: JP2022014313A

Description

The present disclosure relates to an audio output control device and an audio output control program.

Conventionally, techniques are known for controlling audio output to a plurality of speakers installed in a vehicle.
For example, Patent Document 1 discloses a conversation support device that, when it determines that the driver is performing a conversational action of speaking to a passenger in the rear seat, controls the driver's voice data so that it is output to the rear seat, and, when it determines that there is no such conversational action, controls the driver's voice data so that it is not output to the rear seat.

JP2015-71320A

When a plurality of passengers are present in a vehicle in which a plurality of speakers are installed, audio output from the speakers may be necessary for one passenger but unnecessary for the other passengers. For a passenger who does not need the audio, it is a nuisance.
The in-vehicle device technology disclosed in Patent Document 1 determines whether or not there is a conversational action directed at a rear-seat passenger, but it does not consider which rear-seat passenger the action is directed at, and therefore still does not solve the above problem.

The present disclosure has been made to solve the above problem. Its object is to provide an audio output control device that, in a vehicle in which a corresponding speaker is provided for each seat, can control audio output in consideration of the passengers' status so that audio is output to the passenger who is estimated to need it.

An audio output control device according to the present disclosure controls audio output in a vehicle in which a corresponding speaker is installed for each seat, and includes: a captured image acquisition unit that acquires a captured image of the interior of the vehicle; a passenger status detection unit that detects the status of passengers, including at least their seating positions, based on the captured image acquired by the captured image acquisition unit; an audio-related information acquisition unit that acquires audio-related information regarding audio to be output from the speakers; a determination unit that determines a target passenger to whom the audio is to be output, and the seating position of the target passenger, based on passenger status information regarding the passenger status detected by the passenger status detection unit and the audio-related information acquired by the audio-related information acquisition unit; and an output control unit that causes the audio to be output from a target speaker, among the speakers, that corresponds to the seating position of the target passenger determined by the determination unit. The passenger status detected by the passenger status detection unit includes the direction of each passenger's line of sight. The audio-related information acquired by the audio-related information acquisition unit includes audio data and information regarding the in-vehicle device that output the audio data. The determination unit determines, as the target passenger, a passenger whose line of sight is directed toward the in-vehicle device.

According to the present disclosure, in a vehicle in which a corresponding speaker is provided for each seat, audio output can be controlled in consideration of the passengers' status so that audio is output to the passenger who is estimated to need it.

FIG. 1 is a diagram showing a configuration example of the audio output control device according to Embodiment 1.
FIG. 2 is a diagram for explaining an example of the interior of a vehicle equipped with the audio output control device according to Embodiment 1.
FIG. 3 is a flowchart for explaining the operation of the audio output control device according to Embodiment 1.
FIGS. 4A and 4B are diagrams showing an example of the hardware configuration of the audio output control device according to Embodiment 1.

Embodiments of the present disclosure will be described in detail below with reference to the drawings.
Embodiment 1.
FIG. 1 is a diagram showing a configuration example of an audio output control device 1 according to the first embodiment.
The audio output control device 1 is mounted on a vehicle 100 and connected to a camera 2, an AV device 3, a microphone 4, and a speaker 5.

The camera 2 is installed in the vehicle 100 and images an area that includes at least each seat in the vehicle 100. Note that the camera 2 may be shared with a so-called driver monitoring system (DMS).
Although only one camera 2 is shown in FIG. 1 for convenience, this is merely an example. A plurality of cameras 2 may be installed in the vehicle 100. For example, one camera 2 may be installed for each seat.
The camera 2 outputs a captured image of the interior of the vehicle 100 to the audio output control device 1.

The AV (Audio/Visual) device 3 is an in-vehicle AV device installed in the vehicle 100. The AV device 3 is, for example, a car navigation device, an in-vehicle television, or an in-vehicle radio. The AV device 3 also includes voice control devices that can be operated by voice. Although only one AV device 3 is shown in FIG. 1 for convenience, this is merely an example. A plurality of AV devices 3 may be installed in the vehicle 100.

The AV device 3, for example, receives an operation from a passenger and operates based on the received operation. If the received operation involves audio output, the AV device 3 outputs information regarding the audio to be output from the speaker 5 (hereinafter referred to as "audio-related information").
Audio based on the audio-related information output by the AV device 3 is output from the speaker 5 under the control of the audio output control device 1. Specifically, this audio is, for example, the AV output of a television or radio, a response message, route guidance audio, or information audio. In Embodiment 1, information audio is guidance audio addressed to all passengers of the vehicle 100.
Note that when the audio output control device 1 determines that audio based on the audio-related information output from the AV device 3 is not to be output, the audio is controlled, for example, so that it is not output from the speaker 5.

The audio-related information output by the AV device 3 includes, for example, audio data and information regarding the AV device 3 that output the audio data (hereinafter referred to as "audio output device information"). The audio output device information may be any information that allows the AV device 3 to be identified. The audio output device information may also include, for example, information regarding the type of audio. In Embodiment 1, the type of audio indicates the purpose for which the audio is output, such as AV output, response message, route guidance, or information. The types are set in advance.
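As a rough illustration of the audio-related information described above, the record could be modeled as follows. This is only a sketch: the class names, field names, and enum values are hypothetical and do not appear in the patent, which states only that the information contains audio data, information identifying the AV device 3, and optionally a preset audio type.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AudioType(Enum):
    """Preset purposes of audio output (member names are illustrative)."""
    AV_OUTPUT = "av_output"
    RESPONSE_MESSAGE = "response_message"
    ROUTE_GUIDANCE = "route_guidance"
    INFORMATION = "information"


@dataclass(frozen=True)
class AudioOutputDeviceInfo:
    device_id: str                          # any value that identifies the AV device 3
    audio_type: Optional[AudioType] = None  # optional, as in the description


@dataclass(frozen=True)
class AudioRelatedInfo:
    audio_data: bytes
    # Assumed to be absent when the audio comes from the microphone 4,
    # whose audio-related information is described as containing audio data.
    device_info: Optional[AudioOutputDeviceInfo] = None


# Example: a route guidance prompt from a car navigation device.
nav_prompt = AudioRelatedInfo(
    audio_data=b"\x00\x01",
    device_info=AudioOutputDeviceInfo("car_navigation", AudioType.ROUTE_GUIDANCE),
)
```

Keeping the device information optional lets the same record type carry both AV device output and microphone input, which is one possible design choice rather than something the description prescribes.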

The microphone 4 is installed in the vehicle 100 and collects, for example, speech uttered by passengers in the vehicle 100. Speech uttered by a passenger in the vehicle 100 is, for example, an utterance addressed to another passenger or an operation instruction for a voice control device. Although only one microphone 4 is shown in FIG. 1 for convenience, this is merely an example. A plurality of microphones 4 may be installed in the vehicle 100. For example, one microphone 4 may be installed for each seat. When a plurality of microphones 4 are installed, which microphone 4 collects audio from which seat is determined in advance.
The microphone 4 outputs audio-related information regarding the collected audio to the audio output control device 1. The audio output by the microphone 4 is output from the speaker 5 under the control of the audio output control device 1. Note that when the audio output control device 1 determines that the audio collected by the microphone 4 is not to be output, the audio is controlled, for example, so that it is not output from the speaker 5.
The audio-related information output by the microphone 4 includes audio data.

The speaker 5 includes a directional speaker 51 and an attenuation speaker 52. The directional speaker 51 and the attenuation speaker 52 are each installed in association with a seat in the vehicle 100. The directional speakers 51 make it possible to output different audio to each seat. The directional speaker 51 outputs audio under the control of the audio output control device 1. The attenuation speaker 52, under the control of the audio output control device 1, outputs opposite-phase audio for attenuating the audio output from the other speakers 5. Note that the audio output control device 1 switches between the directional speaker 51 and the attenuation speaker 52.
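The idea of opposite-phase audio can be illustrated with a minimal sketch. It assumes an idealized case in which the attenuation signal reaches the listener perfectly aligned with the audio to be attenuated; the patent does not describe how the attenuation data is computed, and the function names below are hypothetical.

```python
def antiphase(samples):
    """Return a phase-inverted copy of a block of audio samples."""
    return [-s for s in samples]


def superpose(a, b):
    """Sum two equally long sample blocks, as the listener would hear them."""
    return [x + y for x, y in zip(a, b)]


leaked_audio = [0.5, -0.25, 0.125]  # audio from another seat's speaker
residual = superpose(leaked_audio, antiphase(leaked_audio))
```

In a real cabin the acoustic path between each speaker and each seat would have to be taken into account, so simple sign inversion is only the starting point of the technique.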

Here, FIG. 2 is a diagram for explaining an example of the interior of the vehicle 100 equipped with the audio output control device 1 according to Embodiment 1. Note that the audio output control device 1 is not illustrated in FIG. 2. The audio output control device 1 is installed, for example, in the dashboard.
FIG. 2 shows an example of the interior of the vehicle 100 viewed from above.
In FIG. 2, for example, one AV device 3 is installed on the dashboard of the vehicle 100. A camera 2 is installed at each seat so as to image an area including that seat from in front of the seat. A microphone 4 is installed at each seat so as to collect the audio of that seat. Four directional speakers 51 are installed for each seat, in association with the seat, so as to surround it. Two attenuation speakers 52 are installed at each seat.

A configuration example of the audio output control device 1 will now be described.
The audio output control device 1 includes a captured image acquisition unit 11, a passenger status detection unit 12, an audio-related information acquisition unit 13, an audio analysis unit 14, a determination unit 15, and an output control unit 16. The audio-related information acquisition unit 13 includes a device-related information acquisition unit 131 and a sound collection information acquisition unit 132. The output control unit 16 includes an attenuation data generation unit 161 and an attenuation data output unit 162.

The captured image acquisition unit 11 acquires a captured image of the interior of the vehicle 100 from the camera 2.
The captured image acquisition unit 11 outputs the acquired captured image to the passenger status detection unit 12.

The passenger status detection unit 12 detects the status of the passengers based on the captured image acquired by the captured image acquisition unit 11.
In Embodiment 1, the status of a passenger refers to, for example, the passenger's seating position, the passenger's name, whether the passenger is speaking, whether the passenger is operating the AV device 3, whether the passenger is asleep, or the direction of the passenger's line of sight.
Specifically, for example, the passenger status detection unit 12 performs known image analysis processing on the captured image to detect the status of the passengers. As necessary, the passenger status detection unit 12 also refers to information regarding passengers (hereinafter referred to as "user information") or information regarding the AV device 3 (hereinafter referred to as "device information") stored in the storage unit 17 when detecting the status of the passengers. The user information includes information that can identify a passenger, such as the passenger's name or face photograph. The device information includes, for example, information that can identify the AV device 3 and information regarding the installation position of the AV device 3.

The method by which the passenger status detection unit 12 detects the status of the passengers will now be described using specific examples.
For example, the passenger status detection unit 12 performs known image analysis processing on the captured image to detect the faces of the passengers present in the vehicle 100 and to detect the seating position of each passenger. Because the installation position and angle of view of the camera 2 are known in advance, the passenger status detection unit 12 can identify which seat a passenger is sitting in once it has detected the passenger's face. The passenger status detection unit 12 can also identify the passenger's name, for example by matching the detected face against the user information stored in the storage unit 17.
Further, for example, the passenger status detection unit 12 performs known image analysis processing on the captured image to detect that a passenger is asleep. For example, the passenger status detection unit 12 detects that a passenger is asleep if the passenger's eyes remain closed for a preset time.
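The sleep criterion above (eyes closed continuously for a preset time) can be sketched as a small state machine. This is an illustrative sketch only: it assumes that an upstream image analysis step already reports, per frame, whether the eyes are closed, and the class and parameter names are hypothetical.

```python
class SleepDetector:
    """Flags a passenger as asleep once the eyes stay closed long enough."""

    def __init__(self, threshold_s: float):
        self.threshold_s = threshold_s  # preset duration, in seconds
        self._closed_since = None       # timestamp when the eyes last closed

    def update(self, eyes_closed: bool, timestamp_s: float) -> bool:
        """Feed one per-frame observation; return True if judged asleep."""
        if not eyes_closed:
            # Opening the eyes resets the timer.
            self._closed_since = None
            return False
        if self._closed_since is None:
            self._closed_since = timestamp_s
        return timestamp_s - self._closed_since >= self.threshold_s


detector = SleepDetector(threshold_s=3.0)
observations = [(True, 0.0), (True, 2.0), (True, 3.0), (False, 3.5), (True, 4.0)]
results = [detector.update(closed, t) for closed, t in observations]
```

One detector instance would be kept per seat, so that each passenger's eye-closure timer runs independently.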

Further, for example, the passenger status detection unit 12 performs known image analysis processing on the captured image to detect how a passenger is operating the AV device 3 or the like, for example that the passenger is operating the display unit of the AV device 3 or that the passenger is operating a remote control.
Further, for example, the passenger status detection unit 12 performs known image analysis processing on the captured image to detect the direction of a passenger's line of sight. In doing so, the passenger status detection unit 12 can identify the AV device 3 installed in the passenger's line of sight from, for example, the installation position of the camera 2, the angle of view of the camera 2, the detected direction of the passenger's line of sight, and the device information stored in the storage unit 17. The passenger status detection unit 12 can also calculate position information of the point outside the vehicle 100 at which the passenger is looking, based on, for example, the current position of the vehicle 100 acquired from a GPS (Global Positioning System), not shown, mounted on the vehicle 100, map information acquired from a map database, not shown, the installation position of the camera 2, and the angle of view of the camera 2.
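One way to picture how a gaze direction could be matched to an installed AV device is an angular test in a common vehicle coordinate system. The sketch below assumes that the eye position and gaze direction have already been converted from camera coordinates into vehicle coordinates (using the known camera position and angle of view) and that the device information supplies each device's position. The function name, the 10-degree tolerance, and the coordinates are assumptions for illustration, not values from the patent.

```python
import math


def device_in_line_of_sight(eye_pos, gaze_dir, devices, max_angle_deg=10.0):
    """Return the id of the device closest to the gaze ray, or None.

    eye_pos, gaze_dir: 3-element sequences in vehicle coordinates.
    devices: mapping of device id -> 3-element installation position.
    """
    gaze_norm = math.sqrt(sum(c * c for c in gaze_dir))
    if gaze_norm == 0:
        return None
    best_id, best_angle = None, max_angle_deg
    for device_id, pos in devices.items():
        to_device = [p - e for p, e in zip(pos, eye_pos)]
        dist = math.sqrt(sum(c * c for c in to_device))
        if dist == 0:
            continue
        cos_angle = sum(g * t for g, t in zip(gaze_dir, to_device)) / (gaze_norm * dist)
        # Clamp to guard against rounding slightly outside [-1, 1].
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
        if angle <= best_angle:
            best_id, best_angle = device_id, angle
    return best_id


device_positions = {
    "car_navigation": (0.0, 1.0, 0.0),
    "rear_tv": (0.0, -1.0, 0.5),
}
looking_forward = device_in_line_of_sight((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), device_positions)
looking_sideways = device_in_line_of_sight((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), device_positions)
```

The same angular test could in principle be applied to points outside the vehicle once the vehicle position and map information are available, but that extension is not sketched here.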

Further, the passenger status detection unit 12 performs known image analysis processing on the captured image to detect that a passenger is speaking. For example, if the passenger's mouth is open, the passenger status detection unit 12 detects that the passenger is speaking. When it detects that a passenger is speaking, the passenger status detection unit 12 can also determine whether the utterance is for operating a voice control device by voice or is addressed to someone other than the voice control device, such as another passenger. For example, if the passenger's line of sight is directed toward the voice control device, or toward the microphone 4 for inputting operation instructions to the voice control device, the passenger status detection unit 12 determines that the utterance is for operating the voice control device. The passenger status detection unit 12 may also make this determination from the movement of the passenger's mouth: if the mouth movement matches a preset mouth movement made when uttering an instruction to operate the voice control device, the passenger status detection unit 12 determines that the passenger has spoken in order to operate the voice control device.
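The gaze-based rule for classifying an utterance can be sketched as follows. The sketch assumes that upstream image analysis already provides a mouth-open flag and a label for whatever the passenger is looking at. The labels and the function name are hypothetical, and treating every other utterance as conversation is a simplification of the description, which also allows mouth-movement matching.

```python
COMMAND_GAZE_TARGETS = frozenset({"voice_control_device", "command_microphone"})


def classify_utterance(mouth_open: bool, gaze_target: str) -> str:
    """Classify a passenger's speech state from image-derived cues."""
    if not mouth_open:
        return "not_speaking"
    if gaze_target in COMMAND_GAZE_TARGETS:
        # Looking at the voice control device, or at the microphone used
        # to give it instructions, is taken to mean a voice command.
        return "device_command"
    # Assumption: any other utterance is addressed to a person.
    return "conversation"


command = classify_utterance(True, "voice_control_device")
chat = classify_utterance(True, "rear_left_passenger")
```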

Note that the examples described above are merely examples. The passenger status detection unit 12 may detect, as the status of a passenger, a status other than those described above, or may detect only some of the statuses described above. However, the passenger status detection unit 12 detects at least the seating position of each passenger as the status of that passenger.
The passenger status detection unit 12 outputs information regarding the detected status of the passengers (hereinafter referred to as "passenger status information") to the determination unit 15. The passenger status information is information in which, for each passenger, at least the seating position is associated with the passenger as that passenger's status.
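The requirement that passenger status information always carries a seating position, while every other status item is optional, maps naturally onto a record with one mandatory field. The following is a sketch under that reading; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PassengerStatus:
    seat: str                               # mandatory: seating position
    name: Optional[str] = None              # from user information, if matched
    speaking: Optional[bool] = None
    operating_device: Optional[str] = None  # id of the AV device being operated
    asleep: Optional[bool] = None
    gaze_target: Optional[str] = None       # what the line of sight points at


# Passenger status information: one entry per detected passenger.
passenger_status_info = [
    PassengerStatus(seat="driver", gaze_target="car_navigation"),
    PassengerStatus(seat="rear_left", asleep=True),
]
```

Using `None` for items that were not detected keeps "not detected" distinct from a detected negative such as `asleep=False`.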

The audio-related information acquisition unit 13 acquires audio-related information from the AV device 3, from the microphone 4, or from both the AV device 3 and the microphone 4.
Specifically, the device-related information acquisition unit 131 of the audio-related information acquisition unit 13 acquires audio-related information from the AV device 3.
The audio-related information that the device-related information acquisition unit 131 acquires from the AV device 3 includes, for example, information that can identify the AV device 3 and audio data. In the audio-related information, the information that can identify the AV device 3 and the audio data are associated with each other.
To give a specific example, when the AV device 3 is about to output an alert to a passenger, the device-related information acquisition unit 131 acquires, from the AV device 3, audio data for outputting the alert and information that can identify the AV device 3 that output the alert, as the audio-related information.

Further, for example, suppose that the AV device 3 is a car navigation device and that the car navigation device has received a touch panel operation from a passenger. In this case, the device-related information acquisition unit 131 acquires, from the car navigation device, audio data representing a response message to the touch panel operation, such as "Yes", and information that can identify the car navigation device that output the audio data, as the audio-related information.
Further, for example, suppose that the AV device 3 is an in-vehicle television and that the in-vehicle television has been turned on in response to a remote control operation by a passenger. In this case, the device-related information acquisition unit 131 acquires, from the in-vehicle television, AV audio data of the television broadcast and information that can identify the in-vehicle television outputting the AV audio data, as the audio-related information.

The audio-related information that the device-related information acquisition unit 131 acquires from the AV device 3 may also include, for example, information regarding the type of the audio data. To give a specific example, suppose that a car navigation device is providing route guidance. In this case, the device-related information acquisition unit 131 acquires, from the car navigation device, audio data for route guidance, such as "Please turn right at the next traffic light", and information indicating that the type of the audio data is "route guidance", as the audio-related information.

The audio data included in the audio-related information that the device-related information acquisition unit 131 acquires from the AV device 3 may also be, for example, audio data for introducing a location. To give a specific example, suppose that a car navigation device is introducing places around the vehicle 100. In this case, the device-related information acquisition unit 131 acquires, from the car navigation device, audio data introducing a location, such as "The △△ National Park visible on your right is famous for its autumn leaves", as the audio-related information. The device-related information acquisition unit 131 may also acquire, from the car navigation device, position information of the location being introduced, together with the audio data introducing the location, as the audio-related information.

The sound collection information acquisition unit 132 of the audio-related information acquisition unit 13 acquires audio-related information from the microphone 4.
The audio-related information acquired by the sound collection information acquisition unit 132 includes, for example, speech uttered by a passenger to another passenger. To give a specific example, the sound collection information acquisition unit 132 acquires, from the microphone 4, uttered speech data such as "Mr./Ms. 〇〇, ..." or "Everyone, ..." as the audio-related information.
The audio-related information acquired by the sound collection information acquisition unit 132 may also include, for example, speech data uttered by a passenger to operate a voice control device by voice. To give a specific example, the sound collection information acquisition unit 132 acquires, from the microphone 4, uttered speech data such as "Turn up the volume" as the audio-related information.
The audio-related information acquisition unit 13 outputs the acquired audio-related information to the audio analysis unit 14 or the determination unit 15. Specifically, the device-related information acquisition unit 131 outputs the acquired audio-related information to the determination unit 15. The sound collection information acquisition unit 132 outputs the acquired audio-related information to the audio analysis unit 14.

The audio analysis unit 14 analyzes the content of the uttered speech data acquired by the sound collection information acquisition unit 132, based on the audio-related information acquired by the sound collection information acquisition unit 132. The audio analysis unit 14 may analyze the utterance content using existing speech recognition technology, for example by using a speech recognition dictionary.
The audio analysis unit 14 outputs the audio-related information, with the analysis result of the utterance content attached, to the determination unit 15.
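The data flow described here, in which information from the AV device 3 goes straight to the determination unit 15 while microphone input first passes through the audio analysis unit 14, can be sketched as below. The dictionary keys, function names, and the stand-in recognizer are all hypothetical; a real system would call an actual speech recognition engine at that point.

```python
def analyze_utterance(info: dict, recognize) -> dict:
    """Audio analysis step: attach the recognized text to the information."""
    return {**info, "utterance_text": recognize(info["audio_data"])}


def route_to_determination(info: dict, recognize) -> dict:
    """Return the audio-related information as the determination step receives it."""
    if info["source"] == "microphone":
        # Microphone input is analyzed before determination.
        return analyze_utterance(info, recognize)
    # AV device output is passed on unchanged.
    return info


def fake_recognizer(audio_data: bytes) -> str:
    """Stand-in for a speech recognizer, used only for this illustration."""
    return audio_data.decode("utf-8")


mic_info = route_to_determination(
    {"source": "microphone", "audio_data": b"Turn up the volume"}, fake_recognizer
)
nav_info = route_to_determination(
    {"source": "av_device", "device_id": "car_navigation", "audio_data": b"\x00"},
    fake_recognizer,
)
```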

Based on the passenger status information output from the passenger status detection unit 12 and the audio-related information acquired by the audio-related information acquisition unit 13, the determination unit 15 determines, among the passengers of the vehicle 100, the passenger to whom the audio acquired by the audio-related information acquisition unit 13 is to be output (hereinafter referred to as the "target passenger") and the seating position of the target passenger. The determination unit 15 then outputs information regarding the determined target passenger (hereinafter referred to as "target passenger information") to the output control unit 16. At this time, the determination unit 15 outputs the audio data to be output to the target passenger to the output control unit 16 in association with the target passenger information.

Specifically, for example, the determination unit 15 determines the target passenger based on the passenger status information output from the passenger status detection unit 12 and on either the audio-related information that the device-related information acquisition unit 131 of the audio-related information acquisition unit 13 acquired from the AV device 3, or the audio-related information that the sound collection information acquisition unit 132 of the audio-related information acquisition unit 13 acquired from the microphone 4 and whose utterance content has been analyzed by the audio analysis unit 14. Several specific examples of how the determination unit 15 determines the target passenger based on the audio-related information and the passenger status information are described below.

（１）音声関連情報を出力したＡＶ機器３を操作している搭乗者を対象搭乗者と判定する例
例えば、判定部１５は、音声関連情報と搭乗者状況情報とに基づき、音声関連情報を出力したＡＶ機器３を操作している搭乗者を、対象搭乗者と判定する。この場合、音声関連情報には、音声データと、当該音声データを出力したＡＶ機器３に関する情報が含まれている。また、搭乗者状況情報には、搭乗者によるＡＶ機器３の操作状況が含まれている。具体例を挙げると、例えば、ＡＶ機器３が車載用テレビであったとし、ある搭乗者がリモコン操作によって車載用テレビを操作したとする。この場合、判定部１５は、ある搭乗者がリモコン操作を行った旨の搭乗者状況情報を取得する。また、判定部１５は、車載用テレビから、リモコン操作が行われたことを示す情報を含む音声関連情報を取得する。判定部１５は、音声関連情報と搭乗者状況情報とに基づき、ある搭乗者が車載用テレビを操作したと判定することができる。そして、判定部１５は、ある搭乗者を対象搭乗者と判定する。
判定部１５は、対象搭乗者を判定すると、搭乗者状況情報に基づき、当該対象搭乗者の着座位置を判定する。判定部１５は、ＡＶ機器３、上述の例でいうと車載用テレビ、を操作している搭乗者を対象搭乗者とする旨の対象搭乗者情報を、出力制御部１６に出力する。このとき、判定部１５は、対象搭乗者の着座位置に関する情報を、対象搭乗者情報に含める。
判定部１５は、対象搭乗者に対して出力すべき、車載用テレビからの音声データを、対象搭乗者情報と対応付けて、出力制御部１６に出力する。なお、音声データは、音声関連情報に含まれている音声データである。 (1) Example of determining that the passenger operating the AV device 3 that output the audio-related information is the target passenger. For example, based on the audio-related information and the passenger status information, the determination unit 15 determines the passenger operating the AV device 3 that output the audio-related information to be the target passenger. In this case, the audio-related information includes audio data and information regarding the AV device 3 that output the audio data. Further, the passenger status information includes the status of operation of the AV device 3 by the passenger. To give a specific example, assume that the AV device 3 is an in-vehicle television and that a certain passenger operates the in-vehicle television using a remote control. In this case, the determination unit 15 acquires passenger status information indicating that the passenger has operated the remote control. The determination unit 15 also acquires, from the in-vehicle television, audio-related information including information indicating that a remote control operation has been performed. Based on the audio-related information and the passenger status information, the determination unit 15 can determine that the passenger has operated the in-vehicle television. The determination unit 15 then determines that passenger to be the target passenger.
After determining the target passenger, the determination unit 15 determines the seating position of the target passenger based on the passenger status information. The determination unit 15 outputs to the output control unit 16 target passenger information indicating that the passenger operating the AV device 3, in the above example, the in-vehicle television, is the target passenger. At this time, the determination unit 15 includes information regarding the seating position of the target passenger in the target passenger information.
The determination unit 15 outputs the audio data from the in-vehicle television, which should be output to the target passenger, to the output control unit 16 in association with the target passenger information. Note that the audio data is audio data included in the audio related information.
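The matching described in example (1) can be sketched as follows. This is only an illustration under stated assumptions: the record types `PassengerStatus` and `AudioRelatedInfo`, their field names, and the seat and device labels are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PassengerStatus:
    seat: str                       # hypothetical seat label, e.g. "rear_left"
    operated_device: Optional[str]  # ID of the AV device this passenger operated, if any


@dataclass
class AudioRelatedInfo:
    audio_data: bytes
    source_device: str  # ID of the AV device that output the audio data
    operated: bool      # True if the device reports that it was operated


def find_operating_passenger(
    info: AudioRelatedInfo, statuses: List[PassengerStatus]
) -> Optional[Dict[str, object]]:
    """Return target passenger information paired with the audio data, or None."""
    if not info.operated:
        return None
    for status in statuses:
        # The passenger whose detected operation matches the reporting device
        # is treated as the target passenger.
        if status.operated_device == info.source_device:
            return {"seat": status.seat, "audio_data": info.audio_data}
    return None
```

The returned dictionary stands in for the target passenger information (including the seating position) associated with the audio data that is passed to the output control unit 16.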

（２）音声関連情報を出力したＡＶ機器３の方向に視線を向けている搭乗者を対象搭乗者と判定する例
例えば、判定部１５は、音声関連情報と搭乗者状況情報とに基づき、音声関連情報を出力したＡＶ機器３の方向に視線を向けている搭乗者を、対象搭乗者と判定してもよい。この場合、音声関連情報には、音声データと、当該音声データを出力したＡＶ機器３に関する情報が含まれている。また、搭乗者状況情報には、搭乗者の視線および当該視線が向けられているＡＶ機器３に関する情報が含まれている。具体例を挙げると、例えば、ＡＶ機器３が車載用テレビであったとし、当該車載用テレビからＡＶ出力データが音声関連情報として出力されたとする。この場合、判定部１５は、搭乗者状況情報に基づき、車載用テレビの表示部の方向に視線を向けている搭乗者を、対象搭乗者と判定する。
判定部１５は、対象搭乗者を判定すると、搭乗者状況情報に基づき、当該対象搭乗者の着座位置を判定する。判定部１５は、ＡＶ機器３、上述の例でいうと車載用テレビ、の方向に視線を向けている搭乗者を対象搭乗者とする旨の対象搭乗者情報を、出力制御部１６に出力する。このとき、判定部１５は、対象搭乗者の着座位置に関する情報を、対象搭乗者情報に含める。
判定部１５は、対象搭乗者に対して出力すべき、車載用テレビからの音声データを、対象搭乗者情報と対応付けて、出力制御部１６に出力する。なお、音声データは、音声関連情報に含まれている音声データである。 (2) Example of determining that a passenger looking in the direction of the AV device 3 that output the audio-related information is the target passenger. For example, based on the audio-related information and the passenger status information, the determination unit 15 may determine a passenger who is directing his/her line of sight toward the AV device 3 that output the audio-related information to be the target passenger. In this case, the audio-related information includes audio data and information regarding the AV device 3 that output the audio data. Further, the passenger status information includes information regarding the passenger's line of sight and the AV device 3 toward which the line of sight is directed. To give a specific example, assume that the AV device 3 is an in-vehicle television and that the in-vehicle television outputs AV output data as audio-related information. In this case, the determination unit 15 determines, based on the passenger status information, the passenger who is directing his/her line of sight toward the display section of the in-vehicle television to be the target passenger.
After determining the target passenger, the determination unit 15 determines the seating position of the target passenger based on the passenger status information. The determination unit 15 outputs, to the output control unit 16, target passenger information indicating that the passenger who is looking in the direction of the AV device 3 (the in-vehicle television in the above example) is the target passenger. At this time, the determination unit 15 includes information regarding the seating position of the target passenger in the target passenger information.
The determination unit 15 outputs the audio data from the in-vehicle television, which should be output to the target passenger, to the output control unit 16 in association with the target passenger information. Note that the audio data is audio data included in the audio related information.

（３）音声の種別にマッチする状況にある搭乗者を対象搭乗者と判定する例
例えば、判定部１５は、音声関連情報と搭乗者状況情報とに基づき、音声データの種別に応じて、当該音声の種別にマッチする状況にある搭乗者を対象搭乗者と判定するようにしてもよい。この場合、音声関連情報には、音声データと、当該音声データの種別に関する情報が含まれている。なお、どの音声データの種別に対して、どのような状況を、マッチする状況とするかは、予め決められている。
例えば、音声関連情報に含まれている音声データの種別が「道案内」を示す種別であったとする。この場合、判定部１５は、搭乗者状況情報に基づき、例えば、運転者と、カーナビゲーション装置に表示されている地図に視線を向けている搭乗者とを、対象搭乗者と判定する。なお、この場合、搭乗者状況情報には、搭乗者の視線および当該視線が向けられているＡＶ機器３に関する情報が含まれているものとする。判定部１５は、搭乗者状況情報に基づいて、運転者、および、カーナビゲーション装置に表示されている地図に視線を向けている搭乗者を判定する。
判定部１５は、対象搭乗者を判定すると、搭乗者状況情報に基づき、当該対象搭乗者の着座位置を判定する。判定部１５は、運転者、および、カーナビゲーション装置に表示されている地図に視線を向けている搭乗者を対象搭乗者とする旨の対象搭乗者情報を、出力制御部１６に出力する。このとき、判定部１５は、対象搭乗者の着座位置に関する情報を、対象搭乗者情報に含める。
判定部１５は、対象搭乗者に対して出力すべき、カーナビゲーション装置からの音声データを、対象搭乗者情報と対応付けて、出力制御部１６に出力する。なお、音声データは、音声関連情報に含まれている音声データである。 (3) Example of determining that a passenger whose situation matches the type of audio is the target passenger. For example, based on the audio-related information and the passenger status information, the determination unit 15 may determine, according to the type of the audio data, a passenger whose situation matches that type of audio to be the target passenger. In this case, the audio-related information includes audio data and information regarding the type of the audio data. Which situation counts as a match for which type of audio data is determined in advance.
For example, assume that the type of audio data included in the audio-related information is a type indicating "route guidance." In this case, the determination unit 15 determines, based on the passenger status information, for example, the driver and any passenger who is looking at the map displayed on the car navigation device to be the target passengers. In this case, it is assumed that the passenger status information includes information regarding the passenger's line of sight and the AV device 3 toward which the line of sight is directed. The determination unit 15 identifies the driver and any passenger who is looking at the map displayed on the car navigation device based on the passenger status information.
After determining the target passenger, the determination unit 15 determines the seating position of the target passenger based on the passenger status information. The determination unit 15 outputs to the output control unit 16 target passenger information indicating that the driver and the passenger who is looking at the map displayed on the car navigation device are the target passengers. At this time, the determination unit 15 includes information regarding the seating position of the target passenger in the target passenger information.
The determination unit 15 outputs the audio data from the car navigation device, which should be output to the target passenger, to the output control unit 16 in association with the target passenger information. Note that the audio data is audio data included in the audio related information.

例えば、音声関連情報に含まれている音声データの種別が「インフォメーション」を示す種別の場合は、判定部１５は、搭乗者のうち、覚醒している搭乗者、言い換えれば、睡眠状態でない搭乗者を、対象搭乗者と判定する。なお、この場合、搭乗者状況情報には、搭乗者が睡眠状態であることを示す情報が含まれているものとする。判定部１５は、搭乗者状況情報に基づいて、覚醒している搭乗者を判定する。
判定部１５は、対象搭乗者を判定すると、搭乗者状況情報に基づき、当該対象搭乗者の着座位置を判定する。判定部１５は、覚醒している搭乗者を対象搭乗者とする旨の対象搭乗者情報を、出力制御部１６に出力する。このとき、判定部１５は、対象搭乗者の着座位置に関する情報を、対象搭乗者情報に含める。 For example, if the type of audio data included in the audio-related information is a type indicating "information," the determination unit 15 determines the passengers who are awake, in other words the passengers who are not in a sleeping state, to be the target passengers. In this case, it is assumed that the passenger status information includes information indicating that a passenger is in a sleeping state. The determination unit 15 determines which passengers are awake based on the passenger status information.
After determining the target passenger, the determination unit 15 determines the seating position of the target passenger based on the passenger status information. The determination unit 15 outputs target passenger information indicating that the awake passenger is the target passenger to the output control unit 16. At this time, the determination unit 15 includes information regarding the seating position of the target passenger in the target passenger information.

例えば、音声関連情報に含まれている音声データの種別が「アラート」を示す種別の場合は、判定部１５は、搭乗者全員を対象搭乗者と判定する。判定部１５は、搭乗者全員を対象搭乗者とする旨の対象搭乗者情報を、出力制御部１６に出力する。このとき、判定部１５は、対象搭乗者の着座位置に関する情報を、対象搭乗者情報に含める。 For example, if the type of audio data included in the audio-related information is a type indicating "alert", the determining unit 15 determines that all passengers are target passengers. The determination unit 15 outputs target passenger information indicating that all passengers are target passengers to the output control unit 16. At this time, the determination unit 15 includes information regarding the seating position of the target passenger in the target passenger information.
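The three audio types in example (3) amount to a predetermined table from audio type to a condition on each passenger's situation. A minimal sketch, assuming a hypothetical `Passenger` record and hypothetical type labels such as `"route_guidance"`:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Passenger:
    seat: str                          # hypothetical seat label
    is_driver: bool = False
    asleep: bool = False
    gaze_target: Optional[str] = None  # hypothetical label of what the passenger is looking at


# Predetermined rules: audio type -> condition a passenger must satisfy to be a target.
MATCH_RULES: Dict[str, Callable[[Passenger], bool]] = {
    # Route guidance goes to the driver and to anyone looking at the navigation map.
    "route_guidance": lambda p: p.is_driver or p.gaze_target == "navigation_map",
    # Information goes to every passenger who is awake.
    "information": lambda p: not p.asleep,
    # Alerts go to everyone.
    "alert": lambda p: True,
}


def target_seats(audio_type: str, passengers: List[Passenger]) -> List[str]:
    """Return the seating positions of the passengers matching the audio type."""
    rule = MATCH_RULES[audio_type]
    return [p.seat for p in passengers if rule(p)]
```

Keeping the rules in a table mirrors the statement that the matching situation for each audio type is determined in advance; a new audio type would only need a new table entry.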

（４）音声の内容にマッチする状況にある搭乗者を対象搭乗者と判定する例
例えば、判定部１５は、音声関連情報と搭乗者状況情報とに基づき、音声データが地点を案内する音声データである場合、視線が当該地点の方向を向いている搭乗者を、対象搭乗者と判定する。具体例を挙げると、例えば、ＡＶ機器３はカーナビゲーション装置であったとし、音声関連情報に含まれる音声が「右手に見える△△国立公園は紅葉で有名です」のように、△△国立公園という地点を案内する音声データであったとする。この場合、判定部１５は、視線が当該△△国立公園の方向を向いている搭乗者を、対象搭乗者と判定する。なお、この場合、音声関連情報には、△△国立公園の位置に関する情報が含まれているものとする。また、搭乗者状況情報には、視線の先の位置に関する情報が含まれているものとする。判定部１５は、視線の先の位置と、△△国立公園の位置とをマッチングさせることで、視線が△△国立公園の方向を向いている搭乗者を判定できる。
判定部１５は、対象搭乗者を判定すると、搭乗者状況情報に基づき、当該対象搭乗者の着座位置を判定する。判定部１５は、音声データによって案内される地点の方向に視線を向けている搭乗者を対象搭乗者とする旨の対象搭乗者情報を、出力制御部１６に出力する。このとき、判定部１５は、対象搭乗者の着座位置に関する情報を、対象搭乗者情報に含める。
判定部１５は、対象搭乗者に対して出力すべき、カーナビゲーション装置からの音声データを、対象搭乗者情報と対応付けて、出力制御部１６に出力する。なお、音声データは、音声関連情報に含まれている音声データである。 (4) Example of determining that a passenger whose situation matches the content of the audio is the target passenger. For example, based on the audio-related information and the passenger status information, when the audio data is audio data that describes a location, the determination unit 15 determines a passenger whose line of sight is directed toward that location to be the target passenger. To give a specific example, assume that the AV device 3 is a car navigation device and that the audio included in the audio-related information is audio data describing a location called △△ National Park, such as "△△ National Park, visible on your right, is famous for its autumn leaves." In this case, the determination unit 15 determines the passenger whose line of sight is directed toward △△ National Park to be the target passenger. In this case, it is assumed that the audio-related information includes information regarding the position of △△ National Park. Further, it is assumed that the passenger status information includes information regarding the position at which the line of sight is directed. The determination unit 15 can identify the passenger whose line of sight is directed toward △△ National Park by matching the position at which the line of sight is directed against the position of △△ National Park.
After determining the target passenger, the determining unit 15 determines the seating position of the target passenger based on the passenger status information. The determination unit 15 outputs to the output control unit 16 target passenger information indicating that the target passenger is a passenger who is directing his/her line of sight in the direction of the point guided by the audio data. At this time, the determination unit 15 includes information regarding the seating position of the target passenger in the target passenger information.
The determination unit 15 outputs the audio data from the car navigation device, which should be output to the target passenger, to the output control unit 16 in association with the target passenger information. Note that the audio data is audio data included in the audio-related information.
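The disclosure does not say how the gaze position is matched against the location's position. One possible approach, shown here purely as an assumption, is to compare the gaze bearing with the bearing from the vehicle to the location in a flat 2D coordinate frame, within a tolerance. The function names and the 15-degree tolerance are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def bearing_deg(origin: Point, point: Point) -> float:
    """Angle of the vector origin -> point in degrees, counterclockwise from the +x axis."""
    dx = point[0] - origin[0]
    dy = point[1] - origin[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def is_looking_at(
    vehicle_pos: Point, gaze_bearing: float, location_pos: Point, tolerance_deg: float = 15.0
) -> bool:
    """True if the gaze bearing points toward the location within the tolerance."""
    target_bearing = bearing_deg(vehicle_pos, location_pos)
    return angular_difference(gaze_bearing, target_bearing) <= tolerance_deg
```

The wrap-around handling in `angular_difference` matters here: a gaze bearing of 350 degrees and a location bearing of 0 degrees differ by 10 degrees, not 350.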

（５）発話音声に基づいて、特定の搭乗者を対象搭乗者と判定する例
例えば、ある搭乗者が、他の搭乗者に対して、「〇〇さん、」と呼びかける発話を行ったとする。この場合、マイク４は、当該発話による発話音声を収集し、集音情報取得部１３２は、マイク４から発話音声の音声データを音声関連情報として取得する。そして、音声解析部１４は、「〇〇さん、」との発話内容の解析を行う。
この場合、判定部１５は、発話音声に含まれている「〇〇」という名前の搭乗者を、対象搭乗者とする。
判定部１５は、対象搭乗者を判定すると、搭乗者状況情報に基づき、当該対象搭乗者の着座位置を判定する。具体的には、判定部１５は、「○○」という名前の搭乗者の着座位置を判定する。この場合、搭乗者状況情報には、搭乗者の名前の情報が含まれているものとする。そして、判定部１５は、「○○」という名前の搭乗者が対象搭乗者である旨の情報を、対象搭乗者情報として、出力制御部１６に出力する。このとき、判定部１５は、対象搭乗者の着座位置に関する情報を、対象搭乗者情報に含める。
判定部１５は、対象搭乗者に対して出力すべき、マイク４が収集した音声データを、対象搭乗者情報と対応付けて、出力制御部１６に出力する。なお、音声データは、音声関連情報に含まれている音声データである。 (5) Example of determining a specific passenger to be the target passenger based on uttered speech. For example, suppose that a certain passenger calls out to another passenger by saying "Mr./Ms. ○○,". In this case, the microphone 4 collects the uttered speech, and the collected sound information acquisition unit 132 acquires the audio data of the uttered speech from the microphone 4 as audio-related information. Then, the speech analysis unit 14 analyzes the content of the utterance "Mr./Ms. ○○,".
In this case, the determination unit 15 determines the passenger whose name is "〇〇" included in the uttered voice as the target passenger.
After determining the target passenger, the determination unit 15 determines the seating position of the target passenger based on the passenger status information. Specifically, the determining unit 15 determines the seating position of the passenger named "○○". In this case, it is assumed that the passenger status information includes information on the passenger's name. Then, the determination unit 15 outputs information indicating that the passenger named “○○” is the target passenger to the output control unit 16 as target passenger information. At this time, the determination unit 15 includes information regarding the seating position of the target passenger in the target passenger information.
The determination unit 15 outputs the audio data collected by the microphone 4, which should be output to the target passenger, to the output control unit 16 in association with the target passenger information. Note that the audio data is audio data included in the audio related information.

（６）発話音声に基づいて、複数の搭乗者を対象搭乗者と判定する例
例えば、ある搭乗者が、「みなさん、」と呼びかける発話を行ったとする。この場合、マイク４は、当該発話による発話音声を収集し、集音情報取得部１３２は、マイク４から発話音声の音声データを音声関連情報として取得する。そして、音声解析部１４は、「みなさん、」との発話内容の解析を行う。
この場合、判定部１５は、対象搭乗者は搭乗者全員であると判定する。そして、判定部１５は、搭乗者全員が対象搭乗者である旨の情報を対象搭乗者情報として、出力制御部１６に出力する。このとき、判定部１５は、対象搭乗者の着座位置に関する情報を、対象搭乗者情報に含める。
判定部１５は、対象搭乗者に対して出力すべき、マイク４から取得した音声データを、対象搭乗者情報と対応付けて、出力制御部１６に出力する。なお、音声データは、音声関連情報に含まれている音声データである。 (6) Example of determining multiple passengers as target passengers based on uttered audio For example, suppose that a certain passenger makes an utterance that calls out, "Ladies and gentlemen." In this case, the microphone 4 collects the utterances of the utterances, and the collected sound information acquisition unit 132 acquires the audio data of the utterances from the microphone 4 as audio-related information. Then, the speech analysis unit 14 analyzes the content of the utterance "Everyone."
In this case, the determination unit 15 determines that the target passengers are all the passengers. Then, the determination unit 15 outputs information indicating that all the passengers are target passengers to the output control unit 16 as target passenger information. At this time, the determination unit 15 includes information regarding the seating position of the target passenger in the target passenger information.
The determination unit 15 outputs the audio data acquired from the microphone 4, which should be output to the target passenger, to the output control unit 16 in association with the target passenger information. Note that the audio data is audio data included in the audio related information.

上述の（５）の例において、判定部１５は、例えば、双方向の会話を成立させるために、発話を行った搭乗者も対象搭乗者と判定するようにしてもよい。この場合、判定部１５は、搭乗者状況検出部１２から出力された搭乗者状況情報に基づき、発話を行った搭乗者を特定する。具体的には、判定部１５は、例えば、搭乗者状況情報に基づき、発話状態である搭乗者を、発話を行った搭乗者と特定し、当該搭乗者を、「〇〇」という名前の搭乗者とともに、対象搭乗者と判定する。
また、上述の（６）の一例において、判定部１５は、例えば、発話を行った搭乗者以外の搭乗者を対象搭乗者と判定するようにしてもよい。この場合、判定部１５は、搭乗者状況検出部１２から出力された搭乗者状況情報に基づき、発話を行った搭乗者を特定する。搭乗者状況情報に基づき発話を行った搭乗者を特定する方法の一例は上述のとおりであるので、重複した説明を省略する。判定部１５は、特定した、発話を行った搭乗者以外の搭乗者を、対象搭乗者と判定する。
また、判定部１５は、上述した（１）～（６）の例のような判定を並行して行ってもよい。 In example (5) above, the determination unit 15 may also determine the passenger who made the utterance to be a target passenger, for example, in order to establish a two-way conversation. In this case, the determination unit 15 identifies the passenger who made the utterance based on the passenger status information output from the passenger status detection unit 12. Specifically, for example, the determination unit 15 identifies, based on the passenger status information, the passenger who is in a speaking state as the passenger who made the utterance, and determines that passenger, together with the passenger named "○○", to be target passengers.
Furthermore, in the example of (6) above, the determination unit 15 may determine, for example, a passenger other than the passenger who made the utterance to be the target passenger. In this case, the determining unit 15 identifies the passenger who made the utterance based on the passenger status information output from the passenger status detecting unit 12. An example of a method for identifying the passenger who made the utterance based on the passenger status information is as described above, and therefore, repeated explanation will be omitted. The determining unit 15 determines the identified passenger other than the passenger who made the utterance to be the target passenger.
Furthermore, the determination unit 15 may perform determinations such as the examples (1) to (6) above in parallel.
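Examples (5) and (6), together with the variations about the speaker, can be sketched as a single function. It assumes that the speech analysis has already produced the utterance text and that the speaker has already been identified. The function name, the `include_speaker` flag, and the passenger names are hypothetical.

```python
from typing import Dict

EVERYONE = "みなさん"          # form of address for all passengers, as in example (6)
HONORIFIC = "さん"             # honorific suffix attached to a name, as in example (5)


def targets_from_utterance(
    text: str, seats_by_name: Dict[str, str], speaker: str, include_speaker: bool
) -> Dict[str, str]:
    """Map each target passenger's name to a seating position.

    The form of address is taken to be the text before the first Japanese comma.
    """
    head = text.split("、", 1)[0].strip()
    if head == EVERYONE:
        names = list(seats_by_name)
        if not include_speaker:
            # Variation of example (6): everyone except the speaker.
            names = [n for n in names if n != speaker]
    else:
        name = head[: -len(HONORIFIC)] if head.endswith(HONORIFIC) else head
        if name not in seats_by_name:
            return {}  # the addressed name matches no passenger
        names = [name]
        if include_speaker and speaker != name:
            # Variation of example (5): add the speaker for a two-way conversation.
            names.append(speaker)
    return {n: seats_by_name[n] for n in names}
```

A single flag covers both variations because they are mirror images: example (5) optionally adds the speaker to a named target, while example (6) optionally removes the speaker from the set of all passengers.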

実施の形態１において、判定部１５は、音声操作者判定部１５１を備える。
音声操作者判定部１５１は、搭乗者状況検出部１２から出力された搭乗者状況情報に基づき、車両１００内に設置されている音声制御機器に対して、音声による操作指示を行った搭乗者（以下「音声操作搭乗者」という。）がいるか否かを判定する。
上述のとおり、搭乗者状況検出部１２は、搭乗者が音声制御機器を操作するための会話を行った状況であることを検出できる。例えば、搭乗者状況情報には、音声制御機器を操作するための会話を行ったことを検出した旨の情報が含まれているものとする。なお、例えば、音声操作者判定部１５１が、搭乗者状況情報に基づき、視線が音声制御機器、または、マイク４の方向を向いていて、かつ、発話状態である搭乗者が存在するか否かを判定し、当該搭乗者が存在する場合に、当該搭乗者を音声操作搭乗者と判定するようにしてもよい。
音声操作者判定部１５１が、音声操作搭乗者がいると判定すると、判定部１５は、当該音声操作搭乗者に関する情報（以下「音声操作者情報」という。）を、出力制御部１６に出力する。判定部１５は、音声操作者情報に、音声操作搭乗者の着座位置の情報を含めるようにする。 In the first embodiment, the determination section 15 includes a voice operator determination section 151.
Based on the passenger status information output from the passenger status detection unit 12, the voice operator determination unit 151 determines whether there is a passenger who has given a voice operation instruction to a voice control device installed in the vehicle 100 (hereinafter referred to as a "voice-operated passenger").
As described above, the passenger status detection unit 12 can detect that the passenger is having a conversation to operate the voice control device. For example, it is assumed that the passenger status information includes information indicating that a conversation for operating a voice control device has been detected. For example, the voice operator determination unit 151 determines whether or not there is a passenger whose line of sight is facing the voice control device or the microphone 4 and who is in a speaking state based on the passenger status information. If the passenger is present, the passenger may be determined to be a voice-operated passenger.
When the voice operator determination unit 151 determines that there is a voice-operated passenger, the determination unit 15 outputs information regarding that passenger (hereinafter referred to as "voice operator information") to the output control unit 16. The determination unit 15 includes information on the seating position of the voice-operated passenger in the voice operator information.
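The alternative check described for the voice operator determination unit 151 (line of sight toward the voice control device or the microphone 4, combined with a speaking state) reduces to a filter over the passenger status. A minimal sketch, with a hypothetical `PassengerStatus` record and hypothetical gaze labels:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PassengerStatus:
    seat: str                   # hypothetical seat label
    speaking: bool              # True while the passenger is in a speaking state
    gaze_target: Optional[str]  # hypothetical label of what the passenger is looking at


# Gaze targets that indicate an intent to operate by voice (labels are assumptions).
VOICE_UI_TARGETS = {"voice_control_device", "microphone"}


def find_voice_operators(statuses: List[PassengerStatus]) -> List[str]:
    """Return the seats of passengers who are speaking while looking at a voice UI target."""
    return [s.seat for s in statuses if s.speaking and s.gaze_target in VOICE_UI_TARGETS]
```

The returned seat list plays the role of the seating position carried in the voice operator information.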

出力制御部１６は、車両１００に設置されている指向性スピーカ５１のうち、判定部１５が判定した対象搭乗者の着座位置に対応する指向性スピーカ５１（以下「対象スピーカ」という。）から、音声関連情報取得部１３が取得した音声関連情報に基づく音声を出力させる。具体的には、出力制御部１６は、判定部１５が判定した対象スピーカから、判定部１５から対象搭乗者情報とともに出力された音声データに基づく音声を出力する。
例えば、予め、記憶部１７には、車両１００内の座席とスピーカ５とを対応付けたスピーカ情報が記憶されているものとし、出力制御部１６は、スピーカ情報から、対象スピーカを特定すればよい。
出力制御部１６は、対象スピーカから音声関連情報に基づく音声を出力させる際、出力音声の音量を制御することもできる。具体的には、例えば、出力制御部１６は、音声関連情報に含まれている、音声データの種別に応じて、音量を制御することもできる。音声データがどの種別であった場合に、どれぐらいの音量で当該音声を出力するかは、予め決められているものとする。 The output control unit 16 causes audio based on the audio-related information acquired by the audio-related information acquisition unit 13 to be output from the directional speaker 51, among the directional speakers 51 installed in the vehicle 100, that corresponds to the seating position of the target passenger determined by the determination unit 15 (hereinafter referred to as the "target speaker"). Specifically, the output control unit 16 outputs, from the target speaker determined by the determination unit 15, audio based on the audio data output from the determination unit 15 together with the target passenger information.
For example, speaker information that associates the seats in the vehicle 100 with the speakers 5 is stored in advance in the storage unit 17, and the output control unit 16 may identify the target speaker from the speaker information.
The output control unit 16 can also control the volume of the output audio when causing the target speaker to output audio based on the audio-related information. Specifically, for example, the output control unit 16 can control the volume according to the type of audio data included in the audio-related information. Which volume is used for which type of audio data is determined in advance.
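The speaker lookup and the type-dependent volume can both be sketched as predetermined tables. The seat labels, speaker IDs, volume values, and the fallback volume below are all hypothetical; the disclosure only states that the associations are stored or determined in advance.

```python
from typing import Dict, List, Tuple

# Speaker information: seat -> directional speaker ID (all labels are assumptions).
SPEAKER_INFO: Dict[str, str] = {
    "driver": "speaker_front_right",
    "front_passenger": "speaker_front_left",
    "rear_left": "speaker_rear_left",
    "rear_right": "speaker_rear_right",
}

# Predetermined volume per audio type, on an arbitrary 0.0 to 1.0 scale (assumed values).
VOLUME_BY_TYPE: Dict[str, float] = {
    "alert": 1.0,
    "route_guidance": 0.7,
    "information": 0.5,
}
DEFAULT_VOLUME = 0.5  # assumed fallback for types not listed above


def plan_output(target_seats: List[str], audio_type: str) -> List[Tuple[str, float]]:
    """Return (target speaker ID, volume) pairs for the given target seats."""
    volume = VOLUME_BY_TYPE.get(audio_type, DEFAULT_VOLUME)
    return [(SPEAKER_INFO[seat], volume) for seat in target_seats if seat in SPEAKER_INFO]
```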

実施の形態１において、出力制御部１６は、減衰データ生成部１６１および減衰データ出力部１６２を備える。減衰データ生成部１６１および減衰データ出力部１６２は、減衰用スピーカ５２から、他のスピーカ５からの音声を減衰させるための逆位相の音声を出力する制御を行う場合に機能する。
具体的には、減衰データ生成部１６１は、音声関連情報取得部１３が取得した音声データを減衰させるための逆位相の減衰データを生成し、生成した減衰データを減衰データ出力部１６２に出力する。減衰データ出力部１６２は、減衰データ生成部１６１が生成した減衰データを、各座席に対応する減衰用スピーカ５２から出力させる。すなわち、減衰データ出力部１６２は、音声データに基づく音声を減衰させるための音声を出力する。
減衰データ生成部１６１および減衰データ出力部１６２が機能するケースについて、いくつか具体例を挙げて説明する。
例えば、減衰データ生成部１６１は、判定部１５が判定した対象搭乗者以外の搭乗者の着座位置に対応するスピーカ５から出力された音声を減衰させるための減衰データを生成する。そして、出力制御部１６が、対象搭乗者の着座位置に対応する対象スピーカから音声関連情報に基づく音声を出力させると、減衰データ出力部１６２は、減衰データ生成部１６１が生成した減衰データを、対象搭乗者以外の搭乗者の着座位置に対応する減衰用スピーカ５２から出力する。
また、例えば、判定部１５から音声操作者情報が出力された場合、減衰データ生成部１６１は、音声関連情報取得部１３が取得した音声データ、具体的には、音声関連情報取得部１３の集音情報取得部１３２がマイク４から音声関連情報に含まれる音声データのノイズを減衰するための減衰データを生成することもできる。そして、減衰データ出力部１６２は、減衰データを出力する。なお、減衰データ出力部１６２は、音声操作搭乗者の着座位置に対応する減衰用スピーカ５２から減衰用データを出力する。 In the first embodiment, the output control section 16 includes an attenuation data generation section 161 and an attenuation data output section 162. The attenuation data generation section 161 and the attenuation data output section 162 function when controlling the attenuation speaker 52 to output an opposite phase sound for attenuating the sound from the other speakers 5.
Specifically, the attenuation data generation unit 161 generates opposite-phase attenuation data for attenuating the audio data acquired by the audio-related information acquisition unit 13, and outputs the generated attenuation data to the attenuation data output unit 162. The attenuation data output unit 162 causes the attenuation data generated by the attenuation data generation unit 161 to be output from the attenuation speaker 52 corresponding to each seat. That is, the attenuation data output unit 162 outputs audio for attenuating the audio based on the audio data.
Cases in which the attenuation data generation section 161 and the attenuation data output section 162 function will be described using some specific examples.
For example, the attenuation data generation unit 161 generates attenuation data for attenuating the audio output from the speaker 5 corresponding to the seating position of each passenger other than the target passenger determined by the determination unit 15. Then, when the output control unit 16 causes audio based on the audio-related information to be output from the target speaker corresponding to the seating position of the target passenger, the attenuation data output unit 162 outputs the attenuation data generated by the attenuation data generation unit 161 from the attenuation speaker 52 corresponding to the seating position of each passenger other than the target passenger.
Further, for example, when the voice operator information is output from the determination unit 15, the attenuation data generation unit 161 can also generate attenuation data for attenuating noise in the audio data acquired by the audio-related information acquisition unit 13, specifically the audio data included in the audio-related information that the collected sound information acquisition unit 132 of the audio-related information acquisition unit 13 acquired from the microphone 4. Then, the attenuation data output unit 162 outputs the attenuation data. Note that the attenuation data output unit 162 outputs the attenuation data from the attenuation speaker 52 corresponding to the seating position of the voice-operated passenger.
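In its simplest idealized form, opposite-phase attenuation data is the sample-wise negation of the audio signal, so that the two cancel when summed. The sketch below shows only that idea, plus the selection of seats that receive attenuation data in the first case above. It ignores propagation delay and cabin acoustics, which a practical system would have to account for, and the function names are hypothetical.

```python
from typing import List


def make_attenuation_data(samples: List[float]) -> List[float]:
    """Return the opposite-phase signal: each sample negated."""
    return [-s for s in samples]


def mix(a: List[float], b: List[float]) -> List[float]:
    """Sum two signals sample by sample, as happens when they overlap at the listener."""
    return [x + y for x, y in zip(a, b)]


def attenuation_seats(occupied_seats: List[str], target_seats: List[str]) -> List[str]:
    """Seats whose attenuation speaker should play the attenuation data:
    every occupied seat that does not belong to a target passenger."""
    return [seat for seat in occupied_seats if seat not in target_seats]
```

For example, mixing `[0.5, -0.25, 0.125]` with its attenuation data yields all zeros, which is the ideal cancellation the opposite-phase output aims for.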

また、例えば、音声出力制御装置１は、対象搭乗者の状況が変化したことに応じて、減衰データ生成部１６１および減衰データ出力部１６２を機能させるようにしてもよい。すなわち、音声出力制御装置１は、ＡＶ機器３から出力される音声データに基づく音声を聞く側の対象搭乗者の状況の変化に応じて、当該音声を搭乗者に届かないように減衰させて出力するようにすることができる。
例えば、ある搭乗者がＡＶ機器３を動作させる指示を入力する操作を行った際、ＡＶ機器３は、当該操作が行われたことにより音声を出力するための音声関連情報を出力する。判定部１５は、操作を行っている人を対象搭乗者とする（上記（１）の場合参照）。出力制御部１６は、対象搭乗者の着座位置に対応する対象スピーカからＡＶ機器３からの音声関連情報に基づく音声を出力させる。その後、ある搭乗者がＡＶ機器３を停止させる操作を行わない限り、当該ＡＶ機器３からは継続的に音声関連情報が出力されることになる。ここで、出力制御部１６の減衰データ生成部１６１は、搭乗者状況情報に基づき、ある搭乗者、言い換えれば、対象搭乗者が睡眠状態となったと判定した場合、対象スピーカから出力された音声を減衰させるための減衰データを生成し、減衰データ出力部１６２は、対象搭乗者の着座位置に対応する減衰用スピーカ５２から減衰データを出力させる。 Further, for example, the audio output control device 1 may cause the attenuation data generation unit 161 and the attenuation data output unit 162 to function in response to a change in the situation of the target passenger. That is, in response to a change in the situation of the target passenger who is listening to audio based on the audio data output from the AV device 3, the audio output control device 1 can attenuate that audio so that it does not reach the passenger.
For example, when a passenger performs an operation to input an instruction to operate the AV device 3, the AV device 3 outputs audio-related information for outputting audio in response to the operation. The determination unit 15 determines the person performing the operation to be the target passenger (see case (1) above). The output control unit 16 causes audio based on the audio-related information from the AV device 3 to be output from the target speaker corresponding to the seating position of the target passenger. Thereafter, unless the passenger performs an operation to stop the AV device 3, the AV device 3 continues to output audio-related information. Here, when the attenuation data generation unit 161 of the output control unit 16 determines, based on the passenger status information, that the passenger, in other words the target passenger, has fallen asleep, it generates attenuation data for attenuating the audio output from the target speaker, and the attenuation data output unit 162 causes the attenuation data to be output from the attenuation speaker 52 corresponding to the seating position of the target passenger.
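The sleep-triggered behavior can be sketched per audio frame: the target speaker keeps receiving the audio, and the attenuation speaker at the same seat receives opposite-phase data only while the target passenger is asleep. This is an idealized illustration, and the function name and the frame representation are assumptions.

```python
from typing import List, Optional, Tuple


def outputs_for_frame(
    audio_frame: List[float], target_asleep: bool
) -> Tuple[List[float], Optional[List[float]]]:
    """Return (target speaker signal, attenuation speaker signal) for one frame.

    The attenuation signal is None while the target passenger is awake.
    """
    speaker_out = list(audio_frame)
    attenuation_out = [-s for s in audio_frame] if target_asleep else None
    return speaker_out, attenuation_out
```

Calling this once per frame with the latest sleep state from the passenger status information lets the attenuation start and stop as the passenger's state changes, without the AV device 3 itself being stopped.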

このように、音声出力制御装置１は、車両１００の搭乗者の状況に応じて、当該搭乗者のうち、音声が必要とされる対象搭乗者の着座位置に対応する対象スピーカからのみ、音声が出力されるようにする。
例えば、ある搭乗者がリモコンを操作してＡＶ機器３を動作させ、ＡＶ機器３から当該操作に応答する応答メッセージに関する音声関連情報が出力されたとすると、音声出力制御装置１は、当該ある搭乗者を対象搭乗者と判定し、ＡＶ機器３を動作させた搭乗者の着座位置に対応する対象スピーカからのみ、応答メッセージを出力させる。
また、例えば、ＡＶ機器３がナビゲーション装置であり、当該ナビゲーション装置から道案内を行う音声データに関する音声関連情報が出力されたとすると、音声出力制御装置１は、運転者、および、視線をナビゲーション装置に表示されている地図の方向に向けている搭乗者を対象搭乗者と判定し、運転者、および、視線をナビゲーション装置に表示されている地図の方向に向けている搭乗者の着座位置に対応する対象スピーカからのみ、道案内を行う音声を出力させる。
また、例えば、ＡＶ機器３がナビゲーション装置であり、当該ナビゲーション装置から△△国立公園を案内する音声データに関する音声関連情報が出力されたとすると、音声出力制御装置１は、視線を△△国立公園の方向に向けている搭乗者を対象搭乗者と判定し、視線を△△国立公園の方向に向けている搭乗者の着座位置に対応する対象スピーカからのみ、音声を出力させる。
また、例えば、ある搭乗者が「○○さん、・・・」と、他の搭乗者に対する発話を行ったとすると、音声出力制御装置１は、○○という名前の搭乗者を対象搭乗者と判定し、○○という名前あの搭乗者の着座位置に対応する対象スピーカから、ある搭乗者による発話音声を出力する。
これにより、音声出力制御装置１は、音声を届けるべき搭乗者に音声を届け、音声が必要のない搭乗者に対しては出力される音声が耳障りにならないようにすることができる。 In this way, depending on the situation of the passengers of the vehicle 100, the audio output control device 1 causes audio to be output only from the target speaker corresponding to the seating position of the target passenger, that is, the passenger who needs the audio.
For example, if a certain passenger operates a remote control to operate the AV device 3 and the AV device 3 outputs audio-related information regarding a response message in response to the operation, the audio output control device 1 determines that passenger to be the target passenger and causes the response message to be output only from the target speaker corresponding to the seating position of the passenger who operated the AV device 3.
Further, for example, if the AV device 3 is a navigation device and the navigation device outputs audio-related information regarding audio data for route guidance, the audio output control device 1 determines the driver and any passenger whose line of sight is directed toward the map displayed on the navigation device to be the target passengers, and causes the route guidance audio to be output only from the target speakers corresponding to the seating positions of the driver and of that passenger.
Further, for example, if the AV device 3 is a navigation device and the navigation device outputs audio-related information regarding audio data describing △△ National Park, the audio output control device 1 determines the passenger whose line of sight is directed toward △△ National Park to be the target passenger, and causes the audio to be output only from the target speaker corresponding to the seating position of that passenger.
Further, for example, if a certain passenger addresses another passenger by saying "Mr./Ms. ○○, ...", the audio output control device 1 determines the passenger named ○○ to be the target passenger, and outputs the speech uttered by the first passenger from the target speaker corresponding to the seating position of the passenger named ○○.
Thereby, the audio output control device 1 can deliver audio to the passengers who should receive it, while preventing the output audio from annoying passengers who do not need it.

また、音声出力制御装置１は、車両１００の搭乗者の状況に応じて、音声出力が必要ない場合は、音声が出力されないようにすることができる。
例えば、音声出力制御装置１は、対象搭乗者の着座位置に対応する対象スピーカからは音声関連情報に基づく音声を出力させ、対象搭乗者以外の搭乗者の着座位置に対応する減衰用スピーカ５２からは減衰データを出力させる。
これにより、音声出力制御装置１は、対象搭乗者以外の搭乗者に対して、不要な音声が聞こえないように制御することができる。
また、例えば、音声出力制御装置１は、音声操作搭乗者が存在する場合、音声操作搭乗者の着座位置に対応する減衰用スピーカ５２から減衰データを出力させるようにする。
これにより、音声出力制御装置１は、音声操作搭乗者が音声制御機器を操作するために行った発話による音声を阻害するノイズを打ち消すことができる。
また、例えば、音声出力制御装置１は、対象操作者が睡眠状態となった場合、当該対象操作者の着座位置に対応する減衰用スピーカ５２から減衰データを出力させるようにする。このように、音声出力制御装置１は、指向性スピーカ５１から出力される音声を遮り、対象操作者の睡眠を妨げない制御を行うことができる。すなわち、音声出力制御装置１は、音声を聞く側の搭乗者の状況を把握し、当該音声の届け先に存在する搭乗者の状況に応じて音声の出力方法を制御することもできる。 Further, the audio output control device 1 can prevent the audio from being output if audio output is not necessary depending on the situation of the passenger of the vehicle 100.
For example, the audio output control device 1 outputs audio based on the audio-related information from the target speaker corresponding to the seating position of the target passenger, and outputs attenuation data from the attenuation speaker 52 corresponding to the seating position of each passenger other than the target passenger.
Thereby, the audio output control device 1 can perform control so that passengers other than the target passenger do not hear unnecessary audio.
Further, for example, when a voice-operated passenger is present, the voice output control device 1 outputs attenuation data from the attenuation speaker 52 corresponding to the seating position of the voice-operated passenger.
Thereby, the voice output control device 1 can cancel the noise that obstructs the voice produced by the voice-operated passenger's utterance to operate the voice-controlled device.
Further, for example, when the target operator is in a sleeping state, the audio output control device 1 outputs attenuation data from the attenuation speaker 52 corresponding to the seating position of the target operator. In this way, the audio output control device 1 can block the audio output from the directional speaker 51 and perform control that does not disturb the target operator's sleep. That is, the audio output control device 1 can also grasp the situation of the passenger who is listening to the audio, and control the audio output method according to the situation of the passenger who is present at the destination of the audio.
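A minimal sketch of the per-seat decision described above follows. It assumes one directional speaker 51 and one attenuation speaker 52 per seat, and it treats a seat as receiving either audio or attenuation data, which is a simplification introduced here. The function name `plan_outputs`, the seat labels, and the `"asleep"` key are hypothetical.

```python
def plan_outputs(seat_states, target_seats, voice_operator_seat=None):
    """Map each occupied seat to "play" or "attenuate" (assumed policy).

    seat_states: dict of seat label -> {"asleep": bool}
    target_seats: set of seat labels determined as targets
    voice_operator_seat: seat of a passenger currently giving a voice command, if any
    """
    plan = {}
    for seat, state in seat_states.items():
        if seat == voice_operator_seat:
            # Keep playback from interfering with the voice command.
            plan[seat] = "attenuate"
        elif state["asleep"]:
            # Do not disturb a sleeping passenger, even if targeted.
            plan[seat] = "attenuate"
        elif seat in target_seats:
            plan[seat] = "play"
        else:
            # Non-target passengers should not hear the audio.
            plan[seat] = "attenuate"
    return plan


seat_states = {
    "driver": {"asleep": False},
    "front_passenger": {"asleep": False},
    "rear_left": {"asleep": True},
    "rear_right": {"asleep": False},
}
print(plan_outputs(seat_states, {"driver", "rear_left"}))
```

Whether a voice-operating or sleeping passenger receives attenuation data or simply no audio is a design choice; the sketch picks attenuation for both.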

記憶部１７は、ユーザ情報または機器情報等を記憶する。
なお、実施の形態１では、図１に示すように、記憶部１７は、音声出力制御装置１に備えられるものとするが、これは一例に過ぎない。記憶部１７は、音声出力制御装置１の外部の、音声出力制御装置１が参照可能な場所に備えられるようにしてもよい。 The storage unit 17 stores user information, device information, and the like.
Note that in the first embodiment, as shown in FIG. 1, the storage section 17 is provided in the audio output control device 1, but this is only an example. The storage unit 17 may be provided at a location outside the audio output control device 1 that can be referenced by the audio output control device 1.

実施の形態１に係る音声出力制御装置１の動作について説明する。
図３は、実施の形態１に係る音声出力制御装置１の動作について説明するためのフローチャートである。 The operation of the audio output control device 1 according to the first embodiment will be explained.
FIG. 3 is a flowchart for explaining the operation of the audio output control device 1 according to the first embodiment.

撮像画像取得部１１は、カメラ２から、車両１００内を撮像した撮像画像を取得する（ステップＳＴ３０１）。
撮像画像取得部１１は、取得した撮像画像を、搭乗者状況検出部１２に出力する。 The captured image acquisition unit 11 acquires a captured image of the inside of the vehicle 100 from the camera 2 (step ST301).
The captured image acquisition unit 11 outputs the acquired captured image to the passenger situation detection unit 12.

搭乗者状況検出部１２は、ステップＳＴ３０１にて撮像画像取得部１１が取得した撮像画像に基づいて、搭乗者の状況を検出する（ステップＳＴ３０２）。
搭乗者状況検出部１２は、検出した搭乗者の状況に関する搭乗者状況情報を、判定部１５に出力する。 The passenger situation detection section 12 detects the situation of the passenger based on the captured image acquired by the captured image acquisition section 11 in step ST301 (step ST302).
The passenger situation detection section 12 outputs passenger situation information regarding the detected situation of the passenger to the determination section 15.

音声関連情報取得部１３は、ＡＶ機器３もしくはマイク４、または、ＡＶ機器３およびマイク４の両方から、音声関連情報を取得する（ステップＳＴ３０３）。
具体的には、音声関連情報取得部１３の機器関連情報取得部１３１は、ＡＶ機器３から、音声関連情報を取得する。音声関連情報取得部１３の集音情報取得部１３２は、マイク４から音声関連情報を取得する。
音声関連情報取得部１３は、取得した音声関連情報を、音声解析部１４または判定部１５に出力する。具体的には、機器関連情報取得部１３１は、取得した音声関連情報を、判定部１５に出力する。集音情報取得部１３２は、取得した音声関連情報を、音声解析部１４に出力する。
音声解析部１４は、集音情報取得部１３２が取得した音声関連情報に基づき、集音情報取得部１３２が取得した発話音声データの発話内容を解析する。音声解析部１４は、発話内容の解析結果を付与した音声関連情報を、判定部１５に出力する。 The audio-related information acquisition unit 13 acquires audio-related information from the AV device 3 or the microphone 4, or from both the AV device 3 and the microphone 4 (step ST303).
Specifically, the device-related information acquisition unit 131 of the audio-related information acquisition unit 13 acquires audio-related information from the AV device 3. The collected sound information acquisition unit 132 of the audio-related information acquisition unit 13 acquires audio-related information from the microphone 4.
The audio-related information acquisition unit 13 outputs the acquired audio-related information to the voice analysis unit 14 or the determination unit 15. Specifically, the device-related information acquisition unit 131 outputs the acquired audio-related information to the determination unit 15. The collected sound information acquisition unit 132 outputs the acquired audio-related information to the voice analysis unit 14.
The voice analysis unit 14 analyzes the utterance content of the uttered voice data acquired by the collected sound information acquisition unit 132, based on the audio-related information acquired by that unit. The voice analysis unit 14 then outputs the audio-related information, with the analysis result of the utterance content added, to the determination unit 15.

判定部１５は、ステップＳＴ３０２にて搭乗者状況検出部１２から出力された搭乗者状況情報と、ステップＳＴ３０３にて音声関連情報取得部１３が取得した音声関連情報とに基づいて、対象搭乗者を判定する（ステップＳＴ３０４）。また、判定部１５は、当該対象搭乗者の着座位置を判定する。そして、判定部１５は、対象搭乗者情報を、出力制御部１６に出力する。 The determination unit 15 determines the target passenger based on the passenger status information output from the passenger status detection unit 12 in step ST302 and the audio-related information acquired by the audio-related information acquisition unit 13 in step ST303 (step ST304). Further, the determination unit 15 determines the seating position of the target passenger. The determination unit 15 then outputs the target passenger information to the output control unit 16.

判定部１５の音声操作者判定部１５１は、ステップＳＴ３０２にて搭乗者状況検出部１２から出力された搭乗者状況情報に基づき、音声操作搭乗者がいるか否かを判定する（ステップＳＴ３０５）。
音声操作者判定部１５１が、音声操作搭乗者がいると判定すると、判定部１５は、音声操作者情報を、出力制御部１６に出力する。 The voice operator determining unit 151 of the determining unit 15 determines whether or not there is a voice operated passenger based on the passenger status information output from the passenger status detecting unit 12 in step ST302 (step ST305).
When the voice operator determination unit 151 determines that there is a voice operated passenger, the determination unit 15 outputs voice operator information to the output control unit 16 .

出力制御部１６は、車両１００に設置されている指向性スピーカ５１のうち、ステップＳＴ３０４にて判定部１５が判定した対象搭乗者の着座位置に対応する対象スピーカから、ステップＳＴ３０３にて音声関連情報取得部１３が取得した音声関連情報に基づく音声を出力させる（ステップＳＴ３０６）。
なお、当該ステップＳＴ３０６において、出力制御部１６の減衰データ生成部１６１は、ステップＳＴ３０３にて音声関連情報取得部１３が取得した音声データを減衰させるための逆位相の減衰データを生成し、生成した減衰データを減衰データ出力部１６２に出力する。減衰データ出力部１６２は、減衰データ生成部１６１が生成した減衰データを、減衰用スピーカ５２から出力させる。 The output control unit 16 causes audio based on the audio-related information acquired by the audio-related information acquisition unit 13 in step ST303 to be output from the target speaker that, among the directional speakers 51 installed in the vehicle 100, corresponds to the seating position of the target passenger determined by the determination unit 15 in step ST304 (step ST306).
Note that in step ST306, the attenuation data generation unit 161 of the output control unit 16 generates attenuation data of opposite phase for attenuating the audio data acquired by the audio-related information acquisition unit 13 in step ST303, and outputs the generated attenuation data to the attenuation data output unit 162. The attenuation data output unit 162 outputs the attenuation data generated by the attenuation data generation unit 161 from the attenuation speaker 52.
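As a minimal sketch of what "attenuation data of opposite phase" could mean for sampled audio, the function below negates each sample so that the sum of the original and the attenuation signal is zero. The function name and the list-of-floats representation are assumptions made for illustration. Real cancellation at a seat also depends on delay, gain, and the acoustic path between speaker and listener, none of which is modeled here.

```python
def make_attenuation_data(samples):
    """Return the phase-inverted copy of a sequence of audio samples."""
    return [-s for s in samples]


audio = [0.5, -0.25, 0.125, 0.0]
attenuation = make_attenuation_data(audio)

# Idealized superposition at the listener: every sample cancels.
residual = [a + b for a, b in zip(audio, attenuation)]
print(residual)
```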

図３のフローチャートにて説明した音声出力制御装置１の動作について、ステップＳＴ３０１～ステップＳＴ３０２の動作と、ステップＳＴ３０３の動作の順番は、逆であってもよいし、並行して行われてもよい。 Regarding the operation of the audio output control device 1 explained with the flowchart of FIG. 3, the operations of steps ST301 to ST302 and the operation of step ST303 may be performed in the reverse order or in parallel.
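The remark that steps ST301 to ST302 and step ST303 may run in parallel can be sketched with two independent tasks whose results are joined before the determination in step ST304. The stand-in functions and their return values below are hypothetical placeholders for the camera-based detection and the audio-related information acquisition. The thread pool is just one possible way to express the parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def detect_passenger_status():
    # Stand-in for ST301-ST302 (image acquisition and status detection).
    return {"rear_left": {"gaze_target": "navigation_map"},
            "rear_right": {"gaze_target": None}}


def acquire_audio_info():
    # Stand-in for ST303 (audio-related information acquisition).
    return {"kind": "route_guidance"}


def run_cycle():
    # The two branches do not depend on each other, so they can overlap.
    with ThreadPoolExecutor(max_workers=2) as pool:
        status_future = pool.submit(detect_passenger_status)
        audio_future = pool.submit(acquire_audio_info)
        status = status_future.result()
        audio = audio_future.result()
    # ST304 needs both results, so it runs only after the join above.
    if audio["kind"] != "route_guidance":
        return []
    return [seat for seat, s in status.items()
            if s["gaze_target"] == "navigation_map"]


print(run_cycle())
```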

図４Ａ，図４Ｂは、実施の形態１に係る音声出力制御装置１のハードウェア構成の一例を示す図である。
実施の形態１において、撮像画像取得部１１と、搭乗者状況検出部１２と、音声関連情報取得部１３と、音声解析部１４と、判定部１５と、出力制御部１６の機能は、処理回路４０１により実現される。すなわち、音声出力制御装置１は、車両１００の搭乗者の状況を考慮し、音声出力が必要と推定される搭乗者に対して音声が出力されるよう、音声出力の制御を行うための処理回路４０１を備える。
処理回路４０１は、図４Ａに示すように専用のハードウェアであっても、図４Ｂに示すようにメモリ４０５に格納されるプログラムを実行するＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）４０５であってもよい。 4A and 4B are diagrams showing an example of the hardware configuration of the audio output control device 1 according to the first embodiment.
In the first embodiment, the functions of the captured image acquisition section 11, the passenger situation detection section 12, the voice-related information acquisition section 13, the voice analysis section 14, the determination section 15, and the output control section 16 are realized by the processing circuit 401. That is, the audio output control device 1 includes the processing circuit 401 for controlling audio output so that audio is output to the passenger who is estimated to need it, taking into consideration the situation of the passengers of the vehicle 100.
The processing circuit 401 may be dedicated hardware as shown in FIG. 4A, or may be a CPU (Central Processing Unit) 404 that executes a program stored in a memory 405 as shown in FIG. 4B.

処理回路４０１が専用のハードウェアである場合、処理回路４０１は、例えば、単一回路、複合回路、プログラム化したプロセッサ、並列プログラム化したプロセッサ、ＡＳＩＣ（ＡｐｐｌｉｃａｔｉｏｎＳｐｅｃｉｆｉｃＩｎｔｅｇｒａｔｅｄＣｉｒｃｕｉｔ）、ＦＰＧＡ（Ｆｉｅｌｄ－ＰｒｏｇｒａｍｍａｂｌｅＧａｔｅＡｒｒａｙ）、またはこれらを組み合わせたものが該当する。 When the processing circuit 401 is dedicated hardware, the processing circuit 401 may be, for example, a single circuit, a composite circuit, a programmed processor, a parallel programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a combination of these.

処理回路４０１がＣＰＵ４０４の場合、撮像画像取得部１１と、搭乗者状況検出部１２と、音声関連情報取得部１３と、音声解析部１４と、判定部１５と、出力制御部１６の機能は、ソフトウェア、ファームウェア、または、ソフトウェアとファームウェアとの組み合わせにより実現される。ソフトウェアまたはファームウェアは、プログラムとして記述され、メモリ４０５に記憶される。処理回路４０１は、メモリ４０５に記憶されたプログラムを読み出して実行することにより、撮像画像取得部１１と、搭乗者状況検出部１２と、音声関連情報取得部１３と、音声解析部１４と、判定部１５と、出力制御部１６の機能を実行する。すなわち、音声出力制御装置１は、処理回路４０１により実行されるときに、上述の図３のステップＳＴ３０１～ステップＳＴ３０６が結果的に実行させることになるプログラムを格納するためのメモリ４０５を備える。また、メモリ４０５に記憶されたプログラムは、撮像画像取得部１１と、搭乗者状況検出部１２と、音声関連情報取得部１３と、音声解析部１４と、判定部１５と、出力制御部１６の手順または方法をコンピュータに実行させるものであるとも言える。ここで、メモリ４０５とは、例えば、ＲＡＭ、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）、フラッシュメモリ、ＥＰＲＯＭ（ＥｒａｓａｂｌｅＰｒｏｇｒａｍｍａｂｌｅＲｅａｄＯｎｌｙＭｅｍｏｒｙ）、ＥＥＰＲＯＭ（ＥｌｅｃｔｒｉｃａｌｌｙＥｒａｓａｂｌｅＰｒｏｇｒａｍｍａｂｌｅＲｅａｄ－ＯｎｌｙＭｅｍｏｒｙ）等の、不揮発性もしくは揮発性の半導体メモリ、または、磁気ディスク、フレキシブルディスク、光ディスク、コンパクトディスク、ミニディスク、ＤＶＤ（ＤｉｇｉｔａｌＶｅｒｓａｔｉｌｅＤｉｓｃ）等が該当する。 When the processing circuit 401 is the CPU 404, the functions of the captured image acquisition section 11, the passenger situation detection section 12, the voice-related information acquisition section 13, the voice analysis section 14, the determination section 15, and the output control section 16 are realized by software, firmware, or a combination of software and firmware. The software or firmware is written as a program and stored in the memory 405. The processing circuit 401 reads out and executes the program stored in the memory 405, thereby executing the functions of the captured image acquisition section 11, the passenger situation detection section 12, the voice-related information acquisition section 13, the voice analysis section 14, the determination section 15, and the output control section 16. That is, the audio output control device 1 includes the memory 405 for storing a program that, when executed by the processing circuit 401, results in steps ST301 to ST306 in FIG. 3 described above being executed.
Further, the program stored in the memory 405 can also be said to cause a computer to execute the procedures or methods of the captured image acquisition section 11, the passenger situation detection section 12, the voice-related information acquisition section 13, the voice analysis section 14, the determination section 15, and the output control section 16. Here, the memory 405 corresponds to, for example, a non-volatile or volatile semiconductor memory such as a RAM, a ROM (Read Only Memory), a flash memory, an EPROM (Erasable Programmable Read Only Memory), or an EEPROM (Electrically Erasable Programmable Read-Only Memory), or to a magnetic disk, a flexible disk, an optical disk, a compact disc, a mini disc, a DVD (Digital Versatile Disc), or the like.

なお、撮像画像取得部１１と、搭乗者状況検出部１２と、音声関連情報取得部１３と、音声解析部１４と、判定部１５と、出力制御部１６の機能について、一部を専用のハードウェアで実現し、一部をソフトウェアまたはファームウェアで実現するようにしてもよい。例えば、撮像画像取得部１１と音声関連情報取得部１３については専用のハードウェアとしての処理回路４０１でその機能を実現し、搭乗者状況検出部１２と音声解析部１４と判定部１５と出力制御部１６については処理回路４０１がメモリ４０５に格納されたプログラムを読み出して実行することによってその機能を実現することが可能である。
また、記憶部１７は、メモリ４０５を使用する。なお、図４Ａにおいては、例えば、処理回路４０１が不揮発性メモリを有しており、記憶部１７はこれを使用する。これは一例であって、記憶部は、ＨＤＤ、ＳＳＤ（ＳｏｌｉｄＳｔａｔｅＤｒｉｖｅ）、または、ＤＶＤ等によって構成されるものであってもよい。
また、音声出力制御装置１は、カメラ２、ＡＶ機器３、マイク４、または、スピーカ５等の装置と、有線通信または無線通信を行う入力インタフェース装置４０２および出力インタフェース装置４０３を備える。 Note that some of the functions of the captured image acquisition section 11, the passenger situation detection section 12, the voice-related information acquisition section 13, the voice analysis section 14, the determination section 15, and the output control section 16 may be realized by dedicated hardware, and the others by software or firmware. For example, the functions of the captured image acquisition unit 11 and the audio-related information acquisition unit 13 can be realized by the processing circuit 401 as dedicated hardware, while the functions of the passenger situation detection unit 12, the voice analysis unit 14, the determination unit 15, and the output control unit 16 can be realized by the processing circuit 401 reading out and executing a program stored in the memory 405.
Furthermore, the storage unit 17 uses a memory 405. Note that in FIG. 4A, for example, the processing circuit 401 has a nonvolatile memory, and the storage unit 17 uses this. This is just an example, and the storage unit may be configured by an HDD, an SSD (Solid State Drive), a DVD, or the like.
The audio output control device 1 also includes an input interface device 402 and an output interface device 403 that perform wired or wireless communication with devices such as a camera 2, AV device 3, microphone 4, or speaker 5.

なお、以上の実施の形態１では、車両１００には減衰用スピーカ５２が設置されているものとしたが、減衰用スピーカ５２が設置されていることは必須ではない。
車両１００には減衰用スピーカ５２が設置されておらず、指向性スピーカ５１のみ設置されていてもよい。この場合、音声出力制御装置１の出力制御部１６は、減衰データ生成部１６１および減衰データ出力部１６２を備えない構成とすることができる。
音声出力制御装置１の出力制御部１６は、例えば、上述したような、音声を減衰させるための逆位相の減衰データを出力するようにした場合において、指向性スピーカ５１からの音声出力の停止を行う。
また、車両１００には減衰用スピーカ５２が設置されておらず、指向性スピーカ５１のみ設置されている場合であっても、音声出力制御装置１の出力制御部１６が、減衰データ生成部１６１および減衰データ出力部１６２を備えているようにしてもよい。この場合、減衰データ出力部１６２は、例えば、指向性スピーカ５１から減衰データを出力すればよい。 Note that in the first embodiment described above, it is assumed that the attenuation speaker 52 is installed in the vehicle 100, but it is not essential that the attenuation speaker 52 is installed.
The attenuation speaker 52 may not be installed in the vehicle 100, and only the directional speaker 51 may be installed. In this case, the output control section 16 of the audio output control device 1 may be configured without the attenuation data generation section 161 and the attenuation data output section 162.
In that configuration, in the cases described above in which attenuation data of opposite phase would be output to attenuate the audio, the output control unit 16 of the audio output control device 1 instead stops the audio output from the directional speaker 51, for example.
Further, even if the attenuation speaker 52 is not installed in the vehicle 100 and only the directional speaker 51 is installed, the output control unit 16 of the audio output control device 1 may still include the attenuation data generation unit 161 and the attenuation data output unit 162. In this case, the attenuation data output unit 162 may output the attenuation data from the directional speaker 51, for example.

また、以上の実施の形態１では、音声出力制御装置１は、音声操作者判定部１５１を備えるものとしたが、これは一例に過ぎない。音声出力制御装置１は、音声操作者判定部１５１を備えない構成としてもよい。この場合、図３を用いて説明した音声出力制御装置１の動作について、ステップＳＴ３０５の動作は省略できる。 Further, in the first embodiment described above, the audio output control device 1 is provided with the audio operator determination section 151, but this is only an example. The audio output control device 1 may be configured without the audio operator determining section 151. In this case, regarding the operation of the audio output control device 1 described using FIG. 3, the operation of step ST305 can be omitted.

また、以上の実施の形態１では、音声出力制御装置１は、車両１００に搭載される車載装置とし、撮像画像取得部１１と、搭乗者状況検出部１２と、音声関連情報取得部１３と、音声解析部１４と、判定部１５と、出力制御部１６とは、音声出力制御装置１に備えられているものとした。
これに限らず、撮像画像取得部１１と、搭乗者状況検出部１２と、音声関連情報取得部１３と、音声解析部１４と、判定部１５と、出力制御部１６のうち、一部または全部を車両の車載装置に搭載されるものとし、その他を当該車載装置とネットワークを介して接続されるサーバに備えられるものとして、車載装置とサーバとで音声出力制御システムを構成するようにしてもよい。 Further, in the first embodiment described above, the audio output control device 1 is an in-vehicle device mounted on the vehicle 100, and includes a captured image acquisition section 11, a passenger situation detection section 12, an audio-related information acquisition section 13, It is assumed that the voice analysis section 14, the determination section 15, and the output control section 16 are included in the voice output control device 1.
However, the configuration is not limited to this. Some of the captured image acquisition section 11, the passenger situation detection section 12, the voice-related information acquisition section 13, the voice analysis section 14, the determination section 15, and the output control section 16 may be installed in the in-vehicle device of the vehicle, and the rest may be provided in a server connected to the in-vehicle device via a network, so that the in-vehicle device and the server constitute an audio output control system.

以上のように、実施の形態１に係る音声出力制御装置１は、車両１００内を撮像した撮像画像を取得する撮像画像取得部１１と、撮像画像取得部１１が取得した撮像画像に基づいて、少なくとも着座位置を含む、搭乗者の状況を検出する搭乗者状況検出部１２と、スピーカ（指向性スピーカ５１）から出力するための音声に関する音声関連情報を取得する音声関連情報取得部１３と、搭乗者状況検出部１２が検出した搭乗者の状況に関する搭乗者状況情報と、音声関連情報取得部１３が取得した音声関連情報とに基づいて、音声を出力する対象となる対象搭乗者および当該対象搭乗者の着座位置を判定する判定部１５と、スピーカのうち、判定部１５が判定した対象搭乗者の着座位置に対応する対象スピーカから、音声を出力させる出力制御部１６を備えるように構成した。そのため、音声出力制御装置１は、座席毎に対応するスピーカが設定されている車両内において、搭乗者の状況を考慮し、音声出力が必要と推定される搭乗者に対して音声が出力されるよう、音声出力の制御を行うことができる。 As described above, the audio output control device 1 according to the first embodiment is configured to include: the captured image acquisition unit 11 that acquires a captured image of the inside of the vehicle 100; the passenger situation detection unit 12 that detects the situation of the passenger, including at least the seating position, based on the captured image acquired by the captured image acquisition unit 11; the audio-related information acquisition unit 13 that acquires audio-related information regarding audio to be output from the speakers (directional speakers 51); the determination unit 15 that determines the target passenger to whom the audio is to be output and the seating position of the target passenger, based on the passenger status information regarding the passenger situation detected by the passenger situation detection unit 12 and the audio-related information acquired by the audio-related information acquisition unit 13; and the output control unit 16 that causes the audio to be output from the target speaker that, among the speakers, corresponds to the seating position of the target passenger determined by the determination unit 15. Therefore, in a vehicle in which a corresponding speaker is set for each seat, the audio output control device 1 can control audio output so that audio is output to the passenger who is estimated to need it, taking the situation of the passengers into consideration.

なお、本開示は、実施の形態の任意の構成要素の変形、もしくは実施の形態の任意の構成要素の省略が可能である。 Note that in the present disclosure, any component of the embodiments may be modified or any component of the embodiments may be omitted.

１音声出力制御装置、２カメラ、３ＡＶ機器、４マイク、５スピーカ、５１指向性スピーカ、５２減衰用スピーカ、１１撮像画像取得部、１２搭乗者状況検出部、１３音声関連情報取得部、１３１機器関連情報取得部、１３２集音情報取得部、１４音声解析部、１５判定部、１５１音声操作者判定部、１６出力制御部、１６１減衰データ生成部、１６２減衰データ出力部、１７記憶部、４０１処理回路、４０２入力インタフェース装置、４０３出力インタフェース装置、４０４ＣＰＵ、４０５メモリ。 1 Audio output control device, 2 Camera, 3 AV equipment, 4 Microphone, 5 Speaker, 51 Directional speaker, 52 Attenuation speaker, 11 Captured image acquisition unit, 12 Passenger situation detection unit, 13 Audio related information acquisition unit, 131 Equipment related information acquisition unit, 132 Sound collection information acquisition unit, 14 Voice analysis unit, 15 Determination unit, 151 Voice operator determination unit, 16 Output control unit, 161 Attenuation data generation unit, 162 Attenuation data output unit, 17 Storage unit, 401 processing circuit, 402 input interface device, 403 output interface device, 404 CPU, 405 memory.

Claims

An audio output control device that controls audio output in a vehicle in which a corresponding speaker is installed for each seat, the audio output control device comprising:
a captured image acquisition unit that acquires a captured image of the interior of the vehicle;
a passenger status detection unit that detects the status of a passenger, including at least a seating position, based on the captured image acquired by the captured image acquisition unit;
an audio-related information acquisition unit that acquires audio-related information regarding audio to be output from the speaker;
a determination unit that determines a target passenger to whom the audio is to be output and the seating position of the target passenger, based on passenger status information regarding the status of the passenger detected by the passenger status detection unit and the audio-related information acquired by the audio-related information acquisition unit; and
an output control unit that causes the audio to be output from a target speaker that, among the speakers, corresponds to the seating position of the target passenger determined by the determination unit, wherein
the passenger status detected by the passenger status detection unit includes a direction of the passenger's line of sight,
the audio-related information acquired by the audio-related information acquisition unit includes audio data and information regarding the in-vehicle device that outputs the audio data, and
the determination unit determines that, among the passengers, the passenger who is directing the line of sight in the direction of the in-vehicle device is the target passenger.

The audio output control device according to claim 1, further comprising a voice operator determination unit that determines, based on the passenger status information, whether or not there is a voice-operated passenger who has given a voice operation instruction to a voice-operable voice control device installed in the vehicle, wherein
the output control unit outputs attenuation data for attenuating the audio, or does not output the audio, when the voice operator determination unit determines that the voice-operated passenger is present.

The audio output control device according to claim 2, wherein
the passenger status detected by the passenger status detection unit includes whether or not the passenger is in a sleeping state, and
the output control unit outputs attenuation data for attenuating the audio, or does not output the audio, if the target passenger is determined to be in a sleeping state based on the passenger status information.

The audio output control device according to any one of claims 1 to 3, wherein
the passenger status detected by the passenger status detection unit includes an operation status of the in-vehicle device by the passenger,
the audio-related information acquired by the audio-related information acquisition unit includes the audio data and information regarding the in-vehicle device that outputs the audio data, and
the determination unit determines that, among the passengers, the passenger who operated the in-vehicle device is the target passenger.

The audio output control device according to any one of claims 1 to 3, wherein
the passenger status detected by the passenger status detection unit includes the direction of the line of sight of the passenger,
the audio-related information acquired by the audio-related information acquisition unit includes the audio data and information regarding the type of the audio data, and
the determination unit determines that, if the type of the audio data is a type indicating route guidance, the passenger who is directing the line of sight toward the map displayed on the in-vehicle device is the target passenger.

The audio output control device according to any one of claims 1 to 3, wherein
the passenger status detected by the passenger status detection unit includes whether or not the passenger is in a sleeping state,
the audio-related information acquired by the audio-related information acquisition unit includes the audio data and information regarding the type of the audio data, and
the determination unit determines, based on the audio-related information and the passenger status information, that if the type of the audio data is a type indicating guidance for all the passengers, the passenger who is not in a sleeping state is the target passenger.

The audio output control device according to any one of claims 1 to 3, wherein
the passenger status detected by the passenger status detection unit includes the direction of the line of sight of the passenger,
the audio-related information acquired by the audio-related information acquisition unit includes the audio data, and the audio data is audio data that gives guidance about a location, and
the determination unit determines, based on the audio-related information and the passenger status information, that the passenger who is directing the line of sight in the direction of the location is the target passenger.

The audio output control device according to any one of claims 1 to 3, wherein
the passenger status information detected by the passenger status detection unit includes information that can identify the passenger who made an utterance,
the audio-related information acquired by the audio-related information acquisition unit includes the audio data, and the audio data is a voice uttered to another passenger and collected by a microphone installed in the vehicle, and
the determination unit determines the target passenger based on the utterance content of the uttered voice and the passenger status information.

An audio output control program that controls audio output in a vehicle in which a corresponding speaker is installed for each seat, the program causing a computer to function as:
a captured image acquisition unit that acquires a captured image of the interior of the vehicle;
a passenger status detection unit that detects the status of a passenger, including at least a seating position, based on the captured image acquired by the captured image acquisition unit;
an audio-related information acquisition unit that acquires audio-related information regarding audio to be output from the speaker;
a determination unit that determines a target passenger to whom the audio is to be output and the seating position of the target passenger, based on passenger status information regarding the status of the passenger detected by the passenger status detection unit and the audio-related information acquired by the audio-related information acquisition unit; and
an output control unit that causes the audio to be output from a target speaker that, among the speakers, corresponds to the seating position of the target passenger determined by the determination unit, wherein
the passenger status detected by the passenger status detection unit includes a direction of the passenger's line of sight,
the audio-related information acquired by the audio-related information acquisition unit includes audio data and information regarding the in-vehicle device that outputs the audio data, and
the determination unit determines that, among the passengers, the passenger who is directing the line of sight in the direction of the in-vehicle device is the target passenger.