JP6995254B2 - Sound field control device and sound field control method

Info

Publication number: JP6995254B2
Application number: JP2021541855A
Authority: JP
Inventors: 立明高橋; 光生下谷
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2019-08-28
Filing date: 2019-08-28
Publication date: 2022-01-14
Anticipated expiration: 2039-08-28
Also published as: JPWO2021038736A1; WO2021038736A1; DE112019007580T5; DE112019007580B4

Description

The present invention relates to a sound field control device and a sound field control method for controlling the sound field at a specific position in a vehicle, the sound field being generated by a loudspeaker in the vehicle.

Systems that provide a comfortable conversation environment in a vehicle have been proposed. For example, a hands-free telephone system equipped with a microphone and a loudspeaker achieves a comfortable call environment through noise-canceling technology that removes noise in the vehicle and echo-canceling technology that removes echo in the vehicle. In addition, to make conversation between occupants in the vehicle smoother, a technique has been proposed in which a microphone and a loudspeaker are placed near each occupant, the voice of one occupant is picked up by the microphone, and that voice is output from the loudspeaker near the other occupant.

As techniques for controlling the sound field in a vehicle by using a microphone and a loudspeaker, the techniques of Patent Document 1 and Patent Document 2, for example, have been proposed. Patent Document 1 proposes a technique for controlling a sound field in response to rotation of the head. Patent Document 2 proposes a technique for controlling a sound field according to the position of a seat.

Patent Document 1: Japanese Unexamined Patent Publication No. 2009-253526
Patent Document 2: Japanese Unexamined Patent Publication No. H07-184298

The conventional techniques described above assume that the sound field is appropriately controlled only for voice from one specific occupant, namely the driver. As a result, there has been a problem that the sound field in the vehicle cannot be appropriately controlled for voice from an arbitrary occupant in the vehicle.

The present invention has been made in view of the above problem, and its object is to provide a technique capable of appropriately controlling the sound field in a vehicle for voice from any occupant in the vehicle.

A sound field control device according to the present invention includes: an acquisition unit that acquires face information of a plurality of occupants from captured images of the plurality of occupants in a vehicle; a determination control unit that determines a speaker and a listener from among the plurality of occupants based on the face information acquired by the acquisition unit and voice received by a microphone in the vehicle; and a sound field control unit that controls the sound field at the positions of the plurality of occupants based on the result of the determination by the determination control unit. The sound field control unit includes: a mute voice drive unit that, based on the result of the determination by the determination control unit, drives a loudspeaker to output to the listener a sound that cancels the voice from the speaker; a voice drive unit that, based on the result of the determination by the determination control unit, drives the loudspeaker to output the voice from the speaker to the listener; and a drive control unit that performs drive control for controlling the driving of the mute voice drive unit and the driving of the voice drive unit.

According to the present invention, a speaker and a listener are determined from among a plurality of occupants based on the acquired face information and the voice received by the microphone, and the sound field at the positions of the plurality of occupants is controlled based on the result of that determination. With such a configuration, the sound field in the vehicle can be appropriately controlled for voice from any occupant in the vehicle.

The objects, features, aspects, and advantages of the present invention will become more apparent from the following detailed description and the accompanying drawings.

FIG. 1 is a block diagram showing the configuration of the sound field control device according to Embodiment 1.
FIG. 2 is a block diagram showing the configuration of the sound field control device according to Embodiment 2.
FIG. 3 is a flowchart showing the operation of the sound field control device according to Embodiment 2.
FIG. 4 is a diagram for explaining the operation of the sound field control device according to Embodiment 2.
FIG. 5 is a diagram for explaining the operation of the sound field control device according to Embodiment 3.
FIG. 6 is a block diagram showing the hardware configuration of a sound field control device according to another modification.
FIG. 7 is a block diagram showing the hardware configuration of a sound field control device according to another modification.
FIG. 8 is a block diagram showing the configuration of a server according to another modification.
FIG. 9 is a block diagram showing the configuration of a communication terminal according to another modification.

<Embodiment 1>
FIG. 1 is a block diagram showing the configuration of a sound field control device 1 according to Embodiment 1 of the present invention. The sound field control device 1 can be applied to, for example, a driver monitoring system (DMS). The sound field control device 1 of FIG. 1 is connected, wirelessly or by wire, to a microphone 51 and a loudspeaker 52 in the vehicle.

The microphone 51 receives voices uttered from the mouths of a plurality of occupants in the vehicle. For example, a super-directional microphone capable of receiving only the voice from each individual occupant is used as the microphone 51. The plurality of occupants include, for example, the driver, a passenger in the front passenger seat, and passengers in the rear seats.

The loudspeaker 52 generates a sound field in the vehicle under the control of the sound field control device 1. For example, a super-directional loudspeaker, such as a parametric array loudspeaker capable of generating a different sound field at each occupant's position, is used as the loudspeaker 52.

Next, the sound field control device 1 will be described. As described below, the sound field control device 1 is capable of controlling the sound field generated by the loudspeaker 52 at a specific position in the vehicle.

The sound field control device 1 of FIG. 1 includes an acquisition unit 11, a determination control unit 12, and a sound field control unit 13.

The acquisition unit 11 acquires face information of the plurality of occupants from captured images of the plurality of occupants in the vehicle. The face information is information including feature points and states of an occupant's face, for example, the position of the occupant's mouth, the position of the occupant's ears (for example, both ears), and the movement of the mouth. The acquisition unit 11 is implemented by at least one of a device capable of performing recognition processing on images of the occupants captured by an imaging device such as a camera, and an interface to such a device.

The determination control unit 12 determines a speaker and a listener from among the plurality of occupants based on the face information acquired by the acquisition unit 11 and the voice received by the microphone 51. For example, when the determination control unit 12 recognizes from time-series face information that an occupant's mouth is opening and closing, and also recognizes that the waveform of the voice received by the microphone 51 is a vowel waveform or a consonant waveform, it determines that occupant to be the speaker and the other occupants to be listeners. As another example, when the determination control unit 12 estimates, by applying a lip-reading function to the time-series face information, that an occupant is speaking, and the microphone 51 detects a sound pressure at or above a certain level, it determines that occupant to be the speaker and the other occupants to be listeners.
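The determination described above can be illustrated by the following minimal sketch. It is not the implementation of the determination control unit 12: it simplifies the two examples into a single rule (mouth movement combined with a sound pressure threshold), and the names `FaceInfo`, `mouth_is_moving`, and `determine_roles`, the normalized mouth-opening values, the per-occupant sound pressure input, and both thresholds are assumptions introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class FaceInfo:
    """Hypothetical per-occupant face information for one time window."""
    occupant_id: str
    mouth_openings: List[float]  # assumed normalized mouth-opening values over time


def mouth_is_moving(openings: List[float], min_range: float = 0.2) -> bool:
    """Treat the mouth as opening and closing if its opening varies enough."""
    return len(openings) >= 2 and (max(openings) - min(openings)) >= min_range


def determine_roles(
    faces: List[FaceInfo],
    sound_pressure_db: Dict[str, float],
    min_pressure_db: float = 50.0,  # assumed threshold, not taken from the source
) -> Tuple[Optional[str], List[str]]:
    """Return (speaker_id, listener_ids); speaker_id is None if nobody speaks."""
    speaker = None
    for face in faces:
        level = sound_pressure_db.get(face.occupant_id, 0.0)
        if mouth_is_moving(face.mouth_openings) and level >= min_pressure_db:
            speaker = face.occupant_id
            break
    if speaker is None:
        return None, []
    # Every occupant other than the speaker is treated as a listener.
    listeners = [f.occupant_id for f in faces if f.occupant_id != speaker]
    return speaker, listeners
```

Requiring both the visual cue and the audio cue reflects the source's use of face information together with the microphone input; a real system would replace the simple threshold tests with waveform classification or lip reading.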

The sound field control unit 13 controls the loudspeaker 52 based on the result of the determination by the determination control unit 12, thereby controlling the sound field at a specific position in the vehicle and, by extension, the sound field at the positions of the plurality of occupants. For example, based on the position of the speaker's mouth and the position of the listener's ears as determined by the determination control unit 12, the sound field control unit 13 performs control so that the loudspeaker 52 selectively outputs, toward the listener's ears, either the speaker's voice or a sound that cancels the speaker's voice.

<Summary of Embodiment 1>
According to the sound field control device 1 of Embodiment 1 described above, a speaker and a listener are determined from among a plurality of occupants based on the acquired face information and the voice received by the microphone 51, and the sound field at the positions of the plurality of occupants is controlled based on the result of that determination. With such a configuration, for example, any occupant who utters a voice in the vehicle can be determined to be the speaker and the other occupants can be determined to be listeners, so the sound field at the positions of the plurality of occupants can be appropriately controlled for voice from any occupant. This is particularly useful in ride sharing, which has been proposed in recent years.

<Embodiment 2>
FIG. 2 is a block diagram showing the configuration of the sound field control device 1 according to Embodiment 2 of the present invention. In the following, components of Embodiment 2 that are the same as or similar to the components described above are given the same or similar reference numerals, and the description focuses on the components that differ.

The sound field control device 1 of FIG. 2 is communicably connected to a directional microphone 51a, a loudspeaker array 52a, and an image recognition device 53. The directional microphone 51a and the loudspeaker array 52a fall under the concepts of the microphone 51 and the loudspeaker 52 of FIG. 1, respectively. Specifically, like the microphone 51 described in Embodiment 1, the directional microphone 51a receives voices uttered from the mouths of a plurality of occupants in the vehicle. Like the loudspeaker 52 described in Embodiment 1, the loudspeaker array 52a generates a sound field in the vehicle under the control of the sound field control device 1. The image recognition device 53 generates face information of the plurality of occupants by performing recognition processing on captured images of the plurality of occupants in the vehicle.

Next, the sound field control device 1 will be described. The sound field control device 1 of FIG. 2 includes a face information acquisition unit 11a, an utterance voice determination unit 12a, a speaker/listener recognition unit 12b, a mute voice drive unit 13a, a voice drive unit 13b, a drive control unit 13c, and a microphone control unit 14. The face information acquisition unit 11a falls under the concept of the acquisition unit 11 of FIG. 1, and the utterance voice determination unit 12a and the speaker/listener recognition unit 12b fall under the concept of the determination control unit 12 of FIG. 1. The mute voice drive unit 13a, the voice drive unit 13b, and the drive control unit 13c fall under the concept of the sound field control unit 13 of FIG. 1.

The face information acquisition unit 11a acquires the face information generated by the image recognition device 53. As in Embodiment 1, the face information is information including feature points and states of an occupant's face.

The microphone control unit 14 controls the directivity of the directional microphone 51a based on the face information acquired by the face information acquisition unit 11a, so that the directional microphone 51a can accurately receive the voice at the position of an occupant's mouth. The logic for adjusting the directivity of the directional microphone 51a is not limited, and any existing adjustment logic can be used.
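Because the source leaves the adjustment logic open, the following is only one possible sketch of a first step: converting a mouth position taken from the face information into a steering direction for the microphone. The function name `steering_angles`, the cabin coordinate system in metres, and the azimuth and elevation convention are all assumptions made for illustration.

```python
import math
from typing import Tuple

# (x, y, z) in metres in a hypothetical cabin coordinate system.
Point = Tuple[float, float, float]


def steering_angles(mic_pos: Point, mouth_pos: Point) -> Tuple[float, float]:
    """Return (azimuth, elevation) in degrees from the microphone to the mouth."""
    dx = mouth_pos[0] - mic_pos[0]
    dy = mouth_pos[1] - mic_pos[1]
    dz = mouth_pos[2] - mic_pos[2]
    # Azimuth is measured in the x-y plane, elevation from that plane upward.
    azimuth = math.degrees(math.atan2(dy, dx))
    elevation = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return azimuth, elevation
```

How these angles are then applied depends on the microphone hardware, for example as a beamformer steering direction, and is outside what the source specifies.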

The utterance voice determination unit 12a determines, based on the face information acquired by the face information acquisition unit 11a and the voice received by the directional microphone 51a, whether or not the voice received by the directional microphone 51a is a voice uttered by an occupant. For this determination, for example, the determination described for the determination control unit 12 of Embodiment 1 is used.

When the utterance voice determination unit 12a determines that the voice received by the directional microphone 51a is a voice from an occupant, the speaker/listener recognition unit 12b identifies the speaker and the listener based on the face information acquired by the face information acquisition unit 11a and the result of the determination by the utterance voice determination unit 12a. For example, among the plurality of occupants whose face information has been acquired by the face information acquisition unit 11a, the speaker/listener recognition unit 12b determines the occupant judged by the utterance voice determination unit 12a to have uttered the voice to be the speaker, and determines the other occupants to be listeners. At this time, the speaker/listener recognition unit 12b also determines the position of the speaker's mouth and the positions of the listener's ears. The result of the determination by the speaker/listener recognition unit 12b is output to the mute voice drive unit 13a and the voice drive unit 13b via the drive control unit 13c.

The mute voice drive unit 13a drives the loudspeaker array 52a, based on the result of the determination by the speaker/listener recognition unit 12b, to output to the listener a sound that cancels the voice from the speaker. Specifically, based on the position of the speaker's mouth and the position of the listener's ears determined by the speaker/listener recognition unit 12b, and on the speaker's voice received by the directional microphone 51a, the mute voice drive unit 13a controls the loudspeaker array 52a so that a sound that cancels the speaker's voice is output at the position of the listener's ears. The cancellation need only be sufficient to make the speaker's voice unintelligible to the listener; it does not have to be complete.
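The idea of a cancelling sound can be illustrated with a minimal anti-phase sketch. This is not the method of the mute voice drive unit 13a: it assumes an ideal free-field path from mouth to ear, ignores cabin reflections and the loudspeaker's own transfer path, and the function names, the sample-based delay model, and the `gain` parameter are assumptions introduced only for illustration.

```python
from typing import List

# Approximate speed of sound in air at about 20 degrees C.
SPEED_OF_SOUND_M_S = 343.0


def propagation_delay_samples(distance_m: float, sample_rate_hz: int) -> int:
    """Delay, in whole samples, for sound to travel distance_m."""
    return round(distance_m / SPEED_OF_SOUND_M_S * sample_rate_hz)


def delayed(signal: List[float], delay_samples: int) -> List[float]:
    """Shift a signal later in time, keeping its original length."""
    return ([0.0] * delay_samples + list(signal))[: len(signal)]


def cancelling_signal(
    voice: List[float], delay_samples: int, gain: float = 1.0
) -> List[float]:
    """Phase-inverted copy of the voice, aligned with its arrival at the ear."""
    return [-gain * s for s in delayed(voice, delay_samples)]


def residual_at_ear(
    voice: List[float], cancel: List[float], delay_samples: int
) -> List[float]:
    """Superpose the direct voice and the cancelling sound at the ear."""
    direct = delayed(voice, delay_samples)
    return [d + c for d, c in zip(direct, cancel)]
```

With `gain` below 1.0 the residual is attenuated rather than removed, which corresponds to the source's remark that the cancellation does not have to be complete.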

The voice drive unit 13b drives the loudspeaker array 52a, based on the result of the determination by the speaker/listener recognition unit 12b, to output the voice from the speaker to the listener. Specifically, based on the position of the speaker's mouth and the position of the listener's ears determined by the speaker/listener recognition unit 12b, and on the speaker's voice received by the directional microphone 51a, the voice drive unit 13b controls the loudspeaker array 52a so that the speaker's voice is output at the position of the listener's ears.

The drive control unit 13c performs drive control for controlling the driving of the mute voice drive unit 13a and the driving of the voice drive unit 13b. In Embodiment 2, the drive control unit 13c performs the drive control based on whether or not the speaker is making a hands-free call. The speaker here may be the driver or an occupant other than the driver.

For example, when the drive control unit 13c determines that the speaker is making a hands-free call, it performs control to drive the mute voice drive unit 13a, without driving the voice drive unit 13b, for all listeners. Conversely, when the drive control unit 13c determines that the speaker is not making a hands-free call, it performs control to drive the voice drive unit 13b, without driving the mute voice drive unit 13a, for all listeners.

Whether or not the speaker is making a hands-free call may be determined based on hands-free call status information input from a hands-free call control device (not shown), or based on the face information acquired by the face information acquisition unit 11a.

<Operation>
FIG. 3 is a flowchart showing the operation of the sound field control device 1 according to Embodiment 2.

First, in step ST1, the face information acquisition unit 11a acquires face information from the image recognition device 53.

In step ST2, the microphone control unit 14 controls the directivity of the directional microphone 51a based on the face information acquired by the face information acquisition unit 11a.

In step ST3, the utterance voice determination unit 12a determines, based on the face information acquired by the face information acquisition unit 11a and the voice received by the directional microphone 51a, whether or not the voice received by the directional microphone 51a is a voice uttered by an occupant. In other words, the utterance voice determination unit 12a determines whether or not an occupant is speaking. If it is determined that an occupant is speaking, the process proceeds to step ST4; if it is determined that no occupant is speaking, the process returns to step ST1.

In step ST4, the speaker/listener recognition unit 12b determines the speaker and the listener based on the face information acquired by the face information acquisition unit 11a and the result of the determination by the utterance voice determination unit 12a.

In step ST5, the drive control unit 13c determines whether or not the speaker is making a hands-free call. If it is determined that the speaker is making a hands-free call, the process proceeds to step ST6; if it is determined that the speaker is not making a hands-free call, the process proceeds to step ST7.

In step ST6, the drive control unit 13c performs control to drive the mute voice drive unit 13a. As a result, the mute voice drive unit 13a drives the loudspeaker array 52a, based on the result of the determination by the speaker/listener recognition unit 12b, to output to all listeners a sound that cancels the voice from the speaker. The process then returns to step ST1.

In step ST7, the drive control unit 13c performs control to drive the voice drive unit 13b. As a result, the voice drive unit 13b drives the loudspeaker array 52a, based on the result of the determination by the speaker/listener recognition unit 12b, to output the voice from the speaker to all listeners. The process then returns to step ST1.
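The branching in steps ST3 to ST7 can be summarized by the following minimal sketch. It takes the outcome of steps ST1 to ST3 as inputs rather than performing image or audio processing, and the names `Drive` and `run_cycle` and the string identifiers for occupants are assumptions introduced only for illustration.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Drive(Enum):
    NONE = "none"    # nobody is speaking; the flow returns to ST1
    MUTE = "mute"    # ST6: drive the mute voice drive unit 13a
    VOICE = "voice"  # ST7: drive the voice drive unit 13b


def run_cycle(
    speaker: Optional[str],
    occupants: List[str],
    hands_free_call_active: bool,
) -> Tuple[Drive, List[str]]:
    """One pass through ST3 to ST7; returns the drive and its target listeners."""
    # ST3: no utterance was detected, so nothing is driven.
    if speaker is None:
        return Drive.NONE, []
    # ST4: every occupant other than the speaker is a listener.
    listeners = [o for o in occupants if o != speaker]
    # ST5: branch on whether the speaker is making a hands-free call.
    if hands_free_call_active:
        return Drive.MUTE, listeners  # ST6
    return Drive.VOICE, listeners     # ST7
```

In this sketch the chosen drive applies uniformly to all listeners, as in the flowchart; the caller would invoke `run_cycle` repeatedly to model the return to step ST1.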

<Summary of Embodiment 2>
According to the sound field control device 1 of Embodiment 2 described above, drive control for controlling the driving of the mute voice drive unit 13a and the driving of the voice drive unit 13b is performed based on whether or not the speaker is making a hands-free call, regardless of whether the speaker is the driver or an occupant other than the driver. With such a configuration, when any occupant is making a hands-free call, that conversation can be kept private from the other occupants.

<Modification 1>
The drive control unit 13c may perform the drive control for controlling the driving of the mute voice drive unit 13a and the driving of the voice drive unit 13b based on an action other than a hands-free call. Examples are described below.

<Modification 1A>
The drive control unit 13c may perform the drive control based on operations by the speaker and the listener. For example, when both the speaker and the listener perform an operation on an icon or key (not shown) indicating an intention to permit conversation, the drive control unit 13c may perform control to drive the voice drive unit 13b without driving the mute voice drive unit 13a. Conversely, when both the speaker and the listener perform an operation on an icon or key (not shown) indicating an intention to refuse conversation, the drive control unit 13c may perform control to drive the mute voice drive unit 13a without driving the voice drive unit 13b.

When the intention of the speaker differs from the intention of the listener, the drive control may be performed according to a predetermined rule such as a conversation refusal priority rule or a speaker priority rule. Here, the conversation refusal priority rule is a rule under which, when at least one of the speaker and the listener performs an operation indicating an intention to refuse conversation, the mute voice drive unit 13a is driven preferentially and the voice drive unit 13b is not driven. The speaker priority rule is a rule under which the drive control of the mute voice drive unit 13a and the voice drive unit 13b gives priority to the speaker's operation over the listener's operation.
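The two rules can be expressed as a small decision function. The sketch below is illustrative only; the names `Intent`, `Rule`, and `resolve`, and the string return values "voice" and "mute", are assumptions and not part of the source.

```python
from enum import Enum


class Intent(Enum):
    PERMIT = "permit"  # operation indicating permission to converse
    REFUSE = "refuse"  # operation indicating refusal to converse


class Rule(Enum):
    REFUSAL_PRIORITY = "refusal_priority"
    SPEAKER_PRIORITY = "speaker_priority"


def resolve(speaker: Intent, listener: Intent, rule: Rule) -> str:
    """Return "voice" (drive unit 13b) or "mute" (drive unit 13a)."""
    if rule is Rule.REFUSAL_PRIORITY:
        # A refusal by either party is enough to select the mute drive.
        refused = Intent.REFUSE in (speaker, listener)
        return "mute" if refused else "voice"
    # Speaker priority: the speaker's operation overrides the listener's.
    return "voice" if speaker is Intent.PERMIT else "mute"
```

When both parties agree, the two rules give the same result; they differ only when the speaker permits and the listener refuses.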

The drive control unit 13c may store the settings made by operations from the speaker and the listener, and thereafter perform the drive control based on those settings. The operations by the speaker and the listener are not limited to operations on icons or keys (not shown); they may be, for example, gesture operations by the speaker and the listener detected with a camera or the like. The gesture operations may include, for example, forming an OK sign with the fingers as a gesture indicating permission to converse, or a brushing-away motion of the hand as a gesture indicating refusal to converse. The gesture operations may be changed as appropriate according to the area (for example, region or country) in which the sound field control device 1 is located.

<Modification 1B>
The drive control unit 13c may determine the facial expression of the speaker and the facial expression of the listener based on the face information acquired by the face information acquisition unit 11a, and perform the drive control based on the result of that determination. For example, when the drive control unit 13c determines that the face of one of the speaker and the listener is winking, it may determine that an intention to permit conversation has been indicated, and perform control to drive the voice drive unit 13b without driving the mute voice drive unit 13a. Conversely, when the drive control unit 13c determines that the face of one of the speaker and the listener has a grimacing expression, it may determine that an intention to refuse conversation has been indicated, and perform control to drive the mute voice drive unit 13a without driving the voice drive unit 13b.

Alternatively, the drive control unit 13c may determine the orientation of the speaker's face and the orientation of the listener's face based on the face information acquired by the face information acquisition unit 11a, and perform the drive control based on the result of that determination. For example, when the drive control unit 13c determines that the face of one of the speaker and the listener is turned toward the other, it may determine that an intention to permit conversation has been indicated, and perform control to drive the voice drive unit 13b without driving the mute voice drive unit 13a. Conversely, when the drive control unit 13c determines that the face of one of the speaker and the listener is turned away from the other, it may determine that an intention to refuse conversation has been indicated, and perform control to drive the mute voice drive unit 13a without driving the voice drive unit 13b.

Alternatively, the drive control unit 13c may determine the movement of the speaker's face and the movement of the listener's face based on the face information acquired by the face information acquisition unit 11a, and perform the drive control based on the result of that determination. For example, when the drive control unit 13c determines that the face of one of the speaker and the listener has moved closer to the other, or has moved up and down (nodded), it may determine that an intention to permit conversation has been indicated, and perform control to drive the voice drive unit 13b without driving the mute voice drive unit 13a. Conversely, when the drive control unit 13c determines that the face of one of the speaker and the listener has moved away from the other, or has moved from side to side (shaken), it may determine that an intention to refuse conversation has been indicated, and perform control to drive the mute voice drive unit 13a without driving the voice drive unit 13b.

In Modification 1B, when the speaker's intention and the listener's intention differ, drive control may be performed according to a predetermined rule, such as the conversation refusal priority rule or the speaker priority rule, as in Modification 1A.
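As a minimal sketch of this kind of rule-based control, the mapping from face cues to an intention and the resolution of conflicting intentions might look like the following. The names Intent, Drive, intent_from_face_cue, and resolve_drive, the cue strings, and the semantics assumed for the two rules (any refusal wins under the conversation refusal priority rule; the speaker's intention wins under the speaker priority rule) are assumptions made for illustration and are not taken from the embodiments.

```python
from enum import Enum


class Intent(Enum):
    PERMIT = "permit"  # intention to permit conversation
    REFUSE = "refuse"  # intention to refuse conversation


class Drive(Enum):
    VOICE = "voice drive unit 13b"      # pass the speaker's voice to the listener
    MUTE = "mute voice drive unit 13a"  # cancel the speaker's voice at the listener


# Hypothetical cue labels; a real system would derive them from face information.
_PERMIT_CUES = {"facing_other", "approached", "vertical_motion"}
_REFUSE_CUES = {"turned_away", "moved_away", "lateral_motion"}


def intent_from_face_cue(cue):
    """Map one face cue (orientation or movement) to an intention."""
    if cue in _PERMIT_CUES:
        return Intent.PERMIT
    if cue in _REFUSE_CUES:
        return Intent.REFUSE
    raise ValueError(f"unknown face cue: {cue}")


def resolve_drive(speaker_intent, listener_intent, rule="refusal_priority"):
    """Choose the unit to drive; the rule semantics here are assumed."""
    if speaker_intent == listener_intent:
        chosen = speaker_intent
    elif rule == "refusal_priority":
        chosen = Intent.REFUSE   # assumed: any refusal wins
    elif rule == "speaker_priority":
        chosen = speaker_intent  # assumed: the speaker's intention wins
    else:
        raise ValueError(f"unknown rule: {rule}")
    return Drive.VOICE if chosen is Intent.PERMIT else Drive.MUTE


# The speaker turns toward the listener, the listener turns away.
speaker = intent_from_face_cue("facing_other")
listener = intent_from_face_cue("turned_away")
print(resolve_drive(speaker, listener).value)
print(resolve_drive(speaker, listener, rule="speaker_priority").value)
```

Keeping the cue-to-intention mapping separate from the conflict rule means the same rule could be reused with intentions obtained from operations or from voice content.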

<Modification 1C>
The drive control unit 13c may perform drive control based on the content of the voices of the speaker and the listener. For example, when the drive control unit 13c determines that the speaker has called the listener's name, or that the listener has responded to being called by name, it may determine that an intention to permit conversation has been indicated, and perform control to drive the voice drive unit 13b without driving the mute voice drive unit 13a. Conversely, when the drive control unit 13c determines that the speaker has not called the listener's name, or that the listener has not responded to being called by name, it may determine that an intention to refuse conversation has been indicated, and perform control to drive the mute voice drive unit 13a without driving the voice drive unit 13b.

In Modification 1C, when the speaker's intention and the listener's intention differ, drive control may be performed according to a predetermined rule, such as the conversation refusal priority rule or the speaker priority rule, as in Modification 1A.

<Modification 2>
In Embodiment 2, the drive control unit 13c performs the same drive control for all listeners when it determines that the speaker is making a hands-free call, but the configuration is not limited to this. For example, the drive control unit 13c may perform drive control for each listener individually. That is, the drive control unit 13c may perform control to drive the voice drive unit 13b for one listener while simultaneously performing control to drive the mute voice drive unit 13a for another listener. With this configuration, the mutually opposing operations of driving the mute voice drive unit 13a and driving the voice drive unit 13b can be performed at the same time, so that a comfortable call space can be realized in the vehicle.
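A minimal sketch of per-listener drive control, assuming that a permit or refuse decision is already available for each listener, is shown below. The function name plan_per_listener and the seat labels are hypothetical and serve only as an illustration.

```python
def plan_per_listener(permit_by_listener):
    """Map each listener to the unit that would be driven for that listener."""
    return {
        listener: "voice drive unit 13b" if permit else "mute voice drive unit 13a"
        for listener, permit in permit_by_listener.items()
    }


# One listener is to hear the speaker while another is not.
plan = plan_per_listener({"front_passenger": True, "rear_left": False})
for listener, unit in plan.items():
    print(listener, "->", unit)
```

Because the plan holds one entry per listener, both units can be selected in the same control cycle, which corresponds to the simultaneous opposing drives described above.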

<Modification 3>
For example, when the voice drive unit 13b is driven so that the voice from the speaker is output from the speaker array 52a to the listener while music is being output in the vehicle, the sound field control device 1 may perform control to lower the volume of the music. In addition, when driving the voice drive unit 13b to output the voice from the speaker, the drive control unit 13c may also drive the mute voice drive unit 13a to perform noise cancellation (for example, cancellation of the vehicle's engine sound or of noise accompanying travel).

<Modification 4>
The voice drive unit 13b may change the frequency characteristics of the voice output from the speaker array 52a to a listener based on characteristics of that listener, such as hearing ability and preferences. For example, when the listener has difficulty hearing low-frequency sounds, the voice drive unit 13b may increase the low-frequency components of the voice output from the speaker array 52a to that listener.

The characteristics of the listener may be acquired from the listener's smartphone and hearing aid, or may be stored in the sound field control device 1 in advance through a user operation.
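One simple way to raise the low-frequency components is to add a low-pass filtered copy of the signal back onto the signal itself. The sketch below uses a one-pole low-pass filter for this purpose; the function name boost_low_frequencies and the parameters gain and alpha are assumptions for illustration, and the embodiments do not specify a particular filter.

```python
def boost_low_frequencies(samples, gain=1.0, alpha=0.1):
    """Return the samples plus `gain` times a one-pole low-pass copy of them.

    alpha (0 < alpha <= 1) sets how quickly the low-pass state follows the
    input; smaller values restrict the boost to lower frequencies.
    """
    low = 0.0
    boosted = []
    for x in samples:
        low += alpha * (x - low)        # one-pole low-pass: tracks slow components
        boosted.append(x + gain * low)  # add the slow components back on top
    return boosted


slow = boost_low_frequencies([1.0] * 200)        # constant (lowest-frequency) input
fast = boost_low_frequencies([1.0, -1.0] * 100)  # fastest alternation at this sample rate
print(abs(slow[-1]), abs(fast[-1]))
```

With gain=1.0, the constant input settles at close to twice its original level, while the rapidly alternating input is left almost unchanged. In practice, gain and alpha would be chosen from the listener's stored characteristics.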

<Embodiment 3>
FIG. 4 is a diagram for explaining the operation of the sound field control device 1 according to Embodiment 2 described above, and FIG. 5 is a diagram for explaining the operation of the sound field control device 1 according to Embodiment 3 of the present invention. The block configuration of the sound field control device 1 according to Embodiment 3 is the same as that of the sound field control device 1 according to Embodiment 2 (FIG. 2). Accordingly, among the components according to Embodiment 3, components that are the same as or similar to those described above are given the same or similar reference signs, and the description below focuses on the components that differ.

First, the operation of the sound field control device 1 according to Embodiment 2 is described with reference to FIG. 4. The voice Vs received by the directional microphone 51a has a waveform in which the voice S1, which reaches the directional microphone 51a from the speaker 21, and the outside sound N1, which reaches the directional microphone 51a from outside the vehicle, are superimposed. The voice Vs can therefore be expressed as
Vs = S1 + N1 ... (1)

When the mute voice drive unit 13a is not driven, the voice h1 heard by the listener 22 has a waveform in which the voice S2, which reaches the ear position of the listener 22 from the speaker 21, and the outside sound N2, which reaches the ear position of the listener 22 from outside the vehicle, are superimposed. The voice h1 can therefore be expressed as
h1 = S2 + N2 ... (2)

The sound field control device 1 generates, based on the voice Vs of equation (1), the voice Va that is output from the speaker array 52a to the ear position of the listener 22. This voice Va can be expressed as
Va = f(Vs) = f(S1 + N1) ≈ f1(S1) + f2(N1) ... (3)

Because this voice Va cancels the voice S2 and the outside sound N1, the following relations hold:
f1(S1) ≈ -S2 ... (4)
f2(N1) ≈ -N1 ... (5)

Therefore, when the mute voice drive unit 13a is driven, the voice h2 heard by the listener 22 is
h2 = S2 + N2 + Va
   = S2 + N2 + f1(S1) + f2(N1)
   = {S2 + f1(S1)} + {N2 + f2(N1)}
   ≈ 0 + (N2 - N1) ... (6)
Equations (4) and (5) are applied in the last line of equation (6).

Assuming that the outside sound is substantially the same regardless of occupant position, N2 ≈ N1, so the voice h2 of equation (6) becomes
h2 ≈ 0 ... (7)
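Equations (1) to (7) can be checked numerically with a short sketch. The sketch assumes an idealized case that is not stated in the embodiment: the acoustic path from the speaker to the listener is a pure gain (S2 = path_gain * S1), the two components S1 and N1 can be processed separately as on the right-hand side of equation (3), and equations (4) and (5) hold exactly. The function name and the sample values are hypothetical.

```python
def embodiment2_listener_signals(S1, N1, N2, path_gain=0.5):
    """Return (h1, h2): what the listener hears without and with cancellation."""
    S2 = [path_gain * s for s in S1]            # assumed acoustic path: S2 = path_gain * S1
    f1_S1 = [-path_gain * s for s in S1]        # equation (4): f1(S1) = -S2
    f2_N1 = [-n for n in N1]                    # equation (5): f2(N1) = -N1
    Va = [a + b for a, b in zip(f1_S1, f2_N1)]  # equation (3)
    h1 = [s + n for s, n in zip(S2, N2)]        # equation (2)
    h2 = [h + v for h, v in zip(h1, Va)]        # equation (6)
    return h1, h2


voice = [0.0, 1.0, 0.0, -1.0]  # S1: speaker's voice at the directional microphone 51a
horn = [0.3, -0.3, 0.3, -0.3]  # N1 = N2: outside sound, assumed equal at both positions
h1, h2 = embodiment2_listener_signals(voice, horn, horn)
print(h1)  # voice and horn both reach the listener
print(h2)  # both are cancelled, including the horn
```

Under these assumptions h2 vanishes, which shows the side effect discussed next: the outside sound is removed together with the speaker's voice.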

As a result, not only the voice S2 from the speaker 21 but also the outside sound N2 is cancelled for the listener 22. However, when the listener 22 is the driver, for example, it is desirable that outside sounds relevant to driving, such as the horn, collision sound, or sudden braking sound of another vehicle, or the siren of an emergency vehicle, remain audible to the driver (listener 22).

Therefore, in Embodiment 3, the mute voice drive unit 13a performs driving that causes the speaker array 52a to output, to the listener, a voice that cancels the voice from the speaker without cancelling sound from outside the vehicle. This allows the listener 22 to hear outside sounds, either conditionally or unconditionally.

The operation of the sound field control device 1 according to Embodiment 3 is described below with reference to FIG. 5. In Embodiment 3, the vehicle is provided with a microphone 56 for actively picking up outside sounds. The microphone 56 is, for example, an omnidirectional microphone, and is installed at a position where it picks up outside sounds more readily than the directional microphone 51a.

The voice Vr received by the microphone 56 has a waveform in which the voice S3, which reaches the microphone 56 from the speaker 21, and the outside sound N3, which reaches the microphone 56 from outside the vehicle, are superimposed. The voice Vr can therefore be expressed as
Vr = S3 + N3 ... (8)

The sound field control device 1 according to Embodiment 3 treats the voice Vr and the voice Vs as input voices and generates a signal representing their difference, the voice Vmix. This voice Vmix can be expressed as
Vmix = Vs - Vr
     = S1 + N1 - (S3 + N3) ... (9)

Normally, the sound field control device 1 performs drive control, and hence sound field control, taking into account an acoustic filter that includes the voice S3; to simplify the explanation, however, S3 is ignored here, that is, S3 ≈ 0. Further, assuming that the outside sound is substantially the same regardless of occupant position, N1 ≈ N3. The voice Vmix of equation (9) can then be expressed as
Vmix ≈ S1 ... (10)

When the sound field control device 1 drives the mute voice drive unit 13a using the signal of this voice Vmix, the voice Va output from the speaker array 52a to the ear position of the listener 22 can be expressed as
Va = f(Vmix) ≈ f1(S1) ≈ -S2 ... (11)

Therefore, when the mute voice drive unit 13a is driven, the voice h2 heard by the listener 22 is
h2 = S2 + N2 + Va ≈ N2 ... (12)

As a result, the listener 22 does not hear the voice S2 from the speaker 21 but can hear the outside sound N2.
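Equations (8) to (12) can be illustrated in the same way. The sketch below again assumes an idealized pure-gain acoustic path (S2 = path_gain * S1) and a cancelling filter f(x) = -path_gain * x, and it adopts the simplifications S3 ≈ 0 and N1 ≈ N3 used above. The function name and the sample values are hypothetical.

```python
def embodiment3_listener_signal(S1, N1, N2, S3, N3, path_gain=0.5):
    """Return h2, the signal at the listener's ear, following equations (1) and (8)-(12)."""
    Vs = [s + n for s, n in zip(S1, N1)]               # equation (1): directional microphone 51a
    Vr = [s + n for s, n in zip(S3, N3)]               # equation (8): microphone 56
    Vmix = [a - b for a, b in zip(Vs, Vr)]             # equation (9): difference signal
    Va = [-path_gain * v for v in Vmix]                # equation (11): assumed f(x) = -path_gain * x
    S2 = [path_gain * s for s in S1]                   # assumed acoustic path to the listener
    return [s + n + v for s, n, v in zip(S2, N2, Va)]  # equation (12)


voice = [0.0, 1.0, 0.0, -1.0]   # S1: speaker's voice at the directional microphone 51a
horn = [0.3, -0.3, 0.3, -0.3]   # outside sound, assumed equal at every position
silence = [0.0] * len(voice)    # S3 ≈ 0: the speaker's voice is ignored at microphone 56
h2 = embodiment3_listener_signal(voice, horn, horn, silence, horn)
print(h2)  # close to the horn alone: the voice is cancelled, the outside sound remains
```

Subtracting Vr removes the outside sound from the reference signal before the cancelling voice is generated, so only the speaker's voice is cancelled at the listener's ear.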

<Summary of Embodiment 3>
The sound field control device 1 according to Embodiment 3 described above performs driving that causes the speaker array 52a to output, to the listener 22, a voice that cancels the voice from the speaker 21 without cancelling outside sounds. With this configuration, the listener 22 can hear outside sounds relevant to the driver's driving, such as the horn, collision sound, or sudden braking sound of another vehicle, or the siren of an emergency vehicle.

<Other Modifications>
The acquisition unit 11, the determination control unit 12, and the sound field control unit 13 of FIG. 1 described above are hereinafter referred to as "the acquisition unit 11 and the like". The acquisition unit 11 and the like are realized by the processing circuit 81 shown in FIG. 6. That is, the processing circuit 81 includes: the acquisition unit 11, which acquires face information of a plurality of occupants in the vehicle from captured images of the plurality of occupants; the determination control unit 12, which determines a speaker and a listener from among the plurality of occupants based on the face information acquired by the acquisition unit 11 and the voice received by a microphone in the vehicle; and the sound field control unit 13, which controls the sound field at the positions of the plurality of occupants based on the result of the determination by the determination control unit 12. The processing circuit 81 may be dedicated hardware, or may be a processor that executes a program stored in a memory. Examples of the processor include a central processing unit, a processing device, an arithmetic device, a microprocessor, a microcomputer, and a DSP (Digital Signal Processor).

When the processing circuit 81 is dedicated hardware, the processing circuit 81 is, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or a combination of these. The functions of the units such as the acquisition unit 11 may each be realized by separate, distributed processing circuits, or may be realized together by a single processing circuit.

When the processing circuit 81 is a processor, the functions of the acquisition unit 11 and the like are realized in combination with software or the like. Here, software or the like means, for example, software, firmware, or software and firmware. The software or the like is written as a program and stored in a memory. As shown in FIG. 7, the processor 82 applied to the processing circuit 81 realizes the function of each unit by reading and executing the program stored in the memory 83. That is, the sound field control device 1 includes the memory 83 for storing a program that, when executed by the processing circuit 81, results in execution of: a step of acquiring face information of a plurality of occupants in the vehicle from captured images of the plurality of occupants; a step of determining a speaker and a listener from among the plurality of occupants based on the acquired face information and the voice received by a microphone in the vehicle; and a step of controlling the sound field at the positions of the plurality of occupants based on the result of determining the speaker and the listener. In other words, this program can be said to cause a computer to execute the procedures and methods of the acquisition unit 11 and the like. Here, the memory 83 may be, for example, a non-volatile or volatile semiconductor memory such as a RAM (Random Access Memory), ROM (Read Only Memory), flash memory, EPROM (Erasable Programmable Read Only Memory), or EEPROM (Electrically Erasable Programmable Read Only Memory); an HDD (Hard Disk Drive), magnetic disk, flexible disk, optical disc, compact disc, MiniDisc, DVD (Digital Versatile Disc), or a drive device for any of these; or any storage medium that comes into use in the future.

The above describes configurations in which each function of the acquisition unit 11 and the like is realized by either hardware or software or the like. However, the configuration is not limited to this; part of the acquisition unit 11 and the like may be realized by dedicated hardware and another part by software or the like. For example, the function of the acquisition unit 11 can be realized by the processing circuit 81 as dedicated hardware, an interface, a receiver, and the like, while the functions of the other units can be realized by the processing circuit 81 as the processor 82 reading and executing the program stored in the memory 83.

As described above, the processing circuit 81 can realize each of the functions described above by hardware, software or the like, or a combination of these.

The sound field control device 1 described above can also be applied to a sound field control system constructed as a system by appropriately combining a vehicle device such as a DMS (Driver Monitoring System) device, a communication terminal including mobile terminals such as mobile phones, smartphones, and tablets, the functions of an application installed on at least one of the vehicle device and the communication terminal, and a server. In that case, the functions or components of the sound field control device 1 described above may be distributed among the devices making up the system, or may be concentrated in any one of them.

FIG. 8 is a block diagram showing the configuration of the server 91 according to this modification. The server 91 of FIG. 8 includes a communication unit 91a and a control unit 91b, and can communicate wirelessly with a vehicle device 93, such as a DMS device, of a vehicle 92.

The communication unit 91a, which serves as the acquisition unit, communicates wirelessly with the vehicle device 93 and thereby receives the face information of the plurality of occupants in the vehicle and the voice in the vehicle acquired by the vehicle device 93.

The control unit 91b has the same functions as the determination control unit 12 and the sound field control unit 13 of FIG. 1, which are realized by a processor or the like (not shown) of the server 91 executing a program stored in a memory (not shown) of the server 91. That is, the control unit 91b determines a speaker and a listener from among the plurality of occupants in the vehicle based on the face information and the voice received by the communication unit 91a, and generates, based on the result of that determination, a control signal for controlling the sound field at the positions of the plurality of occupants. The communication unit 91a then transmits the control signal generated by the control unit 91b to the vehicle device 93. The server 91 configured in this way provides the same effects as the sound field control device 1 described in Embodiment 1.

FIG. 9 is a block diagram showing the configuration of the communication terminal 96 according to this modification. The communication terminal 96 of FIG. 9 includes a communication unit 96a similar to the communication unit 91a and a control unit 96b similar to the control unit 91b, and can communicate wirelessly with a vehicle device 98 of a vehicle 97. The communication terminal 96 is, for example, a mobile terminal carried by the driver of the vehicle 97, such as a mobile phone, smartphone, or tablet. The communication terminal 96 configured in this way provides the same effects as the sound field control device 1 described in Embodiment 1.

Within the scope of the invention, the embodiments and modifications may be freely combined, and each embodiment and modification may be modified or omitted as appropriate.

Although the present invention has been described in detail, the above description is illustrative in all aspects, and the present invention is not limited to it. It is understood that countless modifications not illustrated here can be envisaged without departing from the scope of the present invention.

1 sound field control device, 11 acquisition unit, 12 determination control unit, 13 sound field control unit, 13a mute voice drive unit, 13b voice drive unit, 13c drive control unit, 51 microphone, 52 loudspeaker.

Claims

A sound field control device that controls a sound field at a specific position in a vehicle, the sound field being generated by a loudspeaker in the vehicle, the device comprising:
an acquisition unit that acquires face information of a plurality of occupants in the vehicle from captured images of the plurality of occupants;
a determination control unit that determines a speaker and a listener from among the plurality of occupants based on the face information acquired by the acquisition unit and voice received by a microphone in the vehicle; and
a sound field control unit that controls the sound field at the positions of the plurality of occupants based on the result of the determination by the determination control unit,
wherein the sound field control unit includes:
a mute voice drive unit that, based on the result of the determination by the determination control unit, performs driving to cause the loudspeaker to output to the listener a voice that cancels the voice from the speaker;
a voice drive unit that, based on the result of the determination by the determination control unit, performs driving to cause the loudspeaker to output to the listener the voice from the speaker; and
a drive control unit that performs drive control for controlling the driving of the mute voice drive unit and the driving of the voice drive unit.

The sound field control device according to claim 1,
wherein the mute voice drive unit performs driving to cause the loudspeaker to output to the listener a voice that cancels the voice from the speaker without cancelling sound from outside the vehicle.

The sound field control device according to claim 1,
wherein the voice drive unit changes the frequency characteristics of the voice output from the loudspeaker to the listener based on characteristics of the listener.

The sound field control device according to claim 1,
wherein the drive control unit determines, based on the face information acquired by the acquisition unit, the facial expression of the speaker and the facial expression of the listener, the orientation of the face of the speaker and the orientation of the face of the listener, or the movement of the face of the speaker and the movement of the face of the listener, and performs the drive control based on the result of that determination.

The sound field control device according to claim 1,
wherein the drive control unit performs the drive control based on whether or not the speaker is making a hands-free call.

The sound field control device according to claim 1,
wherein the drive control unit performs the drive control based on operations by the speaker and the listener.

The sound field control device according to claim 1,
wherein the drive control unit performs the drive control based on the content of the voices of the speaker and the listener.

The sound field control device according to claim 1,
wherein the drive control unit performs the drive control for each listener.

A sound field control method for controlling a sound field at a specific position in a vehicle, the sound field being generated by a loudspeaker in the vehicle, the method comprising:
acquiring face information of a plurality of occupants in the vehicle from captured images of the plurality of occupants;
determining a speaker and a listener from among the plurality of occupants based on the acquired face information and voice received by a microphone in the vehicle; and
controlling the sound field at the positions of the plurality of occupants based on the result of determining the speaker and the listener,
wherein controlling the sound field at the positions of the plurality of occupants includes:
driving, based on the result of the determination, to cause the loudspeaker to output to the listener a voice that cancels the voice from the speaker; driving, based on the result of the determination, to cause the loudspeaker to output to the listener the voice from the speaker; and drive control that controls these drivings.