JP7460407B2

JP7460407B2 - Audio output device, audio output system, and audio output method

Info

Publication number: JP7460407B2
Application number: JP2020049549A
Authority: JP
Inventors: 乘西山; 剛仁寺口; 裕史井上; 雄宇志小田; 翔太大久保; 放歌陳; 純河西; 雅己岡本
Original assignee: Renault SAS
Current assignee: Renault SAS
Priority date: 2020-03-19
Filing date: 2020-03-19
Publication date: 2024-04-02
Anticipated expiration: 2040-03-19
Also published as: JP2021150834A

Description

The present invention relates to a sound output device, a sound output system, and a sound output method.

A known technique detects the open/closed state of an opening/closing part that separates the vehicle interior from the exterior and, when the opening/closing part is detected to be open, increases the sound level of the in-vehicle audio device (see, for example, Patent Document 1).

Patent Document 1: JP 2000-92600 A

However, applying the technique of Patent Document 1 to a virtual 3D audio device that reflects changes in the real environment causes the following problem. When an opening/closing part of the vehicle is open, the in-vehicle audio would naturally become harder to hear, yet the technique unnaturally raises the volume. It therefore cannot output virtual sound that follows changes in the real environment.

The problem to be solved by the present invention is to provide a sound output device that can output virtual sound that follows changes in the real environment.

The present invention solves this problem by localizing a sound image of virtual sound at a predetermined position defined with reference to a vehicle, acquiring acoustic environment information about the acoustic environment, and outputting real sound corresponding to the virtual sound, whose sound image is localized at the predetermined position, in an output mode that depends on the acoustic environment information.

The present invention makes it possible to output virtual sound that follows changes in the real environment.

FIG. 1 is a block diagram showing the sound output system according to the present embodiment.
FIG. 2A is a diagram showing an example of a usage situation in the present embodiment.
FIG. 2B is a diagram showing an example of a usage situation in the present embodiment.
FIG. 2C is a diagram showing an example of a usage situation in the present embodiment.
FIG. 2D is a diagram showing an example of a usage situation in the present embodiment.
FIG. 3 is a flowchart showing the procedure of sound output control according to the present embodiment.

An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram showing the sound output system according to the present embodiment. The sound output system 1000 includes a sound output device 1 and a terminal device 2. The sound output device 1 is used by a user and can augment the real space auditorily; it is, for example, an AR headset equipped with headphones. The sound output device 1 outputs, through the headphones, real sound corresponding to virtual sound, thereby presenting the user with an augmented reality space in which the real space is auditorily augmented. In this embodiment, a sound source that does not actually exist (hereinafter, a virtual sound source) is placed in the real space, and virtual sound is emitted from it. In other words, virtual sound is sound perceived as coming from the position of a virtual sound source set in the real space. What the headphones actually output is real sound, but it is output so that the sound image of the virtual sound is localized as if the virtual sound were emitted from the position of the virtual sound source. The user therefore perceives the virtual sound as coming from that position.
The sound output device 1 may use speakers instead of a headset with headphones, and may include a microphone that picks up the user's voice. The sound output device 1 may also be a device that augments the real space both visually and auditorily, for example an AR head-mounted display equipped with headphones. Through the head-mounted display, the user sees an augmented reality space in which virtual objects appear to exist in the real space.

This embodiment assumes that a passenger user riding in a vehicle 3 wears, as the sound output device 1, an AR head-mounted display equipped with headphones. Through the headphones and display of the sound output device 1, the passenger user experiences an augmented reality space augmented both visually and auditorily. For example, through the display the passenger user sees an augmented reality space in which the avatar of a remote user, who is away from the vehicle 3, is shown as a virtual object in the seat next to the passenger user. Real sound is then output so that virtual sound based on the remote user's voice appears to come from the position of the remote user's avatar, and the passenger user hears the voice as if the avatar itself were speaking. The remote user is a user in a space other than that of the vehicle 3, for example in his or her own room.

The sound output device 1 includes a controller 10, an output device 11, and a communication device 12. The sound output device 1 acquires information on the acoustic environment of the vehicle 3 and outputs, through the headphones, real sound corresponding to virtual sound in an output mode that depends on that acoustic environment. For example, it outputs virtual sound corresponding to the remote user's voice, or virtual sound that audibly presents guidance information such as that shown on signs and billboards near the road. Outputting virtual sound corresponding to the remote user's voice lets the passenger user listen to the remote user through the headphones while staying in the vehicle 3. As an example of virtual sound that presents guidance information, information written on signs and billboards installed near roads can be output as speech through the headphones. Signs and billboards do not normally emit sound, but presenting their guidance information aurally as virtual sound makes it feel as if the signs and billboards themselves were speaking.

The sound output device 1 may also control the display viewed by the passenger user so that it shows images of virtual objects. Examples of virtual objects include the remote user's avatar and objects that visually present information such as traffic guidance. The display of the sound output device 1 may be a transmissive or a non-transmissive display. A transmissive display lets the user see the scene behind it, and virtual objects are drawn on it, so the user sees virtual objects overlaid on the real scene viewed directly. A non-transmissive display shows a captured image of the scene behind the display and superimposes virtual objects on that image. Although in this embodiment the sound output device 1 is worn by a user riding in a vehicle, it may also be worn in other open or closed spaces, for example the user's own room, a restaurant, or a theme park. Nor is the virtual object limited to the avatar of a real person; it may also be a virtual agent.

The controller 10 comprises a computer with hardware and software. The computer includes a ROM (Read Only Memory) storing programs, a CPU (Central Processing Unit) that executes the programs stored in the ROM, and a RAM (Random Access Memory) that serves as an accessible storage device. As functional blocks, the controller 10 includes a sound image localization unit 100, a virtual sound processing unit 101, a virtual sound source acquisition unit 102, a passenger user state acquisition unit 103, and an environment information acquisition unit 104; each function is carried out by software for realizing the functions or executing the processes, in cooperation with the hardware. The controller 10 receives the information acquired by the virtual sound source acquisition unit 102, the passenger user state acquisition unit 103, and the environment information acquisition unit 104, localizes the sound image of the virtual sound at a predetermined position based on that information, and processes the virtual sound according to a predetermined output mode. The controller 10 then causes the output device 11 to output real sound corresponding to the virtual sound whose sound image is localized at the predetermined position. As a result, the user perceives the virtual sound as coming from the predetermined position.

The sound image localization unit 100 processes the audio signal of the virtual sound so that its sound image is localized at a predetermined position. Real sound is thus output such that the virtual sound is perceived as coming from that position. The sound image localization unit 100 localizes the sound image at the position of the virtual sound source, that is, the source of the virtual sound. The position of the virtual sound source is defined relative to the passenger user by a direction and a distance. For example, when the remote user's avatar is presented in an adjacent seat in the cabin, the avatar is the virtual sound source, its position is the position of the virtual sound source, and the virtual sound is generated from the remote user's voice. In this case, the sound image localization unit 100 processes the audio signal so that the sound image of the virtual sound is localized at the position of the avatar. The sound image of the virtual sound based on the remote user's voice is thereby localized at the avatar's position.
That is, the passenger user perceives the remote user's voice as coming from the position of the remote user's avatar. A scene may also be assumed in which the remote user's avatar stands outside the vehicle, for example in front of a door. In this case, the sound image localization unit 100 sets the position outside the vehicle where the avatar stands as the position of the virtual sound source and processes the audio signal so that the sound image of the virtual sound is localized there. The sound image of the virtual sound based on the remote user's voice is thus localized outside the cabin, and the passenger user perceives the remote user's voice as coming from the spot outside the vehicle where the avatar stands.
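The text does not say how the audio signal processing localizes the sound image. As a rough illustration only, the Python sketch below (the function `localize` and its constants are hypothetical) approximates localization for a source at a given azimuth and distance with constant-power panning, inverse-distance attenuation, and Woodworth's interaural time difference formula. This is a swapped-in simplification, not the patented method; practical headphone systems usually convolve the signal with head-related transfer functions (HRTFs). Panning alone cannot distinguish front from back, so rear directions are folded onto the front here.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C
HEAD_RADIUS = 0.0875    # m, assumed average head radius

def localize(azimuth_deg, distance_m, ref_distance_m=1.0):
    """Return (left_gain, right_gain, itd_s) for a source at azimuth_deg
    (0 = straight ahead, positive = listener's right) and distance_m."""
    # Wrap to [-180, 180), then fold rear directions onto the front half-plane,
    # because level/time differences alone cannot separate front from back.
    az_deg = (azimuth_deg + 180.0) % 360.0 - 180.0
    if az_deg > 90.0:
        az_deg = 180.0 - az_deg
    elif az_deg < -90.0:
        az_deg = -180.0 - az_deg
    az = math.radians(az_deg)
    # Constant-power panning: azimuth [-90, 90] deg -> pan angle [0, pi/2].
    pan = (az + math.pi / 2) / 2
    left, right = math.cos(pan), math.sin(pan)
    # Inverse-distance attenuation, capped at unity inside ref_distance_m.
    att = ref_distance_m / max(distance_m, ref_distance_m)
    # Woodworth approximation; positive ITD means the right ear leads.
    itd_s = HEAD_RADIUS / SPEED_OF_SOUND * (az + math.sin(az))
    return left * att, right * att, itd_s
```

For an avatar directly to the listener's left, one would call `localize(-90.0, 0.8)` and apply the returned gains and delay to the left and right channels of the avatar's voice signal.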

The virtual sound may also be generated from speech that conveys guidance information needed for driving the vehicle. Such guidance information is, for example, information written on an object such as a sign or billboard. Examples of objects include road signs indicating regulations or instructions, guide signs for exits and service areas on expressways, and signboards of roadside shops. The guidance information may also concern landmark buildings and the like. In this embodiment, guidance information is set in advance for each object and associated with the object's position on a map. For example, if the object is an expressway exit guide sign, the guidance information is speech such as "There is an exit about 1 km ahead." When the virtual sound is speech conveying guidance information, the sound image localization unit 100 sets the virtual sound source of that virtual sound at a predetermined position. Specifically, it processes the audio signal so that the sound image of the virtual sound based on the guidance information is localized at the position of the guide sign.
As a result, the sound image of the virtual sound is localized at the position of the guide sign, and the passenger user perceives the guidance speech as coming from the guide sign. For example, when the vehicle 3 approaches the position of an object on the map, real sound is output corresponding to virtual sound whose sound image is localized at the object's position.
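The following sketch illustrates one way the map association and the approach trigger described above could work. It assumes a local planar map frame in metres (x east, y north), a compass-style vehicle heading (degrees clockwise from north), and a hypothetical 300 m trigger radius; none of these values come from the source. The `GuidanceObject` type and `due_announcements` function are illustrative names. The returned relative azimuth (positive to the right) is what a localization step would need to place the sound image at the sign.

```python
import math
from dataclasses import dataclass

@dataclass
class GuidanceObject:
    name: str
    x: float      # map position in metres, assumed local frame (x east)
    y: float      # (y north)
    message: str  # guidance speech preset for this object

def due_announcements(vehicle_xy, vehicle_heading_deg, objects, trigger_radius_m=300.0):
    """Return (object, distance_m, relative_azimuth_deg) for every object within
    trigger_radius_m, nearest first. Azimuth is positive to the vehicle's right."""
    vx, vy = vehicle_xy
    out = []
    for obj in objects:
        dx, dy = obj.x - vx, obj.y - vy
        dist = math.hypot(dx, dy)
        if dist <= trigger_radius_m:
            bearing = math.degrees(math.atan2(dx, dy))  # clockwise from north
            rel = (bearing - vehicle_heading_deg + 180.0) % 360.0 - 180.0
            out.append((obj, dist, rel))
    return sorted(out, key=lambda t: t[1])
```

Example: `due_announcements((0.0, 0.0), 0.0, signs)` for a vehicle heading north returns each nearby sign with its distance and its direction relative to the vehicle's heading.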

The virtual sound processing unit 101 processes the virtual sound based on the information acquired by the virtual sound source acquisition unit 102, the passenger user state acquisition unit 103, and the environment information acquisition unit 104. Specifically, it first selects the output mode in which the real sound corresponding to the virtual sound will be output, and then processes the audio signal of the virtual sound according to the selected output mode. The output mode is expressed by, for example, output intensity, pitch, sound quality, and amount of echo.
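As a minimal sketch of "select an output mode, then process the signal accordingly", the Python code below models an output mode with two of the parameters listed above: output intensity (as a gain in dB) and echo (a single delayed copy). Pitch and sound quality are omitted, and the `OutputMode` fields and the single-tap echo are illustrative assumptions rather than the device's actual processing.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OutputMode:
    gain_db: float = 0.0     # output intensity relative to the unprocessed sound
    echo_delay: int = 0      # echo delay in samples (0 = no echo)
    echo_level: float = 0.0  # linear level of the single echo tap

def apply_output_mode(samples, mode):
    """Apply the mode's gain, then one feed-forward echo tap, to a mono signal."""
    gain = 10 ** (mode.gain_db / 20)
    out = [s * gain for s in samples]
    if mode.echo_delay > 0 and mode.echo_level:
        dry = list(out)  # echo is taken from the gained but un-echoed signal
        for i in range(mode.echo_delay, len(out)):
            out[i] += mode.echo_level * dry[i - mode.echo_delay]
    return out
```

A default `OutputMode()` leaves the signal unchanged, which corresponds to outputting the acquired virtual sound without processing.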

FIG. 2 schematically shows situations that differ in the position of the virtual sound source, the direction of the virtual sound, and the acoustic environment, assuming that the passenger user wearing the sound output device 1 sits in the driver's seat. The virtual sound processing unit 101 selects the output mode of the virtual sound according to these differences. In the following, the virtual sound source is the remote user's avatar and the opening/closing part is a window of the vehicle 3. FIG. 2A shows a case in which the virtual sound source is inside the cabin, a window of the vehicle 3 lies in the direction in which the virtual sound is emitted, and the window is open; for example, the avatar is in the front passenger seat with its face turned toward the window, and the window is open. FIG. 2B shows, for example, a case in which the avatar is in a seat inside the cabin with its face turned toward the window, and the window is closed.
FIG. 2C shows, for example, a case in which the avatar is outside the cabin with its face turned toward the vehicle 3, and the window is open. FIG. 2D shows a case in which the avatar is outside the cabin with its face turned toward the window of the vehicle 3, and the window is closed.

The processing applied to the virtual sound in each of these situations is described below. As shown in Table 1, the processing is determined by the acoustic environment information and by the position and direction of the virtual sound. In case (1) of Table 1, where the virtual sound source is inside the cabin, the target opening/closing part lies in the direction of the virtual sound, and that part is open (corresponding to FIG. 2A), the virtual sound processing unit 101 sets the output intensity of the virtual sound low. The output mode is set relative to that of case (2) (corresponding to FIG. 2B); that is, in case (1) the virtual sound processing unit 101 sets a lower output intensity than in case (2). This is because sound inside the cabin is generally harder to hear when an opening/closing part of the vehicle is open than when it is closed.
In case (3), where the virtual sound source is inside the cabin and there is no opening/closing part in the direction of the virtual sound, the same output mode as in case (2) is set. In cases (2) and (3), therefore, the acquired virtual sound is output without processing. In case (4), where the virtual sound source is outside the cabin, the target opening/closing part lies in the direction of the sound, and that part is open (corresponding to FIG. 2C), the output intensity of the virtual sound is set high. Here the output mode is set relative to that of case (5) (corresponding to FIG. 2D); that is, in case (4) the virtual sound processing unit 101 sets a higher output intensity than in case (5). This is because sound from outside is generally easier to hear when an opening/closing part of the vehicle is open than when it is closed. In case (6), where the virtual sound source is outside the cabin and there is no target opening/closing part in the direction of the virtual sound, the output intensity is set lower than in case (5). The reference for the output mode is not limited to the above; for example, the output modes of cases (2) and (3) may be set with reference to that of case (1). The output intensity may also be set according to whether the virtual sound source is inside or outside the cabin. For example, the virtual sound processing unit 101 sets a higher output intensity when the virtual sound source is inside the cabin than when it is outside.
This is because the passenger user hears sounds from inside the vehicle more easily than sounds from outside.
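The six cases of Table 1 can be summarized as a lookup. The sketch below is an illustration under stated assumptions: the `Case` names and `classify` function are hypothetical, and because the text only says "lower" or "higher", the 6 dB step is a placeholder. Offsets are relative to each group's reference case, (2) for sources inside the cabin and (5) for sources outside; the text does not fix how those two references relate to each other.

```python
from enum import Enum

class Case(Enum):
    IN_OPENING_OPEN = 1      # FIG. 2A
    IN_OPENING_CLOSED = 2    # FIG. 2B, reference for in-cabin sources
    IN_NO_OPENING = 3
    OUT_OPENING_OPEN = 4     # FIG. 2C
    OUT_OPENING_CLOSED = 5   # FIG. 2D, reference for outside sources
    OUT_NO_OPENING = 6

STEP_DB = 6.0  # placeholder; the source gives only the direction of change

# Gain offset in dB relative to the reference case of the same group.
RELATIVE_GAIN_DB = {
    Case.IN_OPENING_OPEN: -STEP_DB,    # (1) lower than (2)
    Case.IN_OPENING_CLOSED: 0.0,       # (2) unprocessed
    Case.IN_NO_OPENING: 0.0,           # (3) same as (2)
    Case.OUT_OPENING_OPEN: +STEP_DB,   # (4) higher than (5)
    Case.OUT_OPENING_CLOSED: 0.0,      # (5) reference
    Case.OUT_NO_OPENING: -STEP_DB,     # (6) lower than (5)
}

def classify(source_inside, opening_in_direction, opening_is_open):
    """Map the acoustic-environment flags to one of the six Table 1 cases."""
    if source_inside:
        if not opening_in_direction:
            return Case.IN_NO_OPENING
        return Case.IN_OPENING_OPEN if opening_is_open else Case.IN_OPENING_CLOSED
    if not opening_in_direction:
        return Case.OUT_NO_OPENING
    return Case.OUT_OPENING_OPEN if opening_is_open else Case.OUT_OPENING_CLOSED
```

The open/closed flag is ignored when no opening/closing part lies in the sound's direction, matching cases (3) and (6).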

The virtual sound processing unit 101 is not limited to setting the output mode according to whether the opening/closing part is open or closed; it may also set the output mode according to how far the part is open. That is, when the opening/closing part is open, the virtual sound processing unit 101 may calculate the opening ratio of the part continuously or stepwise and change the output intensity continuously or stepwise according to that ratio. The opening ratio expresses the degree of opening from the start of opening to the fully open state, that is, the proportion of the opening/closing part's area that is open. Specifically, the virtual sound processing unit 101 calculates the opening ratio based on the open/closed state of the opening/closing part of the vehicle 3 as detected by a sensor such as an encoder. The opening ratio may be calculated as a percentage, or classified into several predefined steps, for example "large", "medium", and "small" in order starting from the fully open state. The virtual sound processing unit 101 then changes the output intensity of the virtual sound according to the calculated opening ratio.
For example, when the virtual sound source is inside the vehicle, the larger the opening ratio, the lower the output intensity of the virtual sound is set; conversely, the smaller the opening ratio, the higher the output intensity.
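The continuous and stepwise variants for an in-cabin source can be sketched as below. The linear mapping, the 6 dB maximum attenuation, the one-third category thresholds, and the per-category gains are all assumptions for illustration; the source specifies only the direction of the change (a larger opening ratio gives a lower output intensity).

```python
MAX_ATTENUATION_DB = 6.0  # hypothetical attenuation when the part is fully open

# Hypothetical gains (dB) for each stepwise category, for an in-cabin source.
CATEGORY_GAIN_DB = {"closed": 0.0, "small": -2.0, "medium": -4.0, "large": -6.0}

def gain_for_opening_ratio(ratio):
    """Continuous variant: attenuation grows linearly with the opening ratio (0..1)."""
    r = min(max(ratio, 0.0), 1.0)
    return -MAX_ATTENUATION_DB * r

def opening_category(ratio):
    """Stepwise variant: bucket the opening ratio (thresholds are assumptions)."""
    if ratio <= 0.0:
        return "closed"
    if ratio >= 2 / 3:
        return "large"
    if ratio >= 1 / 3:
        return "medium"
    return "small"
```

A linear mapping is the simplest monotonic choice; a perceptually tuned curve could replace it without changing the interface.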

The virtual sound source acquisition unit 102 acquires the position information of the virtual sound source, information on the direction in which the virtual sound source emits the virtual sound, and the audio information of the virtual sound. When the virtual sound source is the remote user's avatar, the unit acquires the avatar's position information as the position information of the virtual sound source. The virtual sound source acquisition unit 102 acquires this position information at least with reference to the vehicle 3, that is, according to whether the avatar is inside or outside the cabin. For example, if the avatar is at a seat in the cabin, the position of the virtual sound source is acquired as the position of that seat. Concretely, the position and posture of the remote user's head are first measured by sensors, including the face direction detection device 21 of the terminal device 2 worn by the remote user.
The virtual sound source acquisition unit 102 then converts the measured position and posture of the remote user's head into the position and posture of the remote user's avatar in the vehicle 3, thereby acquiring the avatar's position and posture information. This conversion is performed relatively, starting from corresponding positions. For example, the virtual sound source acquisition unit 102 sets a given seat in the vehicle 3 as a reference position and sets, in the remote user's space, a corresponding position that corresponds to the reference position. The remote user's position and posture are then acquired as a relative positional relationship (distance and direction) with respect to the corresponding position in that space. Finally, the virtual sound source acquisition unit 102 converts this relationship relative to the corresponding position into the relationship of the avatar relative to the reference position in the vehicle 3, thereby acquiring the position and posture of the remote user's avatar.
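The two-step relative conversion (remote user relative to the corresponding position, then the same offset applied at the reference seat) amounts to a change of reference frame. The sketch below shows it in 2D with a heading angle; the `Pose2D` type, the `remote_to_avatar` function, the counter-clockwise yaw convention, and the restriction to the horizontal plane are assumptions, since the source does not give coordinate conventions.

```python
import math
from typing import NamedTuple

class Pose2D(NamedTuple):
    x: float
    y: float
    yaw: float  # radians, counter-clockwise from the frame's +x axis

def _rotate(x, y, angle):
    """Rotate the vector (x, y) counter-clockwise by angle radians."""
    c, s = math.cos(angle), math.sin(angle)
    return c * x - s * y, s * x + c * y

def remote_to_avatar(head, corresponding, reference_seat):
    """Map the remote user's head pose (remote-room frame) to an avatar pose
    (vehicle frame) by reusing its offset from the corresponding position."""
    # 1. Offset of the head from the corresponding position, in that position's frame.
    dx, dy = _rotate(head.x - corresponding.x, head.y - corresponding.y, -corresponding.yaw)
    dyaw = head.yaw - corresponding.yaw
    # 2. Apply the same offset at the reference seat in the vehicle frame.
    ox, oy = _rotate(dx, dy, reference_seat.yaw)
    yaw = (reference_seat.yaw + dyaw + math.pi) % (2 * math.pi) - math.pi
    return Pose2D(reference_seat.x + ox, reference_seat.y + oy, yaw)
```

If the remote user stands exactly at the corresponding position and faces the same way, the avatar is placed at the reference seat; stepping 1 m forward in the room moves the avatar 1 m forward from the seat.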

また、仮想音源取得部１０２は、遠隔ユーザのアバターの顔の向きを、仮想音響が出力される方向情報として取得する。本実施形態では、仮想音源取得部１０２は、遠隔ユーザのアバターの顔が開閉部に向いているか否かを方向情報として取得する。具体的には、仮想音源取得部１０２は、開閉部の位置と、遠隔ユーザの顔の向きの情報を取得し、これらの情報に基づいて、遠隔ユーザのアバターの顔が開閉部に向いているか否かを判定する。遠隔ユーザのアバターの顔が開閉部に向いていると判定される場合には、仮想音源取得部１０２は、仮想音響が出力される方向に開閉部が位置するという情報を、仮想音響の方向情報として取得する。遠隔ユーザのアバターの顔が開閉部に向いているか否かの判定方法は、例えば、遠隔ユーザの顔の向きを、車両３内における遠隔ユーザのアバターの顔の向きに変換して、車両３内の開閉部の位置と遠隔ユーザのアバターの顔の向きとの関係によって判定する方法がある。すなわち、仮想音源取得部１０２は、遠隔ユーザの位置姿勢情報を取得し、車両３内の基準位置を起点とした相対的な位置関係として変換して、車両３内における遠隔ユーザのアバターの顔の向きの情報を取得する。このとき、車両３内の開閉部の位置情報についても、車両３内の基準位置を起点とした相対的な位置関係として算出することができる。したがって、仮想音源取得部１０２は、開閉部の位置情報と、遠隔ユーザのアバターの顔の向きの情報から、遠隔ユーザのアバターの顔が開閉部に向いているか否かを判定することができる。なお、判定方法としては、例えば、仮想音源取得部１０２は、遠隔ユーザの顔の動きに対応する遠隔ユーザのアバターの視線の方向の車室内外の映像が撮像された撮像画像の中に開閉部が特定されるか否かによって判定してもよい。仮想音源取得部１０２は、撮像画像の画像認識を行い、開閉部の特徴点を抽出し、開閉部が特定されるか否かによって判定を行う。 Further, the virtual sound source acquisition unit 102 acquires the face direction of the remote user's avatar as direction information indicating the direction in which the virtual sound is output. In this embodiment, the virtual sound source acquisition unit 102 acquires, as the direction information, whether the face of the remote user's avatar is facing an opening/closing part. Specifically, the virtual sound source acquisition unit 102 acquires the position of the opening/closing part and the face direction of the remote user, and determines from these whether the face of the remote user's avatar is facing the opening/closing part. If it determines that the avatar's face is facing the opening/closing part, the virtual sound source acquisition unit 102 acquires, as the direction information of the virtual sound, the information that the opening/closing part is located in the direction in which the virtual sound is output. One way to determine whether the avatar's face is facing the opening/closing part is to convert the remote user's face direction into the face direction of the avatar in the vehicle 3 and then evaluate the relationship between the position of the opening/closing part in the vehicle 3 and the avatar's face direction. That is, the virtual sound source acquisition unit 102 acquires the position and orientation information of the remote user, converts it into a relative positional relationship with the reference position in the vehicle 3 as the starting point, and thereby obtains the face direction of the avatar in the vehicle 3. The position of each opening/closing part in the vehicle 3 can likewise be calculated as a relative positional relationship with the same reference position as the starting point. The virtual sound source acquisition unit 102 can therefore determine, from the position of the opening/closing part and the face direction of the avatar, whether the avatar's face is facing the opening/closing part. Alternatively, the virtual sound source acquisition unit 102 may make the determination based on whether an opening/closing part can be identified in a captured image of the vehicle interior and exterior taken in the line-of-sight direction of the avatar, which follows the movement of the remote user's face. In that case, the virtual sound source acquisition unit 102 performs image recognition on the captured image, extracts feature points of the opening/closing part, and decides according to whether the opening/closing part is identified.
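The geometric variant of this test, which compares the avatar's face direction with the bearing to each opening/closing part in the vehicle frame, might look like the sketch below. The angular tolerance (`half_angle_deg`), the dictionary of opening positions, and the function names are assumptions for illustration; the description does not specify how "facing" is thresholded.

```python
import math


def bearing_error(avatar_xy, face_yaw, target_xy):
    """Signed angle (radians, in [-pi, pi)) from the face direction to the target."""
    bearing = math.atan2(target_xy[1] - avatar_xy[1], target_xy[0] - avatar_xy[0])
    return (bearing - face_yaw + math.pi) % (2 * math.pi) - math.pi


def facing_opening(avatar_xy, face_yaw, openings, half_angle_deg=30.0):
    """Return the name of the opening/closing part the avatar faces, or None.

    `openings` maps a name to an (x, y) position in the vehicle frame, i.e.
    relative to the same reference position used for the avatar.
    """
    limit = math.radians(half_angle_deg)
    best = None
    for name, xy in openings.items():
        err = abs(bearing_error(avatar_xy, face_yaw, xy))
        if err <= limit and (best is None or err < best[1]):
            best = (name, err)
    return None if best is None else best[0]


# Hypothetical layout: x forward, y to the left; avatar seated on the right side.
openings = {"front_left_window": (0.3, 0.8), "front_right_window": (0.3, -0.8)}
print(facing_opening((0.3, -0.4), -math.pi / 2, openings))
```

Wrapping the angle difference into [-pi, pi) keeps the test correct when the face direction and the bearing lie on opposite sides of the +/-180 degree seam.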

また、仮想音源取得部１０２は、仮想音響の音声情報を取得する。仮想音響の音声情報は、例えば、音声取得装置２２により取得された遠隔ユーザが発した音声の情報である。具体的には、仮想音源取得部１０２は、遠隔ユーザの近くに設置されている音声取得装置２２（例えば、ＶＲヘッドセットに備え付けられているマイクロフォン）により取得された音声から変換された音声信号を取得する。 The virtual sound source acquisition unit 102 also acquires audio information of the virtual sound. The audio information of the virtual sound is, for example, information on the voice uttered by the remote user acquired by the voice acquisition device 22. Specifically, the virtual sound source acquisition unit 102 acquires an audio signal converted from the voice acquired by the voice acquisition device 22 (for example, a microphone attached to a VR headset) installed near the remote user.

また、仮想音源取得部１０２は、車両３が走行する道路周辺の標識や看板等の位置情報及び音声情報を取得する。標識や看板等の位置情報は、ＧＰＳに基づく自車両の現在位置と標識や看板等の位置との相対的位置の情報であり、地図上の位置として取得される。また、標識や看板等の音声情報は、予め標識や看板等に記される案内情報に基づいて設定されている。そして、車両３が当該標識や看板が設置されている位置に接近すると、当該音声情報が取得される。 The virtual sound source acquisition unit 102 also acquires position information and audio information of signs, billboards, etc. around the road on which the vehicle 3 is traveling. The position information of signs, billboards, etc. is information on the relative position between the current position of the vehicle based on GPS and the positions of signs, billboards, etc., and is acquired as positions on a map. Furthermore, audio information of signs, billboards, etc. is set in advance based on guidance information written on the signs, billboards, etc. Then, when the vehicle 3 approaches the position where the sign or billboard is installed, the audio information is acquired.
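The approach condition can be sketched as a great-circle distance check between the vehicle's GPS position and the map positions of signs. The haversine formula is a standard choice for this; the 200 m trigger radius and the sign record layout are hypothetical, since the description only says the audio is acquired when the vehicle 3 approaches the sign.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def signs_to_announce(vehicle_lat, vehicle_lon, signs, radius_m=200.0):
    """Return the guidance text of every sign within `radius_m` of the vehicle."""
    return [
        s["guidance"]
        for s in signs
        if haversine_m(vehicle_lat, vehicle_lon, s["lat"], s["lon"]) <= radius_m
    ]


signs = [
    {"lat": 35.001, "lon": 139.0, "guidance": "Exit 3 in 1 km"},  # about 111 m north
    {"lat": 35.010, "lon": 139.0, "guidance": "Service area"},    # about 1.1 km north
]
print(signs_to_announce(35.0, 139.0, signs))
```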

乗員ユーザ状態取得部１０３は、乗員ユーザの左右の耳の位置および姿勢に関する情報を取得する。乗員ユーザの位置姿勢情報は、ジャイロセンサ等を含むサングラス型の位置姿勢推定装置またはユーザを観測する可視光カメラ、ＩＲカメラ、距離センサ等のセンサを入力部とした位置姿勢推定によって取得される。 The occupant user state acquisition unit 103 acquires information on the positions and orientations of the occupant user's left and right ears. This position and orientation information is obtained either from a sunglasses-type position and orientation estimation device that includes a gyro sensor or the like, or by position and orientation estimation that takes as input sensors observing the user, such as a visible-light camera, an IR camera, or a distance sensor.

環境情報取得部１０４は、車両３における音響環境に関する音響環境情報を取得する。車両３における音響環境とは、車両３において音響に影響を与える環境のことであり、例えば、音響の聞き取りやすさに影響を与える騒音を発生させる環境のことである。本実施形態では、音響環境情報として、車両３の車室内と車室外を仕切る開閉部の開閉状態の情報、車両３の車速情報、車両３の外の天候情報が例として挙げられる。開閉部は、ユーザがいる車室内と車室外を仕切り、開閉が可能な機構であり、例えば、車両のドアやパワーウィンドウ、オープンカーなどの開閉可能なルーフや着脱可能なルーフが開閉部に該当する。一般に、車両の窓が開いていれば、車室内のユーザには、車室内で出力される音響は小さく聞こえるものである。すなわち、車両の開閉部の開閉状態は、車両における音響に影響を与えている。開閉状態の情報は、各開閉部の可動箇所に取り付けたエンコーダーなどのセンサによって計測することで取得される。 The environmental information acquisition unit 104 acquires acoustic environment information on the acoustic environment of the vehicle 3. The acoustic environment of the vehicle 3 is the environment that affects sound in the vehicle 3, for example an environment that produces noise affecting how easily sound can be heard. In this embodiment, examples of acoustic environment information include the open/closed state of the opening/closing parts that separate the interior of the vehicle 3 from the exterior, the vehicle speed of the vehicle 3, and the weather outside the vehicle 3. An opening/closing part is a mechanism that separates the cabin, where the user is, from the outside of the vehicle and that can be opened and closed; examples include vehicle doors, power windows, and the openable or removable roof of a convertible. In general, when a vehicle window is open, a user in the cabin hears sound output inside the cabin more quietly. In other words, the open/closed state of the vehicle's opening/closing parts affects the sound in the vehicle. The open/closed state is obtained by measurement with a sensor, such as an encoder, attached to the movable portion of each opening/closing part.

また、環境情報取得部１０４は、車両３における音響環境として、車両３の車速情報を取得することとしてもよい。一般に、車両の車速が高ければ、それに伴ってエンジン音が大きくなり、車両における音響は聞きとりにくくなるものである。すなわち、車両の走行状態は、車両における音響に影響を与える環境の一部である。また、環境情報取得部１０４は、天候情報を取得することとしてもよい。一般に、雨天であれば、雨音によって車両における音響は聞き取りにくくなるものである。すなわち、天候の状態は、車両における音響に影響を与える環境の一部である。天候情報は、例えば、ワイパー動作から検知する。外部から通信によって天候情報を取得することとしてもよい。 The environmental information acquisition unit 104 may also acquire vehicle speed information of the vehicle 3 as the acoustic environment in the vehicle 3. Generally, the higher the vehicle speed, the louder the engine sound becomes, making the sound in the vehicle harder to hear. In other words, the running state of the vehicle is part of the environment that affects the sound in the vehicle. The environmental information acquisition unit 104 may also acquire weather information. Generally, if it is raining, the sound in the vehicle becomes harder to hear due to the sound of the rain. In other words, the weather state is part of the environment that affects the sound in the vehicle. The weather information is detected, for example, from the operation of the wipers. Weather information may also be acquired from outside by communication.

出力装置１１は、所定位置に音像が定位され、かつ、出力態様が設定された仮想音響に対応する実音響を出力する。具体的には、出力装置１１は、音像定位部１００及び仮想音響加工部１０１により音声信号処理された音声信号が入力されると、当該音声信号を変換して実音響を出力する。出力装置１１は、例えば、ヘッドホンやスピーカーである。所定位置に音像が定位された仮想音響に対応する実音響が出力されることで、所定位置に定位された仮想音響の音像が形成される。ユーザには、仮想音響が所定位置から聞こえてくるように感じられる。 The output device 11 outputs real sound corresponding to virtual sound in which a sound image is localized at a predetermined position and an output mode is set. Specifically, when the output device 11 receives an audio signal subjected to audio signal processing by the sound image localization section 100 and the virtual sound processing section 101, it converts the audio signal and outputs real sound. The output device 11 is, for example, headphones or speakers. By outputting real sound corresponding to the virtual sound with the sound image localized at the predetermined position, a sound image of the virtual sound localized at the predetermined position is formed. The user feels as if the virtual sound is coming from a predetermined position.

通信装置１２は、端末装置２と通信を行い、情報の送受信を行う。通信装置１２は、車室内外が撮像された撮像画像を端末通信装置２３に送信し、端末装置２から、遠隔ユーザの位置姿勢情報や音声情報を受信する。 The communication device 12 communicates with the terminal device 2 to transmit and receive information. The communication device 12 transmits captured images of the interior and exterior of the vehicle to the terminal communication device 23, and receives position and posture information and voice information of the remote user from the terminal device 2.

端末装置２は、遠隔ユーザが装着する装置であり、車両３内の空間に対応する仮想空間を視覚的かつ聴覚的に遠隔ユーザに提示することができる。例えば、ヘッドホンが備え付けられているＶＲ用ヘッドマウントディスプレイである。端末装置２は、ディスプレイに車両３内外の光景が撮像された画像を表示する。これにより、車両３の遠隔にいるユーザは、あたかも自分が車両３内にいるような光景を見ることができる。端末装置２は、コントローラ２０と、顔向き検出装置２１と、音声取得装置２２と、端末通信装置２３とを備える。コントローラ２０は、画像取得部２００、画像提示部２０１、情報生成部２０２を備える。端末装置２は、端末通信装置２３を介して、音響出力装置１と通信可能である。 The terminal device 2 is a device worn by a remote user, and can visually and audibly present to the remote user a virtual space corresponding to the space inside the vehicle 3. For example, it is a head-mounted display for VR equipped with headphones. The terminal device 2 displays images of the scenery inside and outside the vehicle 3 on the display. This allows a user located remotely from the vehicle 3 to see a scene as if they were inside the vehicle 3. The terminal device 2 comprises a controller 20, a face direction detection device 21, a voice acquisition device 22, and a terminal communication device 23. The controller 20 comprises an image acquisition unit 200, an image presentation unit 201, and an information generation unit 202. The terminal device 2 can communicate with the sound output device 1 via the terminal communication device 23.

顔向き検出装置２１は、遠隔ユーザの顔の向きを検出する。顔向き検出装置２１は、例えば、加速度センサやジャイロセンサにより構成される。顔向き検出装置２１は、加速度センサやジャイロセンサにより遠隔ユーザの頭部の動きを計測して顔の向きを検出する。 The face orientation detection device 21 detects the face orientation of a remote user. The face orientation detection device 21 includes, for example, an acceleration sensor or a gyro sensor. The face orientation detection device 21 measures the movement of the remote user's head using an acceleration sensor or a gyro sensor to detect the orientation of the face.

音声取得装置２２は、遠隔ユーザが発する音声を取得する。つまり、音声取得装置２２は、遠隔ユーザが話している内容を取得する。音声取得装置２２は、例えば、ヘッドマウントディスプレイに備え付けられているマイクロフォンである。音声取得装置２２は、遠隔ユーザの音声を取得すると、取得された音声を音声信号に変換する。 The audio acquisition device 22 acquires audio emitted by a remote user. In other words, the audio acquisition device 22 acquires what the remote user is saying. The audio acquisition device 22 is, for example, a microphone installed in a head-mounted display. When the voice acquisition device 22 acquires the voice of a remote user, it converts the acquired voice into an audio signal.

端末通信装置２３は、音響出力装置１と情報の送受信を行う。端末通信装置２３は、情報生成部２０２により生成された仮想音響の音声情報や仮想音響が出力される方向の情報を音響出力装置１の通信装置１２に送信する。また、端末通信装置２３は、音響出力装置１から、自車両内外の映像が撮像された撮像画像を取得する。 The terminal communication device 23 transmits and receives information to and from the sound output device 1. The terminal communication device 23 transmits the audio information of the virtual sound generated by the information generation unit 202 and the information on the direction in which the virtual sound is output to the communication device 12 of the sound output device 1. The terminal communication device 23 also receives, from the sound output device 1, captured images of the interior and exterior of the host vehicle.

画像取得部２００は、車両３内における遠隔ユーザのアバターの位置を基準として、遠隔ユーザの顔の向きに対応する方向の車室内外の映像が撮像された画像を取得する。遠隔ユーザの顔の向きに対応する方向は、車両３内の遠隔ユーザのアバターの顔の向きに対応している。例えば、遠隔ユーザが右側を向けば、遠隔ユーザのアバターも右側に顔を向ける。車両３内のカメラにより、遠隔ユーザのアバターの位置を基準として、遠隔ユーザの顔の向きに対応する方向の車両３内外の映像が撮像されると、撮像された画像が通信装置１２を介して端末通信装置２３に送信される。そして、画像取得部２００は、端末通信装置２３を介して、車両３内外の映像の撮像画像を取得する。画像取得部２００により撮像画像が取得されると、当該撮像画像は画像提示部２０１に出力される。 The image acquisition unit 200 acquires images of the vehicle interior and exterior captured in the direction corresponding to the remote user's face direction, with the position of the remote user's avatar in the vehicle 3 as the reference. The direction corresponding to the remote user's face direction is the face direction of the remote user's avatar in the vehicle 3; for example, if the remote user turns to the right, the avatar also turns its face to the right. When a camera in the vehicle 3 captures images of the interior and exterior of the vehicle 3 in the direction corresponding to the remote user's face direction, with the avatar's position as the reference, the captured images are transmitted to the terminal communication device 23 via the communication device 12. The image acquisition unit 200 then acquires the captured images of the interior and exterior of the vehicle 3 via the terminal communication device 23 and outputs them to the image presentation unit 201.

画像提示部２０１は、画像取得部２００により取得された車室内外の映像の撮像画像を端末装置２のディスプレイに表示する。これにより、遠隔ユーザは、当該撮像画像が表示されるディスプレイを介して、遠隔にある車両３内の空間を見ることができる。 The image presentation unit 201 displays the captured images of the video of the interior and exterior of the vehicle acquired by the image acquisition unit 200 on the display of the terminal device 2. This allows the remote user to view the space inside the remote vehicle 3 via the display on which the captured images are displayed.

情報生成部２０２は、音声取得装置２２により取得された音声に基づいて、仮想音響の音声情報を生成する。情報生成部２０２は、仮想音源が遠隔ユーザのアバターである場合には、遠隔ユーザの音声に基づいて、仮想音響の音声情報を生成し、仮想音源が道路付近の標識や看板等の位置に設定される場合には、標識や看板等に記されている案内情報に基づいて、仮想音響の音声情報を生成する。また、情報生成部２０２は、顔向き検出装置２１により検出された顔向きに基づいて、仮想音響が出力される方向の情報を生成する。 The information generation unit 202 generates audio information of the virtual sound based on the voice acquired by the voice acquisition device 22. When the virtual sound source is the remote user's avatar, the information generation unit 202 generates the audio information from the remote user's voice; when the virtual sound source is set at the position of a sign, billboard, or the like near the road, it generates the audio information from the guidance information written on that sign or billboard. The information generation unit 202 also generates information on the direction in which the virtual sound is output, based on the face direction detected by the face direction detection device 21.

次に、図３を用いて、本実施形態に係る音響出力制御の手順について説明する。図３は、音響出力制御を実行するための手順を示すフローチャートである。以下、車両３内において、乗員ユーザが視認しているディスプレイに、遠隔ユーザのアバターが表示されている場合を想定する。すなわち、仮想音源を遠隔ユーザのアバターとして、遠隔ユーザの音声に基づいて生成される音声情報を仮想音響とする場合における音響出力制御を説明する。この場合には、音声取得装置２２が遠隔ユーザの音声を取得したことを検知したときに、コントローラ１０は、音響出力制御を開始する。つまり、遠隔ユーザがマイクに向かって話をすると、音響出力制御を開始する。また、これに限らず、仮想音源が道路付近の標識や看板等に位置する場合には、車両３が道路付近の標識や看板等に接近していると判定されるとき、コントローラ１０が音響出力制御を開始することとしてもよい。 Next, the procedure of the sound output control according to this embodiment will be described with reference to FIG. 3, a flowchart showing the procedure for executing the sound output control. The following assumes that the remote user's avatar is displayed on a display viewed by the occupant user in the vehicle 3; that is, it describes sound output control in which the virtual sound source is the remote user's avatar and the virtual sound is audio information generated from the remote user's voice. In this case, the controller 10 starts the sound output control when it detects that the voice acquisition device 22 has acquired the remote user's voice, i.e., when the remote user speaks into the microphone. The trigger is not limited to this: when the virtual sound source is located at a sign, billboard, or the like near the road, the controller 10 may start the sound output control when it determines that the vehicle 3 is approaching that sign or billboard.

ステップＳ３０１では、仮想音源取得部１０２は、仮想音響の仮想音源の位置情報を取得する。仮想音源の位置情報は、車両３を基準にした位置、すなわち、仮想音源が車両３の車室内に位置するか、または車室外に位置するかに関する情報である。仮想音源の位置は、車両３を基準とした遠隔ユーザのアバターの位置に基づいて特定される。例えば、遠隔ユーザのアバターが車室内の座席、例えば、助手席に位置するときには、仮想音源の位置は、助手席の位置に特定される。また、仮想音源取得部１０２は、遠隔ユーザがいる空間に対する遠隔ユーザの相対的位置関係を、車両３内の空間に対する遠隔ユーザのアバターの相対的位置関係に変換することによって、遠隔ユーザのアバターの位置情報を取得する。 In step S301, the virtual sound source acquisition unit 102 acquires the position information of the virtual sound source of the virtual sound. This is information on the position relative to the vehicle 3, that is, whether the virtual sound source is located inside or outside the cabin of the vehicle 3. The position of the virtual sound source is determined from the position of the remote user's avatar relative to the vehicle 3. For example, when the avatar is located at a seat in the cabin, such as the front passenger seat, the position of the virtual sound source is set to the position of that seat. The virtual sound source acquisition unit 102 acquires the position of the remote user's avatar by converting the remote user's relative positional relationship with respect to the space where the remote user is into the avatar's relative positional relationship with respect to the space inside the vehicle 3.

ステップＳ３０２では、仮想音源取得部１０２は、仮想音響が出力される方向の情報を取得する。具体的には、仮想音源取得部１０２は、遠隔ユーザの顔の向きを取得し、車両３の開閉部の位置の情報を取得する。そして、仮想音源取得部１０２は、遠隔ユーザの顔が車両３の開閉部に向いているか否かを判定し、判定結果を仮想音響の方向の情報として取得する。 In step S302, the virtual sound source acquisition unit 102 acquires information on the direction in which the virtual sound is output. Specifically, the virtual sound source acquisition unit 102 acquires the direction of the remote user's face and acquires information on the position of the opening and closing part of the vehicle 3. The virtual sound source acquisition unit 102 then determines whether the remote user's face is facing the opening and closing part of the vehicle 3, and acquires the determination result as information on the direction of the virtual sound.

ステップＳ３０３では、仮想音源取得部１０２は、仮想音響の音声情報を取得する。仮想音響の音声情報は、遠隔ユーザが発する声に基づく音声情報である。具体的には、仮想音源取得部１０２は、遠隔ユーザの声に基づいて生成された音声情報を取得する。遠隔ユーザの声は、端末装置２の音声取得装置２２により取得される。すなわち、遠隔ユーザが話をしている内容が音声情報となる。 In step S303, the virtual sound source acquisition unit 102 acquires audio information of the virtual sound. The audio information of the virtual sound is audio information based on the voice of the remote user. Specifically, the virtual sound source acquisition unit 102 acquires audio information generated based on the voice of the remote user. The voice of the remote user is acquired by the audio acquisition device 22 of the terminal device 2. In other words, the content of what the remote user is saying becomes the audio information.

ステップＳ３０４では、乗員ユーザ状態取得部１０３は、乗員ユーザの頭部の位置姿勢情報を取得する。例えば、乗員ユーザ状態取得部１０３は、加速度センサやジャイロセンサにより頭部の動きを計測して頭部の向きを検知する。 In step S304, the occupant user state acquisition unit 103 acquires the position and orientation information of the occupant user's head. For example, the passenger user state acquisition unit 103 measures the movement of the head using an acceleration sensor or a gyro sensor to detect the direction of the head.

ステップＳ３０５では、環境情報取得部１０４は、車両３の開閉部の情報を取得する。具体的には、環境情報取得部１０４は、開閉部の開閉状態の情報を取得する。このとき、環境情報取得部１０４は、開閉部の開口率を取得することとしてもよい。 In step S305, the environmental information acquisition unit 104 acquires information on the opening/closing parts of the vehicle 3, specifically their open/closed state. At this time, the environmental information acquisition unit 104 may also acquire the opening ratio of each opening/closing part.

ステップＳ３０６では、音像定位部１００は、仮想音響の音像を仮想音源の位置に定位する。具体的には、音像定位部１００は、ステップＳ３０１で取得された遠隔ユーザのアバターの位置と、ステップＳ３０４で取得された乗員ユーザの位置姿勢に基づき、遠隔ユーザの声に基づく仮想音響の音像を遠隔ユーザのアバターの位置に定位させるように音声信号処理を行う。すなわち、音像定位部１００は、遠隔ユーザの声に基づく仮想音響が遠隔ユーザのアバターの位置から乗員ユーザの位置まで聞こえてくるように音像を定位させる。 In step S306, the sound image localization unit 100 localizes the sound image of the virtual sound at the position of the virtual sound source. Specifically, based on the position of the remote user's avatar acquired in step S301 and the position and orientation of the occupant user acquired in step S304, the sound image localization unit 100 performs audio signal processing so that the sound image of the virtual sound based on the remote user's voice is localized at the position of the avatar. In other words, it localizes the sound image so that the occupant user hears the virtual sound based on the remote user's voice as coming from the position of the remote user's avatar.
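The description does not say which localization method is used; binaural rendering with head-related transfer functions (HRTFs) is typical for headphones. As a much simpler stand-in, the sketch below computes the source azimuth relative to the occupant's head and turns it into constant-power left/right gains with inverse-distance attenuation. The function name, the 1 m reference distance, and the axis convention (x forward, y left) are assumptions.

```python
import math


def localization_gains(head_xy, head_yaw, source_xy, ref_dist_m=1.0):
    """Return (left_gain, right_gain) for a source relative to the listener's head.

    Simplified stand-in for HRTF rendering: constant-power panning by azimuth
    plus 1/r attenuation beyond `ref_dist_m`. Sine-law panning cannot tell
    front from back; real binaural rendering resolves that.
    """
    dx, dy = source_xy[0] - head_xy[0], source_xy[1] - head_xy[1]
    dist = max(math.hypot(dx, dy), 1e-6)
    azimuth = math.atan2(dy, dx) - head_yaw  # positive = to the listener's left
    pan = math.sin(azimuth)                  # +1 fully left, -1 fully right
    theta = (pan + 1.0) * math.pi / 4.0      # 0 -> right only, pi/2 -> left only
    atten = min(1.0, ref_dist_m / dist)
    return atten * math.sin(theta), atten * math.cos(theta)


# Occupant at the origin facing forward; avatar 1 m directly to the left.
print(localization_gains((0.0, 0.0), 0.0, (0.0, 1.0)))
```

Because the gains depend on the head yaw, the image stays fixed at the avatar's position when the occupant turns their head, which is why step S306 uses the occupant pose acquired in step S304.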

ステップＳ３０７では、仮想音響加工部１０１は、ステップＳ３０１、Ｓ３０２及びＳ３０５で取得した各情報に基づいて、仮想音響の出力態様を選択する。具体的には、仮想音響加工部１０１は、仮想音源の位置、仮想音響の方向、及び開閉部の開閉状態の情報に応じて出力態様の加工内容を選択する。出力態様の加工内容は、前述の表１に記載されているように、仮想音源の位置、仮想音響の方向、及び開閉部の開閉状態に応じて予め設定されている。例えば、仮想音源の位置が車室内、仮想音響の方向に開閉部があり、開閉部が開状態である場合には、仮想音響加工部１０１は、仮想音響の出力強度を低く設定する。出力強度の加工パラメータの設定の例としては、開閉部の開閉状態に基づいて設定される場合には、開閉部が閉状態であるとき、加工パラメータは１に設定され、開閉部が開状態であるとき、加工パラメータは０．８に設定される。仮想音響加工部１０１は、各条件に基づいて加工パラメータを設定する。 In step S307, the virtual sound processing unit 101 selects the output mode of the virtual sound based on each piece of information acquired in steps S301, S302, and S305. Specifically, the virtual sound processing unit 101 selects the processing content of the output mode according to the information on the position of the virtual sound source, the direction of the virtual sound, and the open/close state of the opening/closing part. The processing content of the output mode is set in advance according to the position of the virtual sound source, the direction of the virtual sound, and the open/close state of the opening/closing part, as described in Table 1 above. For example, when the position of the virtual sound source is inside the vehicle cabin, there is an opening/closing part in the direction of the virtual sound, and the opening/closing part is in an open state, the virtual sound processing unit 101 sets the output intensity of the virtual sound low. As an example of setting the processing parameter of the output intensity based on the opening/closing state of the opening/closing part, when the opening/closing part is in a closed state, the processing parameter is set to 1, and when the opening/closing part is in an open state, the processing parameter is set to 0.8. The virtual sound processing unit 101 sets the processing parameter based on each condition.
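Table 1 itself is not reproduced in this section, so the lookup below fills in only what the text states: a parameter of 1 with the opening/closing part closed, 0.8 when it is open and the source is inside the cabin, and a "higher" value when the source is outside. The outside value of 1.2, and the neutral value used when no opening/closing part lies in the sound direction, are hypothetical placeholders; the names are illustrative.

```python
# Keys: (source inside cabin, opening/closing part is open).
# Applies only when an opening/closing part lies in the direction of the virtual sound.
OUTPUT_GAIN_TABLE = {
    (True, False): 1.0,   # inside, closed (stated in the description)
    (True, True): 0.8,    # inside, open (stated in the description)
    (False, False): 1.0,  # outside, closed (baseline, assumed)
    (False, True): 1.2,   # outside, open (hypothetical "higher" value)
}


def select_output_gain(source_inside: bool, opening_in_direction: bool,
                       opening_open: bool) -> float:
    """Select the output-intensity processing parameter for step S307."""
    if not opening_in_direction:
        return 1.0  # assumption: no adjustment when no opening/closing part is in the way
    return OUTPUT_GAIN_TABLE[(source_inside, opening_open)]


print(select_output_gain(source_inside=True, opening_in_direction=True, opening_open=True))
```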

ステップＳ３０８では、仮想音響加工部１０１は、ステップＳ３０７で選択された加工内容に基づいて仮想音響を加工する。具体的には、仮想音響加工部１０１は、選択された出力態様に応じて仮想音響の音声信号処理を行う。このとき、仮想音響加工部１０１は、ステップＳ３０７で設定された加工パラメータと仮想音響の出力強度に基づいて、加工後の出力強度を算出する。そして、仮想音響加工部１０１は、音声信号処理がされた音声信号を出力装置１１に出力する。なお、ステップＳ３０６において、音像定位部１００は、仮想音響の音像定位に基づく加工パラメータの設定だけを行い、ステップＳ３０８において、仮想音響加工部１０１が、音像定位に基づく加工パラメータと出力態様の設定に基づく加工パラメータを合わせて音声信号処理を行うこととしてもよい。 In step S308, the virtual sound processing unit 101 processes the virtual sound based on the processing content selected in step S307. Specifically, the virtual sound processing unit 101 processes the audio signal of the virtual sound according to the selected output mode. At this time, the virtual sound processing unit 101 calculates the output intensity after processing based on the processing parameters set in step S307 and the output intensity of the virtual sound. Then, the virtual sound processing unit 101 outputs the audio signal processed to the output device 11. Note that in step S306, the sound image localization unit 100 may only set the processing parameters based on the sound image localization of the virtual sound, and in step S308, the virtual sound processing unit 101 may perform audio signal processing by combining the processing parameters based on the sound image localization and the processing parameters based on the output mode setting.
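Under the variant in which step S306 only produces localization parameters and step S308 applies everything at once, the combination can be pictured as multiplying per-channel localization gains by the output-mode parameter. This is a sketch on plain lists of float samples; the function name and the clipping to [-1, 1] are assumptions.

```python
def apply_processing(samples, loc_gains, mode_gain):
    """Return (left, right) sample lists for the output device.

    samples:   mono virtual-sound samples in [-1.0, 1.0]
    loc_gains: (left_gain, right_gain) from sound image localization (S306)
    mode_gain: output-intensity parameter chosen for the output mode (S307)
    """
    def clip(v):
        # Keep samples in the valid range after amplification.
        return max(-1.0, min(1.0, v))

    left_g, right_g = loc_gains
    left = [clip(s * left_g * mode_gain) for s in samples]
    right = [clip(s * right_g * mode_gain) for s in samples]
    return left, right


left, right = apply_processing([1.0, -0.5, 0.25], loc_gains=(0.5, 1.0), mode_gain=0.8)
print(left, right)
```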

ステップＳ３０９では、出力装置１１は、ユーザのアバターの位置に音像を定位され、かつ、出力態様が設定された仮想音響に対応する実音響を出力する。具体的には、出力装置１１は、ステップＳ３０６及びステップＳ３０８で音声信号処理された音声信号を実音響に変換して出力する。これによって、設定された出力態様により、遠隔ユーザの声に基づく仮想音響が遠隔ユーザのアバターの位置から乗員ユーザの位置まで聞こえてくるように実音響が出力される。 In step S309, the output device 11 outputs real sound corresponding to the virtual sound in which the sound image is localized to the position of the user's avatar and the output mode is set. Specifically, the output device 11 converts the audio signal subjected to the audio signal processing in step S306 and step S308 into actual sound and outputs it. As a result, according to the set output mode, real sound is output so that the virtual sound based on the voice of the remote user can be heard from the position of the remote user's avatar to the position of the passenger user.

また、本実施形態では、仮想音源は、道路付近にある標識や看板等、車両３の走行に必要な案内情報であってもよい。この場合には、仮想音源取得部１０２は、仮想音源の位置を車室外に特定する。具体的には、車両３が走行する道路付近の標識や看板等が設置されている位置が仮想音源の位置として特定される。また、仮想音源から出力される仮想音響の方向は、仮想音源の位置から乗員ユーザの方向となる。仮想音源取得部１０２は、当該方向に開閉部が位置するか否かを判定する。さらに、仮想音響の音声情報として、標識や看板等に記されている案内情報が取得される。 In this embodiment, the virtual sound source may also be a sign, billboard, or the like near the road that carries guidance information needed for the vehicle 3 to travel. In this case, the virtual sound source acquisition unit 102 determines that the virtual sound source is located outside the cabin. Specifically, the position where the sign or billboard is installed near the road on which the vehicle 3 is traveling is taken as the position of the virtual sound source. The direction of the virtual sound output from the virtual sound source is the direction from the position of the virtual sound source toward the occupant user, and the virtual sound source acquisition unit 102 determines whether an opening/closing part is located in that direction. Furthermore, the guidance information written on the sign or billboard is acquired as the audio information of the virtual sound.

仮想音源取得部１０２は、各情報を取得すると、当該情報に基づいて、音像定位部１００は、仮想音響の音像を仮想音源の位置に定位させるように仮想音響を音声信号処理する。すなわち、案内情報の音声に基づく仮想音響の音像が、標識や看板等の位置に定位される。さらに、仮想音響加工部１０１は、選択された出力態様に応じて仮想音響を音声信号処理する。これにより、仮想音響の出力態様は、音響環境の変化に追従したものになる。そして、出力装置１１は、音声信号処理された仮想音響の音声信号を実音響に変換して出力する。これにより、ユーザには、標識や看板等の案内情報の音声が、標識や看板等の位置から聞こえてくるように感じられる。 When the virtual sound source acquisition unit 102 acquires each piece of information, the sound image localization unit 100 performs audio signal processing on the virtual sound based on the information so as to localize the sound image of the virtual sound at the position of the virtual sound source. That is, the sound image of the virtual sound based on the sound of the guidance information is localized at the position of a sign, billboard, etc. Furthermore, the virtual sound processing unit 101 performs audio signal processing on the virtual sound according to the selected output mode. As a result, the output mode of the virtual sound follows changes in the acoustic environment. Then, the output device 11 converts the audio signal of the audio signal-processed virtual sound into real sound and outputs it. As a result, the user feels as if the sound of the guidance information of the sign, billboard, etc. is coming from the position of the sign, billboard, etc.

なお、本実施形態では、環境情報取得部１０４が取得する音響環境の情報は、車両３の開閉部の開閉状態だけに限らず、例えば、車両３の車速情報や天候情報を取得することとしてもよい。この場合には、仮想音響加工部１０１は、車速情報に基づいて、車両３の車速が速いほど、仮想音響の出力強度を弱く設定し、当該出力強度に応じて仮想音響の音声信号処理を行う。また、仮想音響加工部１０１は、天候情報に基づいて、天候が雨天であれば、晴天時よりも、仮想音響の出力強度を弱く設定し、当該出力強度に応じて仮想音響の音声信号処理を行う。このように、本実施形態では、環境情報取得部１０４が取得する車両３における音響環境に応じて、仮想音響加工部１０１が、音響環境を反映した仮想音響の出力態様を選択し、選択された出力態様に応じて仮想音響の音声信号処理を行う。また、上記の音響環境の条件は複数組み合わせて仮想音響の出力態様の選択に用いられることとしてもよい。例えば、車両３の開閉部が開状態、かつ、車両３の車速が速い場合には、風切り音によって車両３内の音響は聞き取りにくくなる。したがって、このような場合には、仮想音響加工部１０１は、仮想音響の出力強度を低く設定する。 In this embodiment, the information on the acoustic environment acquired by the environmental information acquisition unit 104 is not limited to the open/closed state of the opening/closing part of the vehicle 3, and may also include, for example, vehicle speed information and weather information of the vehicle 3. In this case, the virtual sound processing unit 101 sets the output intensity of the virtual sound to be weaker as the vehicle speed of the vehicle 3 increases based on the vehicle speed information, and performs audio signal processing of the virtual sound according to the output intensity. Also, based on the weather information, if the weather is rainy, the virtual sound processing unit 101 sets the output intensity of the virtual sound to be weaker than when it is sunny, and performs audio signal processing of the virtual sound according to the output intensity. Thus, in this embodiment, the virtual sound processing unit 101 selects the output mode of the virtual sound that reflects the acoustic environment according to the acoustic environment in the vehicle 3 acquired by the environmental information acquisition unit 104, and performs audio signal processing of the virtual sound according to the selected output mode. Also, a combination of multiple conditions of the acoustic environment described above may be used to select the output mode of the virtual sound. 
For example, when an opening/closing part of the vehicle 3 is open and the vehicle 3 is traveling at high speed, wind noise makes sound inside the vehicle 3 hard to hear. In such a case, the virtual sound processing unit 101 therefore sets the output intensity of the virtual sound low.
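The description gives only the direction of each adjustment: weaker with higher speed, weaker in rain, and weaker again when an opening/closing part is open at high speed. One way to combine them is multiplicatively, as sketched below; every coefficient, the speed threshold, and the lower bound are hypothetical tuning values, not figures from the description.

```python
def environment_gain(speed_kmh: float, raining: bool, opening_open: bool,
                     per_kmh: float = 0.002, rain_factor: float = 0.9,
                     wind_speed_kmh: float = 80.0, wind_factor: float = 0.8,
                     floor: float = 0.3) -> float:
    """Output-intensity factor from vehicle speed, weather and opening state."""
    gain = 1.0 - per_kmh * max(0.0, speed_kmh)   # faster -> weaker
    if raining:
        gain *= rain_factor                      # rain -> weaker than clear weather
    if opening_open and speed_kmh >= wind_speed_kmh:
        gain *= wind_factor                      # wind noise through the opening
    return max(floor, gain)                      # keep the virtual sound audible


print(environment_gain(100.0, raining=False, opening_open=True))
```

A multiplicative form keeps each condition independent, so new acoustic-environment factors can be added without retuning the others.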

以上のように、本実施形態では、車両を基準とした所定位置に仮想音響の音像を定位し、所定位置に音像が定位された仮想音響に対応する実音響を出力し、車両における音響環境に関する音響環境情報を取得し、音響環境情報に応じた出力態様により実音響を出力する。これにより、現実の環境の変化に追従した仮想音響を出力することができる。 As described above, in this embodiment, a sound image of a virtual sound is localized at a predetermined position based on the vehicle, real sound corresponding to the virtual sound with the sound image localized at the predetermined position is output, acoustic environment information regarding the acoustic environment in the vehicle is obtained, and the real sound is output in an output mode according to the acoustic environment information. This makes it possible to output virtual sound that follows changes in the real environment.

また、本実施形態では、車室内と車室外を仕切る開閉部を有する車両を基準とした所定位置に、所定方向に出力される仮想音響の音像を定位し、音響環境情報は、開閉部の開閉状態を含み、所定位置から所定方向に開閉部が位置する場合に、開閉部の開閉状態に応じた出力強度により実音響を出力する。これにより、車両の開閉部と仮想音響の位置方向に応じて出力強度が変更されるため、現実の環境の変化として車両の開閉部の変化に追従するように仮想音響を出力することができる。 In addition, in this embodiment, a sound image of virtual sound output in a predetermined direction is localized at a predetermined position based on a vehicle having an opening and closing part that separates the inside and outside of the vehicle, and the acoustic environment information includes the open/closed state of the opening and closing part. When the opening and closing part is located in a predetermined direction from the predetermined position, real sound is output with an output intensity according to the open/closed state of the opening and closing part. As a result, the output intensity is changed according to the position and direction of the opening and closing part of the vehicle and the virtual sound, so that the virtual sound can be output to follow changes in the opening and closing part of the vehicle as changes in the real environment.

また、本実施形態では、所定位置が車室内に位置する場合かつ所定位置から所定方向に開閉部が位置する場合に、開閉部が開状態であるときには、開閉部が閉状態であるときよりも低い出力強度により実音響を出力する。これにより、仮想音源の位置が室内にあり、車両の開閉部が開いている場合における車室内の音響を再現するように仮想音響を出力することができる。 Furthermore, in this embodiment, when the predetermined position is inside the cabin and an opening/closing part is located in the predetermined direction from the predetermined position, real sound is output at a lower output intensity when the opening/closing part is open than when it is closed. This makes it possible to output virtual sound that reproduces the sound in the cabin when the virtual sound source is inside the cabin and an opening/closing part of the vehicle is open.

また、本実施形態では、所定位置が車室内に位置する場合かつ所定位置から所定方向に開閉部が位置する場合に、開閉部が開状態であるときには、開閉部の開口率が大きいほど、より低い出力強度により実音響を出力する。これにより、仮想音源の位置が室内にあり、車両の開閉部が開いている場合において、開閉部が開いている度合いに応じて変化する車室内の音響を再現するように仮想音響を出力することができる。 In addition, in this embodiment, when the predetermined position is located inside the vehicle cabin and the opening/closing part is located in a predetermined direction from the predetermined position, when the opening/closing part is in an open state, the larger the opening ratio of the opening/closing part, the lower the output intensity of the real sound is output. As a result, when the virtual sound source is located inside the vehicle cabin and the opening/closing part of the vehicle is open, virtual sound can be output to reproduce the sound inside the vehicle cabin that changes depending on the degree to which the opening/closing part is open.

In addition, in this embodiment, when the predetermined position is outside the vehicle cabin and the opening/closing part is located in the predetermined direction from the predetermined position, the real sound is output with a higher output intensity when the opening/closing part is open than when it is closed. This makes it possible to output the virtual sound so as to reproduce the acoustics inside the cabin when the virtual sound source is outside the vehicle and the opening/closing part of the vehicle is open.

Further, in this embodiment, when the predetermined position is outside the vehicle cabin, the opening/closing part is located in the predetermined direction from the predetermined position, and the opening/closing part is open, the larger the opening ratio of the opening/closing part, the higher the output intensity of the real sound. As a result, when the virtual sound source is outside the vehicle and the opening/closing part of the vehicle is open, the virtual sound can be output so as to reproduce cabin acoustics that vary with the degree to which the opening/closing part is open.
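For a virtual source outside the cabin, the adjustment runs the other way. The following is a minimal sketch, assuming a hypothetical linear boost of up to 6 dB as the opening ratio goes from 0 (closed) to 1 (fully open). The decibel offset is then converted to a linear amplitude factor and applied to a buffer of audio samples. The curve, the 6 dB limit, and the function names are illustrative assumptions, not values from the patent.

```python
from typing import List


def out_of_cabin_gain_db(opening_in_direction: bool, is_open: bool,
                         opening_ratio: float,
                         max_boost_db: float = 6.0) -> float:
    """Gain offset (dB, relative to the closed state) for a source outside the cabin.

    max_boost_db is an assumed tuning value, not taken from the patent.
    """
    if not (opening_in_direction and is_open):
        return 0.0  # closed, or no opening in the source direction: unchanged
    ratio = min(max(opening_ratio, 0.0), 1.0)  # clamp to [0, 1]
    # Larger opening ratio -> less occlusion -> higher output intensity.
    return max_boost_db * ratio


def apply_gain(samples: List[float], gain_db: float) -> List[float]:
    """Scale samples by a decibel gain (amplitude factor 10 ** (dB / 20))."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]
```

A practical implementation would also need headroom or limiting so that the boosted signal does not clip. That step is omitted here.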

Further, in this embodiment, the opening/closing part is at least one of a window, a door, an openable roof, and a removable roof of the vehicle. Thus, the virtual sound can be output so as to follow changes in the open/closed state of the vehicle's windows, doors, and roof as changes in the real environment.

Further, this embodiment includes the sound output device and a terminal device capable of communicating with the sound output device. The terminal device detects the face orientation of a first user who is remote from the vehicle. It then acquires a captured image of the inside and outside of the vehicle in the direction, referenced to the predetermined position, that corresponds to the first user's face orientation, and presents the acquired image to the first user. It also acquires the voice uttered by the first user and generates information about the virtual sound based on that voice. It generates information about the predetermined direction based on the direction corresponding to the first user's face orientation, and transmits the generated information about the virtual sound and the predetermined direction to the sound output device. When the first user's face is oriented toward the position at which the opening/closing part appears in the captured image presented to the first user, the sound output device determines that the opening/closing part is located in the predetermined direction from the predetermined position. It then outputs, to a second user riding in the vehicle, the real sound corresponding to the virtual sound whose sound image is localized at the predetermined position. This allows the first user, who is remote from the vehicle, to see images of the vehicle's interior and exterior, and allows virtual sound based on the first user's face orientation and voice to be output.
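The determination made on the sound output device side can be sketched as follows: is the first user's face oriented toward where the opening/closing part appears in the presented image? This minimal sketch assumes a hypothetical 360-degree equirectangular image centred on vehicle-forward. Face yaw is given in degrees (0 = forward, positive = right) and mapped linearly to an image column. Each opening/closing part is annotated with the column span where it appears. `PresentedOpening`, `yaw_to_column` and `facing_opening` are illustrative names. A real system would also need to account for pitch and the actual camera projection.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PresentedOpening:
    """Hypothetical column span where an opening/closing part appears in the image."""
    name: str
    col_min: int
    col_max: int


def yaw_to_column(yaw_deg: float, image_width: int) -> int:
    """Map face yaw to a column of an assumed 360-degree equirectangular image.

    Yaw 0 (vehicle forward) maps to the image centre; yaw increases to the right.
    """
    wrapped = (yaw_deg + 180.0) % 360.0  # 0 at the left edge, in [0, 360)
    return int(wrapped / 360.0 * image_width) % image_width


def facing_opening(yaw_deg: float, image_width: int,
                   openings: Sequence[PresentedOpening]) -> Optional[str]:
    """Name of the opening the first user faces, or None if none is faced."""
    col = yaw_to_column(yaw_deg, image_width)
    for opening in openings:
        if opening.col_min <= col <= opening.col_max:
            return opening.name
    return None
```

If `facing_opening` returns a name, the sound output device would treat that opening/closing part as lying in the predetermined direction and apply the corresponding output intensity.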

Note that the embodiments described above are provided to facilitate understanding of the present invention and are not intended to limit it. Accordingly, each element disclosed in the above embodiments is intended to encompass all design modifications and equivalents that fall within the technical scope of the present invention.

Reference Signs List
1: Sound output device
10: Controller
100: Sound image localization section
101: Virtual sound processing section
102: Virtual sound source acquisition section
103: Occupant user state acquisition section
104: Environmental information acquisition section
11: Output device
12: Communication device
2: Terminal device
20: Controller
200: Image acquisition section
201: Image presentation section
202: Information generation section
21: Face direction detection device
22: Voice acquisition device
23: Terminal communication device
1000: Sound output system

Claims

1. A sound output device comprising:
a sound image localization unit that localizes a sound image of a virtual sound output in a predetermined direction at a predetermined position referenced to a vehicle having an opening/closing part that partitions the interior of the vehicle from the exterior of the vehicle;
a sound output unit that outputs a real sound corresponding to the virtual sound with the sound image localized at the predetermined position; and
an environmental information acquisition unit that acquires acoustic environment information regarding the acoustic environment in the vehicle, including the open/closed state of the opening/closing part,
wherein the sound output unit outputs the real sound in an output mode that depends on whether the opening/closing part is located in the predetermined direction from the predetermined position and on the open/closed state of the opening/closing part.

2. A sound output device comprising:
a sound image localization unit that localizes a sound image of a virtual sound at a predetermined position referenced to a vehicle;
a sound output unit that outputs a real sound corresponding to the virtual sound with the sound image localized at the predetermined position; and
an environmental information acquisition unit that acquires acoustic environment information related to an acoustic environment in the vehicle,
wherein the sound image localization unit localizes the sound image of the virtual sound output in a predetermined direction at the predetermined position referenced to the vehicle, the vehicle having an opening/closing part that separates the vehicle interior from the vehicle exterior,
the acoustic environment information includes an open/closed state of the opening/closing part, and
the sound output unit outputs the real sound with an output intensity according to the open/closed state of the opening/closing part when the opening/closing part is located in the predetermined direction from the predetermined position.

3. The sound output device according to claim 2,
wherein, when the predetermined position is located inside the vehicle cabin and the opening/closing part is located in the predetermined direction from the predetermined position, the sound output unit outputs the real sound with a lower output intensity when the opening/closing part is open than when the opening/closing part is closed.

4. The sound output device according to claim 2,
wherein, when the predetermined position is located inside the vehicle cabin, the opening/closing part is located in the predetermined direction from the predetermined position, and the opening/closing part is open, the sound output unit outputs the real sound with a lower output intensity the larger the opening ratio of the opening/closing part is.

5. The sound output device according to any one of claims 2 to 4,
wherein, when the predetermined position is located outside the vehicle cabin and the opening/closing part is located in the predetermined direction from the predetermined position, the sound output unit outputs the real sound with a higher output intensity when the opening/closing part is open than when the opening/closing part is closed.

6. The sound output device according to any one of claims 2 to 5,
wherein, when the predetermined position is located outside the vehicle cabin, the opening/closing part is located in the predetermined direction from the predetermined position, and the opening/closing part is open, the sound output unit outputs the real sound with a higher output intensity the larger the opening ratio of the opening/closing part is.

7. The sound output device according to any one of claims 2 to 6,
wherein the opening/closing part is at least one of a window, a door, an openable roof, and a removable roof of the vehicle.

8. A sound output system comprising:
the sound output device according to any one of claims 2 to 7; and
a terminal device capable of communicating with the sound output device,
wherein the terminal device includes:
a face direction detection unit that detects a face orientation of a first user located remotely from the vehicle;
an image acquisition unit that acquires a captured image of the interior and exterior of the vehicle in a direction, referenced to the predetermined position, corresponding to the face orientation of the first user;
an image presentation unit that presents the captured image acquired by the image acquisition unit to the first user;
a voice acquisition unit that acquires a voice uttered by the first user;
an information generating unit that generates information about the virtual sound based on the voice acquired by the voice acquisition unit, and generates information about the predetermined direction based on a direction corresponding to the face orientation of the first user; and
an information transmitting unit that transmits the information about the virtual sound and the information about the predetermined direction generated by the information generating unit to the sound output device,
and wherein the sound output unit:
determines that the opening/closing part is located in the predetermined direction from the predetermined position when the face of the first user is oriented toward the position at which the opening/closing part is presented on the captured image presented to the first user by the image presentation unit; and
outputs, to a second user riding in the vehicle, the real sound corresponding to the virtual sound with the sound image localized at the predetermined position.

9. A sound output method performed by a sound output device, the method comprising, by the sound output device:
localizing a sound image of a virtual sound output in a predetermined direction at a predetermined position referenced to a vehicle that has an opening/closing part partitioning the interior and exterior of the vehicle;
acquiring acoustic environment information regarding the acoustic environment in the vehicle, including the open/closed state of the opening/closing part; and
outputting a real sound corresponding to the virtual sound whose sound image is localized at the predetermined position, in an output mode that depends on whether the opening/closing part is located in the predetermined direction from the predetermined position and on the open/closed state of the opening/closing part.