JP2018026701A

JP2018026701A - Sound recording device, image/sound processing program, and game device

Info

Publication number: JP2018026701A
Application number: JP2016157387A
Authority: JP
Inventors: Kazuki Kitamura (北村 一樹)
Original assignee: Capcom Co Ltd
Current assignee: Capcom Co Ltd
Priority date: 2016-08-10
Filing date: 2016-08-10
Publication date: 2018-02-15

Abstract

PROBLEM TO BE SOLVED: To reproduce sound more realistically in a VR space that uses an HMD. SOLUTION: In a game that a user plays while wearing an HMD 300, a camera control part 52 controls a virtual camera in accordance with the motion of the user's head detected by the HMD 300. Sound data is prepared in advance for each direction of a virtual space, and a sound processing part 54 outputs the sound data for the direction corresponding to the orientation of the virtual camera. Because the output sound corresponds to the part of the virtual space the user is viewing, the sound is reproduced with greater realism. To produce a sound effect, the sound processing part 54 sets a plurality of virtual microphones, one for each direction, determines which virtual microphones can collect the sound effect, treats the sound effect as having been produced in the direction of those virtual microphones, and outputs it in accordance with the orientation of the virtual camera. SELECTED DRAWING: Figure 6

Description

The present invention relates to a technique for reproducing sound in a virtual reality space.

In recent years, virtual reality (VR) using head-mounted displays (HMDs) has become widespread. In a VR game that supports HMD-based virtual reality, the user gets the sensation of actually being present in the virtual three-dimensional space.

Japanese Patent No. 5565258
Japanese Patent No. 5869177

In a VR space, the user can look around the virtual three-dimensional space by moving their head.

In conventional non-VR games, the user plays while looking at a monitor, so there was no need to take the direction of the user's gaze into account when reproducing sound. In a VR game, however, if sound is output without considering the direction in which the user is looking, the same sound is heard no matter which way the user faces. Because a VR game displays the game screen across a full 360 degrees, sound must be reproduced with the gaze direction taken into account.

The present invention has been made in view of the above, and its object is to reproduce sound with a greater sense of presence in a VR space using an HMD.

A recording apparatus according to a first aspect of the present invention includes a plurality of microphones arranged at equal intervals on a concentric circle, and recording means for recording the audio signal collected by each of the plurality of microphones as an audio signal for the direction corresponding to that microphone.

A video/audio processing program according to a second aspect of the present invention causes a computer to operate as an apparatus with which a user wearing a head-mounted display views video. The program causes the computer to function as: input means for receiving information on the movement of the user's head from the head-mounted display and detecting the direction in which the user is facing; video processing means for displaying, on the head-mounted display, video corresponding to the direction in which the user is facing; and audio processing means for outputting audio corresponding to the direction in which the user is facing.

A game apparatus according to a third aspect of the present invention includes a program storage unit that stores the above video/audio processing program, and a computer that executes the video/audio processing program stored in the program storage unit.

According to the present invention, the direction in which the user is facing is detected based on information from the head-mounted display, video corresponding to that direction is displayed, and audio corresponding to that direction is output. Sound can thereby be reproduced with a greater sense of presence in a VR space using an HMD.

FIG. 1 is a functional block diagram showing the configuration of the recording system according to the present embodiment.
FIG. 2 is a diagram showing an example of recorded audio signals.
FIG. 3 is a schematic diagram showing the configuration of the game system according to the present embodiment.
FIG. 4 is a block diagram showing the hardware configuration of the game machine.
FIG. 5 is a block diagram showing the hardware configuration of the head-mounted display.
FIG. 6 is a functional block diagram showing the functional configuration of the game device of the present embodiment.
FIG. 7A is a diagram explaining the sound data read out when the virtual camera faces a specific direction.
FIG. 7B is a diagram explaining the sound data read out when the virtual camera faces a region boundary.
FIG. 7C is a diagram explaining how the output volume is adjusted according to the orientation of the virtual camera.
FIG. 8 is a diagram explaining an example of outputting sound data in stereo.
FIG. 9 is a diagram explaining the placement of virtual microphones in the virtual space and the direction from which a sound effect is heard.
FIG. 10 is a diagram explaining the calculation of the volume of a sound effect produced in the virtual space.
FIG. 11 is a diagram showing virtual microphones placed on a soccer field in the virtual space.
FIG. 12A is a diagram showing an example in which part of the soccer field is displayed.
FIG. 12B is a diagram showing an example in which another part of the soccer field is displayed.
An overall configuration diagram including the virtual reality system according to the present embodiment.
A diagram showing an example of a microphone array.
A functional block diagram showing the configuration of the video/audio processing apparatus according to the present embodiment.
A flowchart showing the flow of processing of the video/audio processing apparatus according to the present embodiment.

<First embodiment>
[Recording system]
FIG. 1 is a functional block diagram showing the configuration of the recording system according to the present embodiment.

The recording system shown in FIG. 1 includes a microphone array 10 and a recording device 20.

The microphone array 10 includes six microphones C, L, R, Ls, Rs, and B arranged at equal intervals on a concentric circle. The lines connecting the center of the circle to adjacent microphones form an angle of 60 degrees. The microphones C, L, R, Ls, Rs, and B are preferably unidirectional microphones. Although FIG. 1 shows six microphones, the number is not limited to six. Because this recording system only needs to handle six channels of audio signals, a recording system designed for 5.1ch can be repurposed.

When the sound to be recorded is played or output around the microphone array 10, the recording device 20 records the sound collected by each of the microphones C, L, R, Ls, Rs, and B.

The recording device 20 includes an amplification unit 21, an analog-to-digital (A/D) conversion unit 22, a recording unit 23, and a storage unit 24. The audio signals from the microphones C, L, R, Ls, Rs, and B are amplified by the amplification unit 21 and converted into digital data by the A/D conversion unit 22. The recording unit 23 stores each piece of digital data in the storage unit 24.

FIG. 2 shows an example of the audio signals collected by the microphones C, L, R, Ls, Rs, and B and recorded by the recording device 20. The sound collected by each microphone is individually converted into digital data and stored in the storage unit 24. In the present embodiment, six channels of sound data are stored in the storage unit 24.

As described above, according to the present embodiment, the microphones C, L, R, Ls, Rs, and B are arranged at equal intervals on a concentric circle and the sound collected by each microphone is recorded separately, so that the sound from each direction around the microphone array 10 can be recorded direction by direction.

[Game system]
Next, taking as an example a VR game that a user plays while wearing an HMD, an apparatus that reproduces the sound recorded by the above recording system will be described.

One example of a game that a user plays while wearing an HMD is a first-person game. In a first-person game, the virtual three-dimensional space is displayed from the viewpoint of the player character operated by the user. When the user moves their head, the HMD detects the movement. The game program receives information on the movement of the user's head from the HMD, controls a virtual camera that captures the virtual three-dimensional space in accordance with that movement, and displays an image corresponding to the orientation of the user's head. In the present embodiment, the game program determines the direction in the virtual three-dimensional space in which the user is looking and outputs the sound data for that direction.

FIG. 3 is a schematic diagram showing the configuration of the game system according to the present embodiment.

The game system shown in FIG. 3 includes a game machine 100, a controller 200, and an HMD 300. The game machine 100 is a computer that includes an arithmetic processing device and a storage device and can execute a game program. The game machine 100 may be a dedicated home game console, a personal computer, a mobile terminal such as a smartphone, or an arcade game machine. The controller 200 transmits operations input by the user to the game machine 100. The controller 200 is connected to the game machine 100 by wire or wirelessly. The HMD 300 is worn on the user's head, detects the rotation angle and movement of the user's head, and transmits them to the game machine 100. The HMD 300 displays images received from the game machine 100 and outputs sound received from the game machine 100.

FIG. 4 is a block diagram showing the hardware configuration of the game machine 100. The game machine 100 includes a central processing unit (CPU) 101, a read-only memory (ROM) 102, a random access memory (RAM) 103, a graphics processing unit (GPU) 104, a sound processing unit (SPU) 105, an interface 106, and a DRIVE 107. The CPU 101 executes the game program. The ROM 102 stores a system program. The RAM 103 stores the game program and various data. The GPU 104 generates images of the virtual space. The SPU 105 processes audio. The interface 106 connects the controller 200 and the HMD 300 and receives operation information and HMD information. The interface 106 also outputs the images and audio processed by the GPU 104 and the SPU 105. The DRIVE 107 reads the game program from the recording medium on which it is recorded and stores it in the RAM 103. The recording medium may be an optical disc such as a CD, DVD, or Blu-ray (registered trademark) Disc, a magnetic disk, or a semiconductor memory. The game machine 100 may also have a communication function and acquire the game program via a network.

FIG. 5 is a block diagram showing the hardware configuration of the HMD 300. The HMD 300 includes display units 301A and 301B, a sound output unit 302, a gyro sensor 303, an acceleration sensor 304, and a control unit 305. The display units 301A and 301B display a right-eye image and a left-eye image, respectively. When the right-eye image and the left-eye image have parallax, the user sees a three-dimensional stereoscopic image. The sound output unit 302 outputs the sound received from the game machine 100. The sound output unit 302 may be headphones built into the HMD 300 or an audio output terminal for connecting headphones or the like. The gyro sensor 303 detects the rotation angle of the user's head. With the head of the user wearing the HMD 300 as the origin, the roll axis points in the user's forward direction, the pitch axis points to the user's left, and the yaw axis points upward above the user's head. The gyro sensor 303 detects the rotation angle of the user's head about each axis (roll angle, pitch angle, and yaw angle). The acceleration sensor 304 detects the movement of the user's head. The control unit 305 receives images from the game machine 100 and displays them on the display units 301A and 301B, and receives sound from the game machine 100 and outputs it through the sound output unit 302. In addition, the control unit 305 transmits the data detected by the gyro sensor 303 and the acceleration sensor 304 to the game machine 100.

Note that the HMD 300 need not include the sound output unit 302; headphones or the like may instead be connected to the game machine 100.

[Game device]
FIG. 6 is a functional block diagram showing the functional configuration of the game device 50 of the present embodiment. The game device 50 shown in FIG. 6 includes a character control unit 51, a camera control unit 52, a rendering unit 53, a sound processing unit 54, and a storage unit 55. When the game machine 100 executes the game program of the present embodiment, the game machine 100 operates as the game device 50 having the functional units shown in FIG. 6. The game machine 100 also provides functions that are not shown; only the functional units relevant to the present embodiment are illustrated here.

The character control unit 51 controls the position and orientation of the player character in accordance with the user's operations.

The camera control unit 52 sets the orientation of the virtual camera based on the orientation of the player character and the orientation of the user's head contained in the HMD information received from the HMD 300. For example, when the player character faces north in the virtual three-dimensional space and the user's head turns to the right, the camera control unit 52 turns the virtual camera from north toward east.
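The combination of character orientation and head orientation can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `camera_yaw`, the use of degrees measured clockwise from north, and the convention that a positive head yaw means a turn to the right are all assumptions made for the example.

```python
def camera_yaw(character_yaw_deg, head_yaw_deg):
    """Return the virtual camera yaw in degrees clockwise from north.

    Assumed conventions (not from the source): 0 = north, 90 = east,
    and a positive head yaw means the user turned their head to the right.
    """
    return (character_yaw_deg + head_yaw_deg) % 360.0


# Character faces north, user turns their head 90 degrees to the right:
# the camera ends up facing east.
east = camera_yaw(0.0, 90.0)
```

Wrapping with the modulo keeps the result in the range from 0 up to 360, which is convenient when the yaw is later mapped onto direction regions.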

The rendering unit 53 generates a two-dimensional image of the virtual three-dimensional space as captured by the virtual camera. When the rendering unit 53 generates a right-eye image and a left-eye image having parallax and the HMD 300 displays them on the display units 301A and 301B respectively, the user can view a three-dimensional stereoscopic image on the HMD 300.

The sound processing unit 54 reads the sound data for the direction corresponding to the orientation of the virtual camera from the storage unit 55 and outputs it. The sound processing unit 54 also determines the direction from which a sound effect generated in the virtual three-dimensional space is heard, and determines the output volume of the sound effect based on the orientation of the virtual camera and the direction from which the sound effect is heard.

The storage unit 55 stores sound data and sound effect data corresponding to each direction in the virtual three-dimensional space. In the present embodiment, the storage unit 55 stores sound data for a center (C) channel, a left (L) channel, a right (R) channel, a left surround (Ls) channel, a right surround (Rs) channel, and a back (B) channel, corresponding to the directions of the microphones C, L, R, Ls, Rs, and B in FIG. 1.

[Sound processing]
Next, the process by which the sound processing unit outputs sound data for the direction corresponding to the orientation of the virtual camera will be described.

The sound processing unit 54 divides the full circumference around the yaw axis of the virtual three-dimensional space (the direction above the player character's head) into six regions. It associates the C channel with the region in the north direction of the virtual three-dimensional space and, proceeding clockwise, associates the R channel, Rs channel, B channel, Ls channel, and L channel with the remaining regions. The sound processing unit 54 outputs the sound data of the region corresponding to the orientation of the virtual camera.

For example, as shown in FIG. 7A, when the virtual camera faces north, the sound processing unit 54 reads out and outputs the sound data of the C channel.

The sound processing unit 54 may mix and output the sound data of multiple channels according to the orientation of the virtual camera. For example, when the virtual camera faces a region boundary, the sound data of the channels of the two regions on either side of the boundary are each output at 50%. In the example of FIG. 7B, the virtual camera faces the boundary between the C channel region and the R channel region. In this case, the sound processing unit 54 outputs the C channel sound data and the R channel sound data at 50% each. In the example of FIG. 7C, the virtual camera faces a direction between the B channel direction and the Ls channel direction, closer to the Ls channel, so the sound processing unit 54 outputs the Ls channel sound data at 75% and the B channel sound data at 25%. More specifically, the orientation of the virtual camera between the B channel and the Ls channel is expressed as a value from 0 to 100, where 0 means the virtual camera faces the B channel and 100 means it faces the Ls channel. The sound processing unit 54 determines the volume of each channel based on this value. In the example of FIG. 7C, the value is 75. Therefore, the sound processing unit 54 outputs the Ls channel sound data at 75% and the B channel sound data at 25%.
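The mixing rule above amounts to a linear crossfade between the two channel directions nearest the camera. The sketch below is one way to express it, under assumptions that are not stated in the source: the camera yaw is given in degrees clockwise from north, the six channel directions sit exactly 60 degrees apart, and the function name `channel_weights` is hypothetical.

```python
# Channel directions clockwise from north, as described in the text.
CHANNELS = ["C", "R", "Rs", "B", "Ls", "L"]
SPACING_DEG = 360.0 / len(CHANNELS)  # 60 degrees between adjacent channels


def channel_weights(yaw_deg):
    """Return {channel: weight} for a camera yaw in degrees clockwise from north.

    The weight is split linearly between the two channel directions on
    either side of the camera direction (an assumed interpolation rule).
    """
    pos = (yaw_deg % 360.0) / SPACING_DEG
    lower = int(pos) % len(CHANNELS)
    upper = (lower + 1) % len(CHANNELS)
    frac = pos - int(pos)
    weights = {name: 0.0 for name in CHANNELS}
    weights[CHANNELS[lower]] += 1.0 - frac
    weights[CHANNELS[upper]] += frac
    return weights


north = channel_weights(0.0)     # camera faces north (FIG. 7A case)
boundary = channel_weights(30.0)  # C/R boundary (FIG. 7B case)
toward_ls = channel_weights(225.0)  # 75% of the way from B to Ls (FIG. 7C case)
```

With these assumptions, a camera facing north yields only the C channel, the C/R boundary yields 50% each, and a yaw three quarters of the way from B (180 degrees) to Ls (240 degrees) yields 25% B and 75% Ls, matching the three cases described in the text.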

Next, processing for stereo playback will be described.

The sound processing unit 54 may determine the sound data to be output from each of the stereo left and right channels with the orientation of the virtual camera as the reference.

In the example of FIG. 8, taking the orientation of the virtual camera as the reference, the sound processing unit 54 determines the sound data to output from the stereo left channel based on the direction rotated 30 degrees counterclockwise, and determines the sound data to output from the stereo right channel based on the direction rotated 30 degrees clockwise. The reference direction for the stereo left channel points at the boundary between the C channel region and the L channel region, so the stereo left channel outputs the C channel and L channel sound data at 50% each. The reference direction for the stereo right channel points at the boundary between the C channel region and the R channel region, so the stereo right channel outputs the C channel and R channel sound data at 50% each.
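The stereo case can be sketched by evaluating the same direction-to-channel mapping at two offset directions. As before, this is an illustration under assumptions: yaw in degrees clockwise from north, a linear crossfade between adjacent channel directions, and the hypothetical helper names `channel_weights` and `stereo_weights`.

```python
CHANNELS = ["C", "R", "Rs", "B", "Ls", "L"]  # clockwise from north
SPACING_DEG = 360.0 / len(CHANNELS)


def channel_weights(yaw_deg):
    """Linear crossfade between the two channel directions nearest yaw_deg."""
    pos = (yaw_deg % 360.0) / SPACING_DEG
    lower = int(pos) % len(CHANNELS)
    upper = (lower + 1) % len(CHANNELS)
    frac = pos - int(pos)
    weights = {name: 0.0 for name in CHANNELS}
    weights[CHANNELS[lower]] += 1.0 - frac
    weights[CHANNELS[upper]] += frac
    return weights


def stereo_weights(camera_yaw_deg, offset_deg=30.0):
    """Return (left, right) channel weights for stereo output.

    The left ear uses the direction rotated counterclockwise by offset_deg,
    the right ear the direction rotated clockwise by offset_deg.
    """
    left = channel_weights(camera_yaw_deg - offset_deg)
    right = channel_weights(camera_yaw_deg + offset_deg)
    return left, right


# Camera facing north, as in the FIG. 8 example.
left, right = stereo_weights(0.0)
```

For a camera facing north with the 30 degree offset from the text, the left output mixes C and L at 50% each and the right output mixes C and R at 50% each. The offset is a parameter because the text notes that other angles may be used.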

Note that the angles and proportions given above are only examples, and other values may be used.

[Sound effect processing]
Next, the process by which the sound processing unit outputs sound effects will be described.

So far, the process of outputting the sound data corresponding to each direction in the virtual three-dimensional space according to the orientation of the virtual camera has been described. When the position or direction at which a sound is produced is known in advance, sound data for that direction can be prepared beforehand. However, for a sound made by an object whose position in the virtual three-dimensional space is not determined in advance, it is not known for which direction the sound data should be prepared.

In the present embodiment, therefore, the sound processing unit 54 sets a plurality of virtual microphones, one for each direction, in the virtual three-dimensional space. When an object in the virtual three-dimensional space makes a sound, the sound processing unit 54 determines which virtual microphones can collect that sound and at what loudness. The sound processing unit 54 then treats the sound as having been produced in the direction of each virtual microphone that collected it, and outputs the sound in accordance with the orientation of the virtual camera. The distance between each virtual microphone and the virtual camera is, for example, 50 cm, but can be set freely according to the size of the virtual three-dimensional space.

First, the process of determining the direction from which a sound effect is heard will be described.

FIG. 9 is a diagram explaining the direction from which a sound effect is heard. In the example of FIG. 9, a sound effect is produced at point P. The circle 60 in the figure indicates the range within which the sound effect produced at point P can be heard. The sound processing unit 54 sets six virtual microphones C, L, R, Ls, Rs, and B in the virtual three-dimensional space. In the example of FIG. 9, the sound effect is collected by the two virtual microphones R and Rs. The sound processing unit 54 therefore treats the sound effect at point P as a sound effect produced in the directions of the R channel and the Rs channel. The loudness of the sound effect is attenuated according to the distance from point P to the virtual microphones R and Rs. For example, suppose the sound effect collected by the virtual microphone R is attenuated by 80% and heard at 20% volume.
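The collection step can be sketched as a distance test against the audible range followed by an attenuation. The source does not give the attenuation formula, so the linear falloff used here is an assumption, as are the two-dimensional coordinates (x toward east, y toward north), the function names, and the example positions.

```python
import math

# Bearings of the six virtual microphones in degrees clockwise from north.
MIC_BEARINGS = {"C": 0, "R": 60, "Rs": 120, "B": 180, "Ls": 240, "L": 300}


def mic_positions(camera_xy, distance=0.5):
    """Place one virtual microphone per direction around the camera.

    distance=0.5 mirrors the 50 cm example in the text, assuming the
    coordinate unit is metres.
    """
    cx, cy = camera_xy
    return {
        name: (cx + distance * math.sin(math.radians(bearing)),
               cy + distance * math.cos(math.radians(bearing)))
        for name, bearing in MIC_BEARINGS.items()
    }


def collected_gains(source_xy, audible_radius, mics):
    """Return {mic: gain} for microphones inside the audible range.

    Gain falls off linearly from 1 at the source to 0 at the edge of the
    audible range (an assumed curve; the source only says "attenuated
    according to the distance").
    """
    gains = {}
    for name, (mx, my) in mics.items():
        d = math.hypot(mx - source_xy[0], my - source_xy[1])
        if d < audible_radius:
            gains[name] = 1.0 - d / audible_radius
    return gains


# A source to the east of the camera is picked up only by R and Rs.
mics = mic_positions((0.0, 0.0))
gains = collected_gains((1.0, 0.0), audible_radius=0.7, mics=mics)
```

With the hypothetical positions above, only the R and Rs microphones lie inside the audible range, which reproduces the situation drawn in FIG. 9.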

Next, the process of outputting the sound effect collected by the virtual microphones will be described.

For a sound effect collected by a virtual microphone, whether it is output, and at what volume, is determined according to the orientation of the virtual camera.

FIG. 10 is a diagram explaining the calculation of the sound effect volume. In the example of FIG. 10, the virtual camera faces the boundary between the C channel region and the R channel region. In this case, the sound processing unit 54 outputs the C channel and the R channel at 50% volume each. Accordingly, the sound processing unit 54 outputs the sound effect collected by the virtual microphone R at 50% volume. If the volume of the sound effect at point P is Vp and the virtual microphone R hears it at 20% volume, the sound processing unit 54 outputs the sound effect of point P at a volume of 50% × 20% × Vp.
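The volume calculation is a product of three factors: the weight the camera orientation gives to a channel, the gain with which that channel's virtual microphone hears the effect, and the source volume Vp. A minimal sketch follows; the function name is hypothetical, and summing the contributions when several microphones collect the effect is an assumption, since the text only works through a single microphone.

```python
def effect_output_volume(direction_weights, mic_gains, source_volume):
    """Combine camera-direction weights with per-microphone gains.

    direction_weights: {channel: weight} from the camera orientation.
    mic_gains: {channel: gain} for microphones that collected the effect.
    source_volume: the volume Vp of the effect at its source.
    Contributions from several microphones are summed (assumed rule).
    """
    mixed = sum(direction_weights.get(name, 0.0) * gain
                for name, gain in mic_gains.items())
    return mixed * source_volume


# FIG. 10 example: camera on the C/R boundary, microphone R hears 20%.
volume = effect_output_volume({"C": 0.5, "R": 0.5}, {"R": 0.2}, 1.0)
```

For the example in the text this evaluates to 0.5 × 0.2 × Vp, that is, 10% of the source volume.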

<Second embodiment>
A second embodiment will be described. The second embodiment is an example in which the user watches a soccer game from the spectator seats.

In the soccer game, the game program does not display the entire soccer field; instead it displays part of the field according to the movement of the head of the user wearing the HMD 300. For example, the virtual camera is positioned in the spectator seats on the extension of the field's center line, and the displayed area is changed by controlling the orientation of the virtual camera to follow the movement of the user's head. When the user faces left, the area around the left goal is displayed; when the user faces forward, the area around the center line is displayed.

In the soccer game as well, the channels to output are determined according to the orientation of the virtual camera, that is, according to the displayed video.

図１１は、ゲーム内のサッカーフィールドに仮想マイクを配置した様子を示す図である。同図の例では、サッカーフィールドの周囲に仮想マイクを配置し、仮想マイクを９つのグループＬ１〜Ｌ３，Ｃ１〜Ｃ３，Ｒ１〜Ｒ３に分けた。サッカーフィールド内で発生した音は仮想マイクによって集音される。例えば、点Ｐの効果音は、円６０で示す範囲まで聞こえて、グループＬ２，Ｌ３，Ｃ１の仮想マイクで集音される。点Ｐの効果音は、点Ｐから仮想マイクまでの距離に応じて減衰する。 FIG. 11 is a diagram illustrating a state in which a virtual microphone is arranged on a soccer field in a game. In the example of the figure, virtual microphones are arranged around the soccer field, and the virtual microphones are divided into nine groups L1 to L3, C1 to C3, and R1 to R3. Sound generated in the soccer field is collected by a virtual microphone. For example, the sound effect at the point P is heard up to the range indicated by the circle 60 and collected by the virtual microphones of the groups L2, L3, and C1. The sound effect at the point P attenuates according to the distance from the point P to the virtual microphone.

For the sound collected by the virtual microphones, whether it is output and at what volume are determined for each of the groups L1 to L3, C1 to C3, and R1 to R3 according to the direction of the virtual camera.

For example, the direction of the virtual camera is represented by a value from 0 to 100: 0 when the virtual camera faces furthest left, 50 when it faces the center, and 100 when it faces furthest right. The volume of each of the groups L1 to L3, C1 to C3, and R1 to R3 is determined based on this value.

Group L1, which includes the microphones at the left end of the soccer field, is at maximum volume when the value is 0, and its volume decreases as the value approaches 100. Groups L2 and L3 are at maximum volume when the value is 10 and 20, respectively, and their volume decreases as the value approaches 0 or 100. The relationship between the value representing the direction of the virtual camera and the volume is likewise defined in advance for the other groups C1 to C3 and R1 to R3.

When the user faces left and the virtual camera faces furthest left, the area around the left goal is displayed, as shown in FIG. 12A. At this time, the value representing the direction of the virtual camera is 0. The sound processing unit 54 obtains the volume of each of the groups L1 to L3, C1 to C3, and R1 to R3 for the value 0. If the obtained volumes are 100%, 90%, and 80% for groups L1 to L3, respectively, and 0% for the other groups C1 to C3 and R1 to R3, the sound processing unit 54 outputs the sound collected by the virtual microphones of groups L1, L2, and L3 at volumes of 100%, 90%, and 80%, respectively. As another example, when the center of the soccer field is displayed, as shown in FIG. 12B, the sound processing unit 54 outputs the sound collected by the virtual microphones of groups C1, C2, and C3 at volumes of 90%, 100%, and 90%, respectively. Note that when the spectators cheer, the cheering may be played at high volume regardless of the direction of the virtual camera.
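
One possible mapping from the camera direction value to the group volumes is sketched below in Python. Only the peak values for L1, L2, and L3 (0, 10, and 20) come from the description; the peaks for the C and R groups, the loss of one percentage point per unit of offset, and the cutoff of 20 are assumptions, chosen so that the sketch reproduces the two examples given for FIG. 12A and FIG. 12B.

```python
# Direction value (0 = furthest left, 50 = center, 100 = furthest right) at
# which each group is loudest. L1 to L3 follow the description; the peaks for
# the C and R groups are assumptions.
PEAK = {
    "L1": 0, "L2": 10, "L3": 20,
    "C1": 40, "C2": 50, "C3": 60,
    "R1": 80, "R2": 90, "R3": 100,
}
CUTOFF = 20  # assumed: a group is silent beyond this offset from its peak


def group_volumes(direction):
    """Return {group: volume in percent} for a camera direction in 0..100."""
    volumes = {}
    for group, peak in PEAK.items():
        offset = abs(direction - peak)
        # Assumed curve: lose one percentage point per unit of offset.
        volumes[group] = 100 - offset if offset <= CUTOFF else 0
    return volumes


print(group_volumes(0))   # L1=100, L2=90, L3=80, all other groups 0
print(group_volumes(50))  # C1=90, C2=100, C3=90, all other groups 0
```

A lookup table per group would serve equally well; the description only requires that the relationship between the direction value and the volume be defined in advance.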

As described above, according to the present embodiment, in a game played by a user wearing the HMD 300, the camera control unit 52 controls the virtual camera according to the movement of the user's head detected by the HMD 300, sound data for each direction in the virtual space is prepared in advance, and the sound processing unit 54 outputs the sound data for the direction corresponding to the direction of the virtual camera. Because sound matching the video the user is viewing is output, the sound can be reproduced with a greater sense of presence.

According to the present embodiment, when playing a sound effect, the sound processing unit 54 sets a plurality of virtual microphones corresponding to the respective directions, determines which virtual microphones can collect the sound effect, treats the sound effect as sounding in the direction of those virtual microphones, and outputs the sound effect according to the direction of the virtual camera, so that sound with a greater sense of presence can be reproduced. The first embodiment and the second embodiment may be combined. For example, sound generated on the soccer field may be output according to the second embodiment, and sound generated in the spectator seats according to the first embodiment.

<Third embodiment>
A third embodiment will be described. The third embodiment is an example of a virtual reality system in which a user wears an HMD and views video captured by a camera.

This virtual reality system displays on the HMD video captured by a camera, rather than a virtual three-dimensional space created by a computer, and outputs audio collected in the shooting environment in accordance with the video displayed on the HMD.

FIG. 13 is an overall configuration diagram including the virtual reality system according to the present embodiment.

The virtual reality system according to the present embodiment includes a video/audio processing device 70 and the HMD 300. According to the head movement of the user wearing the HMD 300, the video/audio processing device 70 cuts out a part of the panoramic video captured by the camera 30 and displays it on the HMD 300, and outputs the audio collected by the microphone array 10 in accordance with the video displayed on the HMD 300.

The microphone array 10 is composed of a plurality of microphones arranged in the shooting environment, and collects and transmits the audio of the shooting environment. FIG. 14 is a diagram showing an example of the microphone array 10. The microphone array 10 shown in the figure is composed of a plurality of microphones arranged around a soccer field. The microphones are divided into nine groups: L1 to L3, C1 to C3, and R1 to R3. The microphone array 10 transmits the audio signals input by the microphones of each group, group by group, to the video/audio processing device 70 via a network.

The camera 30 is placed in the spectator seats on an extension of the center line of the soccer field, captures the entire soccer field, and transmits the captured video to the video/audio processing device 70 via the network. A camera capable of panoramic shooting of the entire soccer field may be used as the camera 30, or a plurality of cameras may each capture a region of the soccer field and the captured videos may be combined into a video of the entire soccer field.

The video/audio processing device 70 receives the video of the entire soccer field from the camera 30, cuts out a part of the received video according to the orientation of the head of the user wearing the HMD 300, and displays it on the HMD 300. The video/audio processing device 70 also receives the audio signal of each group from the microphone array 10, selects audio signals according to the orientation of the head of the user wearing the HMD 300, and mixes and outputs the selected audio signals.

FIG. 15 is a functional block diagram showing the configuration of the video/audio processing device 70 according to the present embodiment. The video/audio processing device 70 shown in the figure includes a mixer 71, an audio processing unit 72, a video input unit 73, a video processing unit 74, and a control unit 75. The video/audio processing device 70 may be configured as a computer including an arithmetic processing device, a storage device, and the like, with the processing of each unit executed by a program. This program is stored in the storage device of the video/audio processing device 70, and can also be recorded on a recording medium such as a magnetic disk, an optical disk, or a semiconductor memory, or provided over a network.

The mixer 71 receives audio signals from the microphone array 10 in units of groups and mixes them based on audio selection information. The audio selection information is received from the audio processing unit 72 and includes information specifying groups. For example, when the audio selection information indicates group L1, the mixer 71 turns on the input of the audio signal of group L1 and turns off the inputs of the audio signals of the other groups. The audio selection information may specify a plurality of groups, and may include, for each group to be turned on, the proportion at which its audio signal is input. For example, when the audio selection information specifies group L1 at 100% and group L2 at 90%, the mixer 71 inputs the audio signal of group L1 at 100% and the audio signal of group L2 at 90%, and mixes the audio signals of groups L1 and L2.
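
The mixing behavior described for the mixer 71 can be sketched in Python as a weighted sum of per-group sample buffers. The function name `mix`, the dictionary layout of the audio selection information, and the sample values are hypothetical; the description does not specify a data format.

```python
def mix(group_signals, selection):
    """Mix equal-length sample lists, scaling each selected group by its level.

    group_signals: {group: [samples]}
    selection:     {group: level in percent}; groups that are absent from
                   `selection` are treated as switched off.
    """
    length = len(next(iter(group_signals.values())))
    mixed = [0.0] * length
    for group, percent in selection.items():
        gain = percent / 100.0
        for i, sample in enumerate(group_signals[group]):
            mixed[i] += gain * sample
    return mixed


# Hypothetical two-sample buffers for three of the nine groups.
signals = {"L1": [0.2, -0.4], "L2": [0.1, 0.3], "C1": [0.5, 0.5]}
out = mix(signals, {"L1": 100, "L2": 90})  # C1 is switched off
print(out)
```

Here group L1 contributes at full level, group L2 at 90%, and group C1 is left out because it does not appear in the selection.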

Based on the direction in which the user is facing, received from the control unit 75, the audio processing unit 72 generates audio selection information for selecting the audio signals to be mixed and transmits it to the mixer 71. As in the process of determining the volume of each group in the soccer game described above, the audio processing unit 72 determines the level at which the audio signal of each group is input and generates the audio selection information. For example, when the user is facing left, the audio processing unit 72 generates audio selection information that selects the groups L1 to L3 on the left side of the microphone array 10. The audio processing unit 72 also receives the audio signal mixed by the mixer 71 and transmits it to the HMD 300.

The video input unit 73 decodes the panoramic video received from the camera 30.

Based on the direction in which the user is facing, received from the control unit 75, the video processing unit 74 cuts out the video to be displayed from the decoded panoramic video and transmits it to the HMD 300. For example, when the user is facing left, the video processing unit 74 cuts out video from the left side of the panoramic video and transmits it to the HMD 300.
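
The cut-out step can be illustrated with a short Python sketch that computes which pixel columns of the panorama to display. It assumes the facing direction is expressed with the same 0 to 100 value used in the soccer-game example and that the window slides linearly across the panorama; the function name `crop_window` and both assumptions are illustrative and are not stated in the description.

```python
def crop_window(direction, pano_width, view_width):
    """Return the (left, right) pixel columns of the region to cut out.

    direction: 0 (furthest left) .. 100 (furthest right); an assumed encoding
    borrowed from the soccer-game example.
    """
    if not 0 <= direction <= 100:
        raise ValueError("direction must be within 0..100")
    if view_width > pano_width:
        raise ValueError("view is wider than the panorama")
    # Slide the window linearly across the panorama (assumed mapping).
    left = (pano_width - view_width) * direction // 100
    return left, left + view_width


print(crop_window(0, 4000, 1000))    # (0, 1000)
print(crop_window(50, 4000, 1000))   # (1500, 2500)
print(crop_window(100, 4000, 1000))  # (3000, 4000)
```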

The control unit 75 determines the direction of the user's head based on the HMD information received from the HMD 300, and transmits the direction in which the user is facing to the audio processing unit 72 and the video processing unit 74.

Next, the operation of the video/audio processing device 70 according to the present embodiment will be described.

FIG. 16 is a flowchart showing the processing flow of the video/audio processing device 70 according to the present embodiment.

The control unit 75 receives HMD information including the orientation of the user's head from the HMD 300, and detects the direction in which the user is facing (step S11).

The video processing unit 74 receives the direction in which the user is facing from the control unit 75 and, based on that direction, determines the region to be cut out from the panoramic video received by the video input unit 73 (step S12). For example, when the user is facing left, the left region of the video of the entire soccer field is cut out and output to the HMD 300. When the user is facing the front, the central region of the video of the entire soccer field is cut out and output to the HMD 300.

The audio processing unit 72 receives the direction in which the user is facing from the control unit 75 and, based on that direction, determines the audio signals to be mixed by the mixer 71 (step S13). For example, when the user is facing left, the audio signals to be mixed are determined so that the audio on the left side of the soccer field is output. When the user is facing the front, the audio signals to be mixed are determined so that the audio corresponding to the center of the video of the entire soccer field is output.

Because the audio processing unit 72 determines the audio signals to be mixed based on the direction in which the user is facing, audio that matches the video displayed on the HMD 300 can be output.

Note that 360-degree video may be captured by a 360-degree omnidirectional camera, and audio may be collected by a microphone array in which microphones are arranged at equal intervals on a concentric circle, like the microphone array of the recording system of FIG. 1. In this case as well, the video/audio processing device 70 identifies the direction in which the user is facing based on the information from the HMD 300, displays the video in that direction on the HMD 300, and outputs the audio signal input by the microphone in that direction.
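
For the circular microphone arrangement, choosing the microphone for a given facing direction can be sketched in Python as follows. The sketch assumes that microphone i sits at i * 360 / N degrees and that the facing direction is available as a yaw angle in degrees. The second function shows one possible way to blend two adjacent microphones when the user faces between them, in the spirit of the synthesis recited in the claims; the linear weights are an assumption, and both function names are hypothetical.

```python
def nearest_microphone(yaw_degrees, mic_count):
    """Index of the microphone closest to the user's facing direction.

    Microphone i is assumed to sit at i * 360 / mic_count degrees.
    """
    spacing = 360.0 / mic_count
    return round((yaw_degrees % 360.0) / spacing) % mic_count


def crossfade_weights(yaw_degrees, mic_count):
    """Weights for the two adjacent microphones around the facing direction.

    Linear weighting is an assumption; the description does not specify how
    the two directions are combined. Intended for mic_count >= 2.
    """
    spacing = 360.0 / mic_count
    position = (yaw_degrees % 360.0) / spacing
    lower = int(position) % mic_count
    upper = (lower + 1) % mic_count
    frac = position - int(position)
    return {lower: 1.0 - frac, upper: frac}


print(nearest_microphone(44.0, 8))   # 1
print(crossfade_weights(22.5, 8))    # {0: 0.5, 1: 0.5}
```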

As described above, according to the present embodiment, in a virtual reality system in which a user wears a head-mounted display and views video captured by a camera, the control unit 75 detects the direction in which the user is facing based on information from the HMD 300, the video processing unit 74 displays the video in that direction, and the audio processing unit 72 selects, based on that direction, the audio signals to be mixed from among the plurality of audio signals received from the microphone array 10. As a result, audio that matches the video displayed on the HMD 300 can be output.

DESCRIPTION OF SYMBOLS
10 ... Microphone array
20 ... Recording device
21 ... Amplification unit
22 ... Conversion unit
23 ... Recording unit
24 ... Accumulation unit
30 ... Camera
50 ... Game device
51 ... Character control unit
52 ... Camera control unit
53 ... Rendering unit
54 ... Sound processing unit
55 ... Accumulation unit
70 ... Video/audio processing device
71 ... Mixer
72 ... Audio processing unit
73 ... Video input unit
74 ... Video processing unit
75 ... Control unit
100 ... Game machine
101 ... CPU
102 ... ROM
103 ... RAM
104 ... GPU
105 ... SPU
106 ... Interface
107 ... DRIVE
200 ... Controller
300 ... HMD
301A, 301B ... Display units
302 ... Sound output unit
303 ... Gyro sensor
304 ... Acceleration sensor
305 ... Control unit

Claims

1. A recording apparatus comprising:
a plurality of microphones arranged at equal intervals on a concentric circle; and
recording means for recording the audio signals collected by each of the plurality of microphones as audio signals in the directions corresponding to the respective microphones.

2. A video/audio processing program for operating a computer as a device with which a user wearing a head-mounted display views video, the program causing the computer to function as:
input means for receiving movement information of the user's head from the head-mounted display and detecting the direction in which the user is facing;
video processing means for displaying, on the head-mounted display, video corresponding to the direction in which the user is facing; and
audio processing means for outputting audio in accordance with the direction in which the user is facing.

3. The video/audio processing program according to claim 2, wherein, when the direction in which the user is facing lies between a plurality of directions to which sound data correspond, the audio processing means synthesizes the sound data corresponding to the plurality of directions between which the user is facing.

4. The video/audio processing program according to claim 2 or 3, further causing the computer to function as storage means for storing sound data corresponding to a plurality of directions,
wherein the audio processing means reads from the storage means, and outputs, the sound data corresponding to the direction in which the user is facing.

5. The video/audio processing program according to claim 4, wherein the storage means holds sound data collected by each of a plurality of microphones corresponding to the plurality of directions.

6. The video/audio processing program according to claim 2, wherein the video processing means displays video captured of a virtual space, and
the audio processing means, when sounding a sound effect in the virtual space, sets virtual microphones corresponding to a plurality of directions, determines the direction of the sound effect by determining which virtual microphones collect the sound effect, and determines the volume at which the sound effect is output based on the direction in which the user is facing and the direction of the sound effect.

7. The video/audio processing program according to claim 2 or 3, further causing the computer to function as audio input means for receiving audio signals in a plurality of directions from each of a plurality of microphones,
wherein the audio processing means selects, from the audio signals in the plurality of directions received by the audio input means, the audio signal to be output based on the direction in which the user is facing, and outputs the selected audio signal.

8. A game device comprising:
a program storage unit storing the video/audio processing program according to claim 2; and
a computer that executes the video/audio processing program stored in the program storage unit.