JP6515057B2

JP6515057B2 - Simulation system, simulation apparatus and program

Info

Publication number: JP6515057B2
Application number: JP2016072840A
Authority: JP
Inventors: 義人矢野; 修一小笠原; 益実山本
Original assignee: Namco Ltd; Bandai Namco Entertainment Inc
Current assignee: Namco Ltd; Bandai Namco Entertainment Inc
Priority date: 2016-03-31
Filing date: 2016-03-31
Publication date: 2019-05-15
Anticipated expiration: 2036-03-31
Also published as: JP2017176728A

Description

The present invention relates to a simulation system, a simulation apparatus, a program, and the like.

BACKGROUND ART
Game systems are conventionally known in which spectators appear in a soccer or baseball game and cheer for a team; Patent Document 1, for example, discloses such a technique. Game systems are also known that include a voice input unit and recognize the voice entered through it in order to instruct a character's actions and advance the game; Patent Document 2, for example, discloses such a technique.

Patent Document 1: JP 2002-42164 A
Patent Document 2: JP 2001-224851 A

However, the prior art of Patent Document 1 merely displays images of spectators in the stands within the game image; it does not propose a system in which the spectators respond to a voice input by the user. Likewise, the prior art of Patent Document 2 only lets the user instruct a character's actions through voice input; the character never responds to the user's voice with a response voice, a response sound, or the like.

According to some aspects of the present invention, it is possible to provide a simulation system, a program, and the like capable of outputting a response voice or response sound appropriate to the user's voice input.

One aspect of the present invention relates to a simulation system including: an input processing unit that acquires information on a user voice input by a user through a sound input device; a measurement processing unit that performs measurement processing on the user voice; and a sound processing unit that, in response to the input of the user voice, performs output processing of a response voice or response sound different from the user voice. The measurement processing unit measures the volume and length of the user voice, and the sound processing unit outputs the response voice or response sound based on the result of that measurement. The present invention also relates to a program that causes a computer to function as each of the above units, and to a computer-readable information storage medium storing the program.

According to this aspect, when the user inputs a user voice through the sound input device, a response voice or response sound different from the user voice is output in response. In this case, the volume and length of the input user voice are measured, and the response voice or response sound is output based on the measurement result. The measured volume and length of the user voice can thus be reflected in the response voice or response sound output for that input, making it possible to provide a simulation system or the like that outputs a response voice or response sound appropriate to the user voice input.

In one aspect of the present invention, the sound processing unit may vary the response voice or response sound according to the volume and length of the user voice.

In this way, the response voice or response sound changes with the volume and length of the user voice, so a response voice or response sound better matched to the user voice can be output.
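As a minimal sketch of this idea, the response can be looked up from a small table indexed by a volume class and a length class. The thresholds, the decibel scale, the clip names, and the function names (`classify`, `select_response`) are hypothetical assumptions for illustration, not values from this publication.

```python
# Hypothetical sketch: pick a response clip from the measured volume and length.
# All thresholds and clip names are illustrative assumptions.

VOLUME_THRESHOLDS_DB = [-30.0, -15.0]  # assumed boundaries: quiet / normal / loud
LENGTH_THRESHOLDS_S = [0.5, 1.5]       # assumed boundaries: short / medium / long

# RESPONSES[volume_class][length_class] -> hypothetical response clip identifier
RESPONSES = [
    ["murmur_short", "murmur_medium", "murmur_long"],
    ["cheer_short", "cheer_medium", "cheer_long"],
    ["roar_short", "roar_medium", "roar_long"],
]


def classify(value: float, thresholds: list) -> int:
    """Return how many ascending thresholds `value` reaches (0..len(thresholds))."""
    return sum(1 for t in thresholds if value >= t)


def select_response(volume_db: float, length_s: float) -> str:
    """Map a measured (volume, length) pair to a response clip identifier."""
    v = classify(volume_db, VOLUME_THRESHOLDS_DB)
    n = classify(length_s, LENGTH_THRESHOLDS_S)
    return RESPONSES[v][n]


print(select_response(-10.0, 2.0))  # loud and long -> roar_long
```

A table keeps the per-input cost to a few comparisons, and the set of responses can be tuned without touching the measurement code.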

In one aspect of the present invention, the measurement processing unit may measure, as the length of the user voice, the time from the moment the volume of the user voice rises above the i-th volume level (1 ≤ i ≤ N) of first to N-th volume levels until the moment it falls below the i-th volume level, and the sound processing unit may vary the response voice or response sound according to the measured length.

In this way, the volume and length of the user voice can be measured with low-load processing, reducing the processing load.
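The threshold-crossing measurement described above could be sketched as follows, assuming the volume is already available as a per-frame envelope (one value per frame, for example an RMS value per audio buffer). The function name and the frame-based representation are assumptions; the sketch shows why the measurement is cheap: each frame costs only one comparison.

```python
def durations_above(envelope, level, frame_s):
    """Return the duration in seconds of each span in which the volume exceeds `level`.

    A span starts at the first frame whose volume rises above `level` and ends at
    the first later frame whose volume is no longer above it. A span that is
    still open when the envelope ends is closed at the last frame.
    """
    durations = []
    start = None  # index of the frame where the current span began
    for k, volume in enumerate(envelope):
        if start is None and volume > level:
            start = k
        elif start is not None and volume <= level:
            durations.append((k - start) * frame_s)
            start = None
    if start is not None:
        durations.append((len(envelope) - start) * frame_s)
    return durations


# Hypothetical envelope sampled every 0.5 s, with the i-th volume level at 0.5
print(durations_above([0.1, 0.6, 0.7, 0.65, 0.2, 0.8, 0.1], 0.5, 0.5))  # -> [1.5, 0.5]
```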

In one aspect of the present invention, the sound processing unit may make the response voice or response sound differ between the case where the volume of the user voice exceeds the i-th volume level and the case where it exceeds the j-th volume level (1 ≤ i < j ≤ N) of the first to N-th volume levels.

In this way, the response voice or response sound can differ depending on whether the volume of the user voice exceeds the i-th or the j-th volume level, so an appropriate response voice or response sound can be output with low-load processing.
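A minimal sketch of the level comparison, assuming the N volume levels are stored as an ascending list of normalized thresholds: finding the highest level that the peak volume exceeds is a binary search, so distinguishing level i from level j stays cheap. The level values, response names, and function names are illustrative assumptions.

```python
import bisect

# Hypothetical first..N-th volume levels (normalized 0..1), in ascending order.
LEVELS = [0.2, 0.4, 0.6, 0.8]

# Hypothetical response per highest level exceeded; index 0 means no level was exceeded.
RESPONSE_BY_LEVEL = [None, "scattered_cheer", "cheer", "big_cheer", "roar"]


def highest_level_exceeded(peak):
    """Return the largest i (1-based) with peak > LEVELS[i-1], or 0 if none.

    bisect_left counts the levels strictly below `peak`; because LEVELS is
    ascending, that count is exactly the index of the highest level exceeded.
    """
    return bisect.bisect_left(LEVELS, peak)


def response_for_peak(peak):
    return RESPONSE_BY_LEVEL[highest_level_exceeded(peak)]


print(response_for_peak(0.5))  # exceeds levels 1 and 2 -> cheer
```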

In one aspect of the present invention, the sound processing unit may output the response voice or response sound as the voice or sound of the user's target in a game.

In this way, when the user inputs a user voice through the sound input device, the response voice or response sound to that input can be output as the voice or sound of the user's target in the game.

In one aspect of the present invention, the target may be a spectator character appearing in the game.

In this way, the cheers and other reactions of the spectators appearing in the game can be output as the response voice or response sound.

In one aspect of the present invention, the sound processing unit may change the response voice or response sound according to at least one of the type of the target, the positional relationship between the target and the user, and the user's line-of-sight direction with respect to the target.

In this way, the response voice or response sound changes according to the type of the target, the positional relationship with the target, or the user's line-of-sight direction toward the target, so a more appropriate response voice or response sound can be output for the user voice input.
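The publication does not give a formula for this, so the following is only a sketch under stated assumptions: the response volume is scaled by a hypothetical per-type gain, an inverse-distance factor, and a gaze factor that is full when the spectator lies inside an assumed 30-degree cone around the user's line of sight. `TYPE_GAIN`, `response_gain`, and all default parameters are illustrative.

```python
import math

# Hypothetical per-type loudness of spectator reactions.
TYPE_GAIN = {"fan": 1.0, "casual": 0.6}


def response_gain(target_type, user_pos, gaze_dir, target_pos,
                  ref_dist=1.0, cone_deg=30.0, off_gaze_gain=0.5):
    """Return a volume multiplier for a spectator's response (2D positions).

    - type: looked up in TYPE_GAIN
    - position: inverse-distance attenuation, full gain within ref_dist
    - gaze: full gain if the target lies inside a cone around gaze_dir
    """
    dx = target_pos[0] - user_pos[0]
    dy = target_pos[1] - user_pos[1]
    dist = math.hypot(dx, dy)
    dist_gain = ref_dist / max(dist, ref_dist)
    if dist == 0.0:
        looking_at = True
    else:
        cos_angle = (gaze_dir[0] * dx + gaze_dir[1] * dy) / (math.hypot(*gaze_dir) * dist)
        looking_at = cos_angle >= math.cos(math.radians(cone_deg))
    gaze_gain = 1.0 if looking_at else off_gaze_gain
    return TYPE_GAIN[target_type] * dist_gain * gaze_gain
```

Multiplying independent factors keeps each cue (type, distance, gaze) adjustable on its own; a game could instead switch to different clips per cue.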

In one aspect of the present invention, the measurement processing unit may analyze feature quantities of the user voice, and the sound processing unit may change the response voice or response sound according to the result of that analysis.

In this way, a response voice or response sound reflecting the characteristics of the user voice can be output in response to the user voice input.
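This passage does not name the feature quantities, so the sketch below swaps in two deliberately simple ones, RMS loudness and zero-crossing rate, plus a crude frequency estimate derived from the crossing rate (reasonable only for tonal input). The function names and the 300 Hz rule in `response_variant` are hypothetical.

```python
import math


def analyze_features(samples, sample_rate):
    """Return simple feature quantities of a mono voice buffer.

    rms:           loudness of the buffer
    zcr:           zero crossings per adjacent sample pair
    rough_freq_hz: zcr * sample_rate / 2, a crude pitch estimate that is only
                   meaningful for roughly periodic (voiced, tonal) input
    """
    n = len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (n - 1)
    return {"rms": rms, "zcr": zcr, "rough_freq_hz": zcr * sample_rate / 2}


def response_variant(features, high_pitch_hz=300.0):
    """Hypothetical rule: answer high-pitched calls with a higher-pitched response clip."""
    return "high" if features["rough_freq_hz"] >= high_pitch_hz else "low"


sr = 8000
tone = [math.sin(2 * math.pi * 100 * k / sr + 0.1) for k in range(sr // 10)]  # 100 Hz, 0.1 s
print(analyze_features(tone, sr))
```

A production system would more likely use a proper pitch tracker or spectral features; these two were chosen only because they are easy to compute and to verify.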

In one aspect of the present invention, according to at least one of the position or direction of the sound input device held in the user's hand, the position or direction of that hand, the position or direction of a head-mounted display worn by the user, the user's posture, and the user's line of sight, the sound processing unit may perform processing to change the response voice or response sound, processing to output the response voice or response sound, or processing to accept the user voice input that triggers output of the response voice or response sound.

In this way, depending on the position and direction of the sound input device, the position and direction of the hand, the position and direction of the head-mounted display, the user's posture, the user's line of sight, or the like, the response voice or response sound can be varied, its output timing and the like can be controlled, and it can be decided whether to accept a user voice input.
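As one hypothetical illustration of the input-acceptance case, a voice input could be accepted only when the tracked microphone is close to the HMD (that is, held up to the mouth) and the HMD faces the audience. The 0.3 m distance, the 45-degree cone, the coordinate layout, and the function name are all assumptions.

```python
import math


def accept_voice_input(mic_pos, hmd_pos, hmd_forward, audience_dir,
                       max_mouth_dist=0.3, cone_deg=45.0):
    """Hypothetical gate for a voice input, using tracked 3D positions in metres.

    Accept only when the microphone is within max_mouth_dist of the HMD and the
    HMD's forward vector is within cone_deg of the direction to the audience.
    """
    if math.dist(mic_pos, hmd_pos) > max_mouth_dist:
        return False
    dot = sum(a * b for a, b in zip(hmd_forward, audience_dir))
    norm = math.hypot(*hmd_forward) * math.hypot(*audience_dir)
    return dot / norm >= math.cos(math.radians(cone_deg))
```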

In one aspect of the present invention, the sound processing unit may change the response voice or response sound based on the user's past play history information.

In this way, a response voice or response sound reflecting the player's past play history can be output.
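A minimal sketch, assuming the play history is summarized as a play count and a best score: regular and high-scoring players get a more excited crowd response. The `PlayHistory` record, the thresholds, and the three-level scale are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PlayHistory:
    """Hypothetical summary of a user's past sessions."""
    play_count: int
    best_score: int


def excitement_level(history: PlayHistory, max_level: int = 3) -> int:
    """Hypothetical rule: each milestone raises the crowd's excitement by one level."""
    level = 0
    if history.play_count >= 5:
        level += 1
    if history.best_score >= 80:
        level += 1
    if history.play_count >= 20 and history.best_score >= 95:
        level += 1
    return min(level, max_level)
```

The resulting level could then index into response clips of increasing intensity.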

One aspect of the present invention may include a sound data storage unit that stores sound data, and the sound processing unit may perform the output processing of the response voice or response sound by selecting the sound data to use from among a plurality of sound data items stored in the sound data storage unit, or by combining a plurality of sound data items.

In this way, an appropriate response voice or response sound can be output by selecting sound data from among a plurality of sound data items, or by combining sound data items.
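Selection and combination could be sketched as below, with a dictionary standing in for the sound data storage unit and short lists of float samples standing in for clips. The clip names and sample values are hypothetical; "combining" is shown both as mixing (sample-wise sum with clipping to [-1, 1]) and as concatenation.

```python
# Hypothetical stand-in for the sound data storage unit: name -> mono samples in [-1, 1].
SOUND_DATA = {
    "cheer": [0.5, 0.6, 0.4],
    "clap": [0.3, -0.3, 0.3, -0.3],
    "whistle": [0.8, 0.8],
}


def select(name):
    """Select one stored sound data item by name."""
    return SOUND_DATA[name]


def mix(clips, gains=None):
    """Sum clips sample by sample (shorter clips padded with silence), then clip to [-1, 1]."""
    gains = gains or [1.0] * len(clips)
    out = [0.0] * max(len(c) for c in clips)
    for clip, gain in zip(clips, gains):
        for k, s in enumerate(clip):
            out[k] += gain * s
    return [max(-1.0, min(1.0, s)) for s in out]


def concatenate(clips):
    """Play clips one after another."""
    return [s for clip in clips for s in clip]


print(mix([select("cheer"), select("whistle")]))  # -> [1.0, 1.0, 0.4]
```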

In one aspect of the present invention, the input processing unit may accept the user voice input that triggers output of the response voice or response sound during periods other than an evaluation period in which evaluation processing is performed on the user voice.

In this way, the user voice input that triggers the response voice or response sound can be prevented from adversely affecting the evaluation processing of the user voice.
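A minimal sketch of this gating, assuming the song's evaluation periods are known in advance as (start, end) times in seconds with the end exclusive: a call-and-response (C&R) voice input is accepted only when its time falls outside every evaluation period. The period values, event times, and function name are hypothetical.

```python
# Hypothetical evaluation periods of a song, as (start_s, end_s) with end exclusive.
EVALUATION_PERIODS = [(10.0, 40.0), (55.0, 90.0)]


def accept_cr_input(t, evaluation_periods=EVALUATION_PERIODS):
    """Accept a C&R voice input at song time t only outside every evaluation period."""
    return not any(start <= t < end for start, end in evaluation_periods)


voice_events = [5.0, 20.0, 45.0, 60.0, 95.0]  # hypothetical detection times (s)
print([t for t in voice_events if accept_cr_input(t)])  # -> [5.0, 45.0, 95.0]
```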

Another aspect of the present invention relates to a simulation system including: an input processing unit that acquires information on a user voice input by a user through a sound input device; and a sound processing unit that, in response to the input of the user voice, performs output processing of a response voice or response sound different from the user voice, wherein the input processing unit accepts the user voice input that triggers output of the response voice or response sound during periods other than an evaluation period in which evaluation processing is performed on the user voice. The present invention also relates to a program that causes a computer to function as each of the above units, and to a computer-readable information storage medium storing the program.

According to this other aspect, when the user inputs a user voice through the sound input device, a response voice or response sound different from the user voice is output in response. In this case, the user voice input that triggers such a response voice or response sound is accepted only during periods other than the evaluation period in which the user voice is evaluated. This prevents that input from adversely affecting the evaluation of the user voice.

FIG. 1 is a block diagram showing a configuration example of the simulation system of the present embodiment.
FIGS. 2A and 2B show an example of the HMD used in the present embodiment.
FIGS. 3A and 3B show another example of the HMD used in the present embodiment.
FIG. 4 is an explanatory diagram of a private room serving as the play area.
FIG. 5 is an explanatory diagram of a private room serving as the play area.
FIG. 6 shows an example of a game image generated by the present embodiment.
FIG. 7 shows an example of a game image generated by the present embodiment.
FIG. 8 is an explanatory diagram of the movement of the user (virtual user) on the stage.
FIG. 9 is an explanatory diagram of a method of responding to a user voice input with a response voice or response sound.
FIG. 10 is an explanatory diagram of a method of varying the response voice or response sound according to the volume and length of the user voice.
FIGS. 11A and 11B show specific examples of processing that varies the response voice or response sound according to the volume and length of the user voice.
FIG. 12 is an explanatory diagram of the output timing of the response voice or response sound.
FIGS. 13A and 13B are explanatory diagrams of a method of changing the response voice or response sound according to the type of spectator serving as the target.
FIGS. 14A and 14B are explanatory diagrams of a method of changing the response voice or response sound according to the positional relationship between the spectators and the user.
FIG. 15 is an explanatory diagram of a method of changing the response voice or response sound according to the user's line-of-sight direction with respect to the spectators.
FIGS. 16A and 16B are explanatory diagrams of a method of changing the response voice or response sound according to the result of analyzing feature quantities of the user voice.
FIG. 17 is an explanatory diagram of a method of changing the response voice or response sound according to the position and direction of the microphone (hand), the position and direction of the HMD, the user's posture, and the user's line of sight.
FIGS. 18A and 18B are explanatory diagrams of a method of changing the response voice or response sound according to the user's play history.
FIGS. 19A and 19B are explanatory diagrams of output processing of the response voice or response sound by sound data selection and sound data combination.
FIGS. 20A and 20B are explanatory diagrams of evaluation processing of the user's singing.
FIG. 21 is an explanatory diagram of a method of accepting user voice input for C&R (call and response) in periods other than the song evaluation period.
FIG. 22 is a flowchart showing a processing example of the present embodiment.

The present embodiment is described below. The embodiment described below does not unduly limit the content of the present invention as set forth in the claims, and not all of the configurations described in the embodiment are necessarily essential elements of the present invention.

1. Configuration
FIG. 1 shows a configuration example of the simulation system (game system, video display system, simulation apparatus) of the present embodiment. The simulation system of the present embodiment is not limited to the configuration of FIG. 1, and various modifications are possible, such as omitting some of its components (units) or adding other components.

The input device 160 is a device through which the user enters various kinds of input information. The input device 160 can include a sound input device 161 and a vibration device 164. The input device 160 may also function as a game controller through which the user enters game operation information. The game controller is implemented by, for example, operation buttons, a directional key, a joystick, or a lever. In this case, the game controller and the sound input device 161 may share a single housing or have separate housings.

The sound input device 161 is a device through which the user inputs sound information, for example user voices such as the user's singing voice, calls, and chants. The sound input device 161 can be implemented by, for example, the microphone 162 described with reference to FIG. 2A. Its shape is not limited to that of the microphone 162 shown in FIG. 2A; various types, such as a headset microphone with a headband or a small microphone, can be used. The sound input device 161 may also be a sound input device (such as a pickup microphone) of a musical instrument or of a device imitating a musical instrument. Examples of musical instruments include string instruments (guitar), percussion instruments (drums, taiko), and keyboard instruments (piano, keyboard).

The vibration device 164 (vibration generating unit) generates vibration for warnings and the like, and is implemented by, for example, a vibration motor (vibrator). A vibration motor generates vibration by, for example, rotating an eccentric weight; specifically, eccentric weights are attached to both ends of the drive shaft so that the motor itself shakes. The vibration device 164 is not limited to a vibration motor and may be implemented by, for example, a piezoelectric element.

The storage unit 170 stores various kinds of information and functions as a work area for the processing unit 100, the communication unit 196, and the like. The game program and the game data needed to execute it are held in the storage unit 170. The functions of the storage unit 170 can be implemented by semiconductor memory (DRAM, VRAM), an HDD (hard disk drive), an SSD, an optical disc device, or the like. The storage unit 170 includes a spatial information storage unit 172, a music information storage unit 174, a sound data storage unit 175, a parameter storage unit 176, and a drawing buffer 178.

The information storage medium 180 (a computer-readable medium) stores programs, data, and the like, and its functions can be implemented by an optical disc (DVD, BD, CD), an HDD, semiconductor memory (ROM), or the like. The processing unit 100 performs the various processes of the present embodiment based on the program (data) stored in the information storage medium 180. That is, the information storage medium 180 stores a program that causes a computer (a device including an input device, a processing unit, a storage unit, and an output unit) to function as each unit of the present embodiment (a program that causes the computer to execute the processing of each unit).

The head-mounted display 200 (HMD) is a device worn on the user's head that displays images in front of the user's eyes. The HMD 200 is preferably of the non-see-through type but may be of the see-through type. The HMD 200 may also be a so-called glasses-type HMD.

The HMD 200 includes a sensor unit 210, a display unit 220, and a processing unit 240. A modification in which the HMD 200 is provided with light-emitting elements is also possible. The sensor unit 210 implements tracking processing such as head tracking; for example, the position and direction of the HMD 200 are determined by tracking processing using the sensor unit 210. Determining the position and direction of the HMD 200 determines the position and direction of the user, which in turn determine the position and direction of the virtual user (virtual player) in the virtual space corresponding to the user (player). The position and direction of the user are, for example, the user's viewpoint position and line-of-sight direction; the position and direction of the virtual user are, for example, the virtual user's viewpoint position and line-of-sight direction.

Various tracking methods can be employed. In a first tracking method, as described in detail later with reference to FIGS. 2A and 2B, a plurality of light-receiving elements (photodiodes or the like) are provided as the sensor unit 210. These light-receiving elements receive light (such as laser light) from externally provided light-emitting elements (LEDs or the like), thereby determining the position and direction of the HMD 200 (the user's head) in the three-dimensional space of the real world. In a second tracking method, as described in detail later with reference to FIGS. 3A and 3B, a plurality of light-emitting elements (LEDs) are provided on the HMD 200, and the position and direction of the HMD 200 are determined by capturing the light from these elements with an externally provided imaging unit. In a third tracking method, a motion sensor is provided as the sensor unit 210 and used to determine the position and direction of the HMD 200. The motion sensor can be implemented by, for example, an acceleration sensor or a gyro sensor; for example, a six-axis motion sensor combining a three-axis acceleration sensor and a three-axis gyro sensor can determine the position and direction of the HMD 200 in the three-dimensional space of the real world. The position and direction of the HMD 200 may also be determined by combining the first and second tracking methods, or the first and third tracking methods.
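The publication does not say how the accelerometer and gyro readings of the third method are fused. A common, simple technique is a complementary filter, shown here as a swapped-in sketch for a single angle (pitch) only: the integrated gyro rate is smooth but drifts, while the gravity direction from the accelerometer is absolute but noisy. The axis convention (z up when level), `alpha`, and the function name are assumptions. Integrating accelerometers to obtain position drifts far faster, which is one common reason to combine inertial sensing with an optical method, as the text allows.

```python
import math


def complementary_pitch(prev_pitch, gyro_rate, accel, dt, alpha=0.98):
    """One update of a complementary filter for the pitch angle, in radians.

    prev_pitch: previous pitch estimate (rad)
    gyro_rate:  angular velocity about the pitch axis (rad/s)
    accel:      (ax, ay, az) in m/s^2; assumes z points up when the head is level
                and that gravity dominates (no strong linear acceleration)
    """
    ax, ay, az = accel
    accel_pitch = math.atan2(-ax, math.hypot(ay, az))  # absolute but noisy
    gyro_pitch = prev_pitch + gyro_rate * dt            # smooth but drifts
    return alpha * gyro_pitch + (1 - alpha) * accel_pitch


# Head held still at a 0.3 rad tilt: repeated updates converge toward 0.3.
tilted = (-9.81 * math.sin(0.3), 0.0, 9.81 * math.cos(0.3))
pitch = 0.0
for _ in range(500):
    pitch = complementary_pitch(pitch, 0.0, tilted, 0.01)
print(round(pitch, 3))
```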

The display unit 220 of the HMD 200 can be implemented by, for example, a liquid crystal display (LCD) or an organic EL display. For example, the HMD 200 is provided, as the display unit 220, with a first display placed in front of the user's left eye and a second display placed in front of the right eye, enabling stereoscopic display. For stereoscopic display, for example, a left-eye image and a right-eye image with different parallax are generated, and the left-eye image is displayed on the first display and the right-eye image on the second display.
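One way to obtain the left-eye and right-eye images with different parallax is to render the scene from two virtual cameras offset by half the interpupillary distance (IPD) along the head's right vector. The sketch below computes only those two camera positions; the 64 mm IPD, the axis convention (x right, y up, z forward at yaw 0, yaw about the y axis, positive yaw turning right), and the function name are assumptions.

```python
import math


def eye_positions(head_pos, yaw, ipd=0.064):
    """Return (left_eye, right_eye) positions for stereo rendering.

    head_pos: (x, y, z) of the point between the eyes, in metres
    yaw:      head rotation about the y axis, in radians
    ipd:      assumed interpupillary distance, in metres
    """
    right = (math.cos(yaw), 0.0, -math.sin(yaw))  # head's right vector for this yaw
    half = ipd / 2
    left_eye = tuple(h - half * r for h, r in zip(head_pos, right))
    right_eye = tuple(h + half * r for h, r in zip(head_pos, right))
    return left_eye, right_eye
```

Each eye's camera then renders the same scene, and the horizontal offset between the two images produces the parallax.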

The processing unit 240 of the HMD 200 performs the various processes the HMD 200 requires, such as control processing of the sensor unit 210 and display control processing of the display unit 220. The processing unit 240 may also perform three-dimensional (stereophonic) sound processing to reproduce the direction, distance, and spread of sound in three dimensions.
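The publication does not detail its three-dimensional sound processing. As a deliberately simple stand-in (not HRTF-based spatialization), the sketch below conveys direction with constant-power stereo panning and distance with inverse-distance attenuation for a source in the horizontal plane. Positions are (x, z) pairs with z forward at yaw 0; this convention and the function name are assumptions.

```python
import math


def pan_gains(source_pos, listener_pos, listener_yaw, ref_dist=1.0):
    """Return (left_gain, right_gain) for a point source in the horizontal plane."""
    dx = source_pos[0] - listener_pos[0]
    dz = source_pos[1] - listener_pos[1]
    dist = math.hypot(dx, dz)
    # Azimuth relative to the facing direction; positive means to the listener's right.
    azimuth = math.atan2(dx, dz) - listener_yaw
    # sin() maps azimuth to a pan position in [-1, 1]; like simple panners,
    # it cannot tell sources in front from sources behind.
    pan = math.sin(azimuth)
    theta = (pan + 1) * math.pi / 4  # 0 = hard left, pi/2 = hard right
    attenuation = ref_dist / max(dist, ref_dist)
    # cos^2 + sin^2 = 1, so the total power depends only on distance (constant-power pan).
    return attenuation * math.cos(theta), attenuation * math.sin(theta)
```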

The sound output unit 192 outputs the sound generated by the present embodiment and can be implemented by, for example, speakers or headphones.

The I/F (interface) unit 194 performs interface processing with a portable information storage medium 195; its functions can be implemented by an ASIC for I/F processing or the like. The portable information storage medium 195 is used by the user to save various kinds of information and is a storage device that retains this information even when power is not supplied. It can be implemented by an IC card (memory card), a USB memory, a magnetic card, or the like.

The communication unit 196 communicates with external devices (other apparatuses) via a wired or wireless network; its functions can be implemented by hardware such as a communication ASIC or communication processor, or by communication firmware.

The program (data) that causes a computer to function as each unit of the present embodiment may be distributed from an information storage medium of a server (host device) to the information storage medium 180 (or the storage unit 170 or auxiliary storage device 194) via the network and the communication unit 196. Use of an information storage medium by such a server (host device) is also within the scope of the present invention.

The processing unit 100 (processor) performs game processing, game score calculation processing, display processing, sound processing, and the like, based on input information from the input device 160, tracking information from the HMD 200 (the position and direction of the HMD, or the viewpoint position and line-of-sight direction), a program, and so on.

Each process (each function) of the present embodiment performed by the units of the processing unit 100 can be implemented by a processor (a processor including hardware). For example, each process can be implemented by a processor that operates based on information such as a program, and a memory that stores such information. In the processor, the functions of the units may be implemented by separate pieces of hardware or by integrated hardware. The processor may be, for example, a CPU (Central Processing Unit), but it is not limited to a CPU; various processors such as a GPU (Graphics Processing Unit) or a DSP (Digital Signal Processor) can be used. The processor may also be a hardware circuit based on an ASIC. The memory (storage unit 170) may be semiconductor memory such as SRAM or DRAM, or a register; alternatively, it may be a magnetic storage device such as a hard disk drive (HDD) or an optical storage device such as an optical disc drive. For example, the memory stores computer-readable instructions, and the processing (functions) of each unit of the processing unit 100 is implemented when the processor executes these instructions. These instructions may be an instruction set constituting a program, or instructions that direct the operation of the processor's hardware circuit.

The processing unit 100 includes an input processing unit 102, an arithmetic processing unit 110, and an output processing unit 140. The arithmetic processing unit 110 includes a game processing unit 111, a game score calculation unit 118, a measurement processing unit 119, a display processing unit 120, and a sound processing unit 130. As described above, each process of this embodiment executed by these units can be implemented by the processor (or by the processor and the memory). Various modifications are possible, such as omitting some of these components (units) or adding other components.

The input processing unit 102 (a program module for input processing) performs, as input processing, a process of receiving input information and tracking information, a process of reading information from the storage unit 170, and a process of receiving information via the communication unit 196. For example, the input processing unit 102 acquires input information entered by the user with the input device 160 and tracking information (such as the user's position, direction, or line of sight) detected by the sensor unit 210 of the HMD 200 or the like; reads information designated by a read command from the storage unit 170; and receives information from an external device (such as a server) over a network. Here, the reception process is, for example, a process of instructing the communication unit 196 to receive information, or a process of acquiring the information received by the communication unit 196 and writing it to the storage unit 170.

The arithmetic processing unit 110 performs various kinds of arithmetic processing, such as game processing, game score calculation processing, display processing, or sound processing.

The game processing unit 111 (a program module for game processing) performs the various game processes that allow the user to play a game. The game processing unit 111 includes a game progress processing unit 112, an evaluation processing unit 113, a character processing unit 114, a parameter processing unit 115, an object space setting unit 116, and a virtual camera control unit 117.

The game progress processing unit 112 starts the game when a game start condition is satisfied, advances the game, and ends the game when a game end condition is satisfied. The evaluation processing unit 113 evaluates the user's gameplay; for example, it evaluates the user's performance and game operations in a music game. Music information used in the music game is stored in the music information storage unit 174.

The character processing unit 114 performs various processes related to characters, such as moving a character through the object space (virtual space, game space) and animating a character. Animating a character can be implemented, for example, by motion processing (motion playback and the like) using motion data. The parameter processing unit 115 performs arithmetic processing on the various parameters (game parameters) used in the game, such as increasing or decreasing parameter values. Parameter information is stored in the parameter storage unit 176.

The object space setting unit 116 sets up an object space (a virtual space in the broad sense) in which a plurality of objects are placed. For example, it places in the object space various objects (objects composed of primitive surfaces such as polygons, free-form surfaces, or subdivision surfaces) that represent display objects such as characters (people, animals, robots, etc.), maps (terrain), buildings, spectator stands, courses (roads), trees, walls, and water surfaces. That is, it determines the position and rotation angle (synonymous with orientation or direction) of each object in the world coordinate system, and places the object at that position (X, Y, Z) with that rotation angle (rotation angles about the X, Y, and Z axes). Specifically, the spatial information storage unit 172 of the storage unit 170 stores, as spatial information, information such as the positions and rotation angles (directions) of the plurality of objects (part objects) in the object space. The object space setting unit 116 updates this spatial information, for example, every frame.

The virtual camera control unit 117 controls a virtual camera (viewpoint, reference virtual camera) used to generate an image seen from a given (arbitrary) viewpoint in the object space. Specifically, it controls the position (X, Y, Z) or rotation angles (about the X, Y, and Z axes) of the virtual camera, that is, the viewpoint position, gaze direction, or angle of view. This virtual camera corresponds to the user's viewpoint. For stereoscopic display, a first viewpoint for the left eye (a first virtual camera for the left eye) and a second viewpoint for the right eye (a second virtual camera for the right eye) are set.
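
As a rough illustration of how the left-eye and right-eye viewpoints might be derived from a single reference virtual camera, the sketch below offsets each eye by half an interpupillary distance along the camera's right vector. The `Camera` type, the Y-up convention with yaw measured about the Y axis (yaw 0 looking along +Z, with +X to the right), and the 0.064 m default distance are assumptions for illustration only; pitch and roll are ignored.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Camera:
    position: tuple[float, float, float]
    yaw: float  # rotation about the Y axis, in radians


def stereo_cameras(ref: Camera, ipd: float = 0.064) -> tuple[Camera, Camera]:
    """Derive left-eye and right-eye cameras from a reference camera.

    Assumes a Y-up frame in which yaw = 0 looks along +Z with +X to the right.
    Each eye is shifted by half the interpupillary distance along the right vector.
    """
    right = (math.cos(ref.yaw), 0.0, -math.sin(ref.yaw))
    half = ipd / 2.0
    x, y, z = ref.position
    left_eye = Camera((x - right[0] * half, y, z - right[2] * half), ref.yaw)
    right_eye = Camera((x + right[0] * half, y, z + right[2] * half), ref.yaw)
    return left_eye, right_eye


left, right = stereo_cameras(Camera((0.0, 1.6, 0.0), 0.0))
print(left.position, right.position)  # eyes 64 mm apart along X
```

Both eye cameras keep the reference orientation, so each renders a parallel view from a slightly different position.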

The game score calculation unit 118 calculates the user's game results, for example the score or points earned through the user's gameplay.

The measurement processing unit 119 performs various measurements on sound, for example measuring the user voice input through the sound input device 161.

The display processing unit 120 (a program module for display processing) performs display processing for game images. For example, it performs drawing processing on the basis of the results of the various processes (game processing, simulation processing) performed by the processing unit 100, thereby generating an image that is displayed on the display unit 220 of the HMD 200. Specifically, geometry processing such as coordinate transformation (world coordinate transformation, camera coordinate transformation), clipping, perspective transformation, or lighting is performed, and drawing data (vertex position coordinates of primitive surfaces, texture coordinates, color data, normal vectors, α values, etc.) is created from the results. Then, on the basis of this drawing data (primitive surface data), the objects (one or more primitive surfaces) after perspective transformation (after geometry processing) are drawn into the drawing buffer 178 (a buffer, such as a frame buffer or work buffer, that can store image information per pixel). As a result, an image seen from the virtual camera (the given viewpoint; the first and second viewpoints for the left and right eyes) in the object space is generated. The drawing processing performed by the display processing unit 120 can be implemented by vertex shader processing, pixel shader processing, and the like.

The sound processing unit 130 (a program module for sound processing) performs sound processing on the basis of the results of the various processes performed by the processing unit 100. Specifically, it generates game sounds such as songs (music, BGM), sound effects, or voices, and causes the sound output unit 192 to output them. Sound data output (played back) during the game is stored in the sound data storage unit 175. Part of the sound processing of the sound processing unit 130 (for example, three-dimensional audio processing) may be implemented by the processing unit 240 of the HMD 200.

The output processing unit 140 performs output processing of various kinds of information. For example, the output processing unit 140 performs, as output processing, a process of writing information to the storage unit 170 and a process of transmitting information via the communication unit 196: it writes information designated by a write command to the storage unit 170, and transmits information to an external device (such as a server) over a network. The transmission process is, for example, a process of instructing the communication unit 196 to transmit information, or of designating to the communication unit 196 the information to be transmitted.

In this embodiment, for example, the game processing unit 111 processes a game that the user plays in a virtual space (game space) in which a plurality of objects are placed. A plurality of objects such as characters are placed in the virtual space, which is the object space, and the game processing unit 111 executes the various game processes (game progress processing, character processing, object space setting processing, virtual camera control processing, etc.) needed to realize the game in this virtual space. The display processing unit 120 then displays a game image seen from a given viewpoint in the virtual space (the first and second viewpoints for the left and right eyes) on the display unit 220 (first and second displays) of the HMD 200. That is, it displays a game image seen from the viewpoint (virtual camera) of the virtual user (user) in the object space, which is the virtual space.

For example, when a real-world user wearing the HMD 200 moves, changes direction, turns the head, or crouches in the play area of the private room shown in FIGS. 4 and 5 (described later), so that the user's position and direction change, the position and direction of the corresponding virtual user in the virtual space change as well. Because the user's position and direction in the real world can be determined by tracking the HMD 200, the position and direction of the virtual user in the virtual space can also be determined. The positions and directions of the user and the virtual user are, for example, their viewpoint positions and gaze directions. When the virtual user is not displayed as a character, the image displayed on the HMD 200 is a first-person view; when the virtual user is displayed as a character, the displayed image is a third-person view.

The play area (action area) is, for example, an area set as the range within which the user can move, defined in advance as the area (field, space) in which the user performs actions such as gameplay. This play area (a movable range in the broad sense) is an area that includes, for example, the range within which the user's position information and the like can be tracked. The play area may be enclosed by walls, or it may be an open space.

In this embodiment, the input processing unit 102 acquires information on the user voice that the user inputs with the sound input device 161. As one example, the analog voice signal input through the sound input device 161 is A/D converted and acquired as digital user voice information.
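
A minimal sketch of the two steps of A/D conversion, sampling and quantization, assuming 16-bit signed PCM and an analog signal normalized to [-1.0, 1.0]. The function names are hypothetical; in practice this step is performed by the audio hardware or driver, and the `tone` function merely stands in for the analog input.

```python
import math
from typing import Callable


def sample_signal(signal: Callable[[float], float], rate_hz: int, duration_s: float) -> list[float]:
    """Sampling: evaluate the stand-in analog signal at t = k / rate_hz."""
    n = round(rate_hz * duration_s)
    return [signal(k / rate_hz) for k in range(n)]


def quantize_pcm16(samples: list[float]) -> list[int]:
    """Quantization: map samples in [-1.0, 1.0] to signed 16-bit integers.

    Out-of-range values are clipped, as a saturating converter would.
    """
    return [round(max(-1.0, min(1.0, s)) * 32767) for s in samples]


def tone(t: float) -> float:
    """Stand-in analog input: a 440 Hz sine at half full scale."""
    return 0.5 * math.sin(2 * math.pi * 440.0 * t)


pcm = quantize_pcm16(sample_signal(tone, 16000, 0.01))
print(len(pcm), max(pcm))  # 160 samples, peak near 0.5 * 32767
```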

The measurement processing unit 119 measures the acquired user voice, for example by performing various measurements on the waveform of the user voice. In response to the input of the user voice, the sound processing unit 130 outputs a response voice or response sound that differs from the user voice. For example, it selects sound data from the plurality of sound data items (sound data files) stored in the sound data storage unit 175, combines sound data, and so on, to generate (synthesize) a response voice or response sound different from the user voice, which is then output by the sound output unit 192 (a speaker or the like).

In this embodiment, the volume and length of the user voice are measured. For example, the volume of the user voice is measured by determining whether the signal waveform of the user voice exceeds a predetermined volume level (sound pressure). The length of the user voice (its duration) is measured by measuring how long the signal waveform of the user voice stays above the predetermined volume level. The sound processing unit 130 then outputs a response voice or response sound based on the results of measuring the volume and length of the user voice. The output and modification of the response voice or response sound are performed by the response sound processing unit 132.
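
One way to realize the volume and length measurement described above, sketched under assumptions: the volume is taken as the root-mean-square (RMS) level of fixed-length frames, the predetermined volume level is a normalized RMS threshold, and the length is the total time of frames above that threshold. `rms_envelope` and `measure_volume_and_length` are hypothetical names, not part of the original system.

```python
import math


def rms_envelope(samples: list[float], frame_len: int) -> list[float]:
    """RMS level of each consecutive, non-overlapping frame of `frame_len` samples."""
    env = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        env.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return env


def measure_volume_and_length(samples: list[float], rate_hz: int, frame_len: int, level: float):
    """Return (exceeded, peak_volume, length_s) for one user-voice buffer.

    exceeded:    whether any frame's RMS rose above `level`
    peak_volume: the largest frame RMS
    length_s:    total time of frames whose RMS is above `level`
    """
    env = rms_envelope(samples, frame_len)
    peak = max(env, default=0.0)
    frames_above = sum(1 for v in env if v > level)
    return peak > level, peak, frames_above * frame_len / rate_hz
```

For example, one second of a ±0.5 square wave at a 1 kHz sample rate, surrounded by silence and measured in 100-sample frames against a 0.1 threshold, yields `(True, 0.5, 1.0)`.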

For example, when the result of measuring the volume and length of the user voice is a first measurement result, the sound processing unit 130 causes the sound output unit 192 to output a first response voice or first response sound in response to that user voice input. When the result is a second measurement result, it causes the sound output unit 192 to output a second response voice or second response sound. Likewise, when the result is a K-th measurement result (K being an integer of 2 or more), it causes the sound output unit 192 to output a K-th response voice or K-th response sound in response to that user voice input.

For example, the sound processing unit 130 varies the response voice or response sound according to the volume and length of the user voice. That is, the response voice or response sound output in response to a user voice input differs depending on the volume and length of that user voice. For example, let the volume of the user voice be LV(m) and its length be TL(n), where m and n are integer indices over two or more levels. In this case, a response voice or response sound expressed as RS(m, n) = RS(LV(m), TL(n)) is output. For example, when the volume and length are LV(1) and TL(1), RS(1, 1) is output as the response voice or response sound; when they are LV(2) and TL(1), RS(2, 1) is output. RS(1, 1) and RS(2, 1) are different response voices or response sounds. When the volume and length are LV(1) and TL(2), RS(1, 2) is output; when they are LV(2) and TL(2), RS(2, 2) is output. RS(1, 2) and RS(2, 2) are different response voices or response sounds. Moreover, RS(1, 1), RS(2, 1), RS(1, 2), and RS(2, 2) are all mutually different response voices or response sounds.
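
The RS(m, n) scheme can be pictured as a two-dimensional lookup table. In the sketch below, the measured volume and length are first quantized into level indices m and n using ascending boundary lists, and the pair is then used as the table key. The boundaries (0.3 normalized RMS, 1.0 s) and the response identifiers are hypothetical placeholders.

```python
import bisect

# Hypothetical level boundaries (ascending). A value below the first boundary
# is level 1, a value at or above it is level 2, and so on.
VOLUME_BOUNDS = [0.3]  # normalized RMS volume: LV(1) < 0.3 <= LV(2)
LENGTH_BOUNDS = [1.0]  # seconds:               TL(1) < 1.0 <= TL(2)

# RS(m, n): hypothetical response identifiers, all mutually different.
RESPONSES = {
    (1, 1): "cheer_small_short",
    (2, 1): "cheer_loud_short",
    (1, 2): "cheer_small_long",
    (2, 2): "cheer_loud_long",
}


def level_index(value: float, bounds: list[float]) -> int:
    """1-based level index of `value` given ascending boundaries."""
    return bisect.bisect_right(bounds, value) + 1


def select_response(volume: float, length_s: float) -> str:
    """Look up RS(m, n) = RS(LV(m), TL(n)) for a measured volume and length."""
    m = level_index(volume, VOLUME_BOUNDS)
    n = level_index(length_s, LENGTH_BOUNDS)
    return RESPONSES[(m, n)]
```

Adding boundaries increases the number of levels for m or n; the table then needs one entry per (m, n) pair.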

Here, response voices or response sounds being different means, for example, that they differ in type (sound data, waveform, attributes, etc.), volume (intensity), pitch, timbre, or the like.

More specifically, the measurement processing unit 119 measures the length from the timing at which the volume of the user voice exceeds the i-th volume level (1 ≤ i ≤ N) among first to N-th volume levels to the timing at which it falls below the i-th volume level. For example, let TM1 be the timing at which the volume rises in the signal waveform of the user voice and exceeds the i-th volume level, and let TM2 be the subsequent timing at which the volume decreases and falls below the i-th volume level. The length of the user voice is then measured as T = TM2 − TM1. The sound processing unit 130 varies the response voice or response sound according to the measured length T = TM2 − TM1. That is, the response voice or response sound differs between a measured length of T = T1 and a measured length of T = T2 (where T1 and T2 are different lengths).
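
The TM1/TM2 measurement can be sketched as a threshold-crossing scan over a volume envelope, assuming the envelope holds one volume value per frame at a fixed frame period. This representation is an illustrative assumption, and `crossing_durations` is a hypothetical helper that returns (TM1, TM2, T) for every excursion above the i-th level.

```python
def crossing_durations(envelope: list[float], frame_period_s: float, level: float):
    """Return (TM1, TM2, T) for each excursion of the envelope above `level`.

    TM1 is the time of the first frame above the level, TM2 the time of the
    first later frame that drops back to or below it, and T = TM2 - TM1.
    An excursion still open at the end of the envelope is closed at its end.
    """
    segments = []
    tm1 = None
    for k, v in enumerate(envelope):
        t = k * frame_period_s
        if tm1 is None and v > level:
            tm1 = t  # rising crossing
        elif tm1 is not None and v <= level:
            segments.append((tm1, t, t - tm1))  # falling crossing
            tm1 = None
    if tm1 is not None:
        end = len(envelope) * frame_period_s
        segments.append((tm1, end, end - tm1))
    return segments
```

Running the scan once per volume level i gives a separate T for each level, which the response selection can then use.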

The sound processing unit 130 also varies the response voice or response sound between the case where the volume of the user voice exceeds the i-th volume level and the case where it exceeds the j-th volume level (1 ≤ i < j ≤ N) among the first to N-th volume levels. For example, suppose a first response voice or first response sound is output when the volume of the user voice exceeds the i-th volume level. If the volume of the user voice then increases further and exceeds the j-th volume level, a second response voice or second response sound, different from the first, is output. In this case, both the first and second response voices or response sounds may be output, or only the second may be output. When the response voice or response sound is cheering, shouting, or applause, the second response voice or response sound is more excited, with a higher degree of enthusiasm, than the first.
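
A sketch of the escalation across the first to N-th volume levels, assuming ascending level values and one hypothetical response per level, ordered from least to most enthusiastic. The `layer` flag selects between the two variants described above: outputting the responses for all exceeded levels together, or only the one for the highest level exceeded.

```python
import bisect

# Hypothetical responses for the 1st..N-th volume levels, least to most enthusiastic.
RESPONSE_LADDER = ["applause_light", "cheer", "roar"]


def highest_level_exceeded(peak_volume: float, levels: list[float]) -> int:
    """1-based index of the highest level strictly exceeded, or 0 if none.

    `levels` must be ascending; bisect_left counts the levels below peak_volume.
    """
    return bisect.bisect_left(levels, peak_volume)


def responses_for(peak_volume: float, levels: list[float], layer: bool = False) -> list[str]:
    """Responses to output for one user voice input.

    layer=False: only the response for the highest level exceeded.
    layer=True:  the responses for every exceeded level, played together.
    """
    k = highest_level_exceeded(peak_volume, levels)
    if k == 0:
        return []
    return RESPONSE_LADDER[:k] if layer else [RESPONSE_LADDER[k - 1]]
```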

The sound processing unit 130 also outputs the response voice or response sound as the voice or sound of the user's target in the game. Here, the target is the object of the user's gameplay, such as a character appearing in the game. In this case, the response voice is a voice uttered by the character, and the response sound is, for example, a sound produced by the character's motion. The target may also be something other than a character appearing in the game (for example, a stationary object).

For example, the target is a spectator character appearing in the game, such as a spectator in a music game, performance game, or sports game. In this case, the response voice is a voice such as a cheer, shout, or call, and the response sound is clapping, whistling, foot-stomping, or the like. Alternatively, the target character may be an opponent character in a competitive game, a character cooperating with the user, or the like.

The sound processing unit 130 also changes the response voice or response sound according to at least one of the type of the target, the positional relationship between the target and the user, and the user's gaze direction relative to the target. For example, when the target's response voice or response sound is output in response to a user voice input, the response voice or response sound is varied according to the type of target. The type of target is a classification of targets (characters) by, for example, gender (male, female), age group (child, adult, etc.), personality (enthusiastic type, calm type, etc.), or attribute (fire, earth, water, etc.). The positional relationship between the target and the user represents, for example, the relationship between at least one of the target's position and direction and at least one of the user's position and direction. The user's gaze direction relative to the target is, for example, the direction in which the user's gaze points relative to the position of the target.
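
The sketch below illustrates the three factors under simplified, hypothetical rules: the target type selects a base clip, the target-to-user distance sets a gain with inverse-distance falloff, and a variant clip is chosen when the target lies within a cone around the user's gaze direction. The `Target` type, the clip names, the falloff law, and the 15-degree cone are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    kind: str  # hypothetical classification, e.g. "enthusiastic" or "calm"
    position: tuple[float, float, float]


# Hypothetical mapping from target type to base response clip.
BASE_CLIP = {"enthusiastic": "cheer_loud", "calm": "clap_polite"}


def gaze_angle(user_pos, gaze_dir, target_pos) -> float:
    """Angle in radians between the gaze direction and the direction to the target."""
    to_t = [t - u for t, u in zip(target_pos, user_pos)]
    dot = sum(a * b for a, b in zip(gaze_dir, to_t))
    norm = math.hypot(*gaze_dir) * math.hypot(*to_t)
    if norm == 0.0:
        return 0.0
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def target_response(target: Target, user_pos, gaze_dir, cone_rad=math.radians(15)):
    """Return (clip, gain) for the target's response to a user voice input."""
    clip = BASE_CLIP[target.kind]                   # type of target
    distance = math.dist(user_pos, target.position)
    gain = 1.0 / max(1.0, distance)                 # positional relationship
    if gaze_angle(user_pos, gaze_dir, target.position) <= cone_rad:
        clip += "_looked_at"                        # gaze direction
    return clip, gain
```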

The target may also represent a location in the virtual space (object space), for example a seat in the spectator stands. In this case, attribute data (control data) is stored in association with each location (target) in the virtual space. When it is determined that the user's gaze is directed toward a location (target), the response voice or response sound is changed using the attribute data associated with that location (data representing the location's attributes). This attribute data can be stored, for example, in the spatial information storage unit 172.
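
When the targets are locations such as seat sections, the gaze test can reduce to finding which section the gaze points at. This sketch assumes the stands are split into hypothetical azimuth ranges around the user, with 0 degrees straight ahead along +Z and positive angles toward +X, and returns the attribute data stored for the section being looked at. Section names, ranges, and attribute fields are illustrative only.

```python
import math

# Hypothetical seat sections, each covering an azimuth range in degrees,
# with attribute data used to alter the response voice or response sound.
SECTIONS = [
    {"name": "left", "az_min": -90.0, "az_max": -30.0, "attr": {"gain": 0.8, "style": "whistle"}},
    {"name": "center", "az_min": -30.0, "az_max": 30.0, "attr": {"gain": 1.2, "style": "chant"}},
    {"name": "right", "az_min": 30.0, "az_max": 90.0, "attr": {"gain": 0.8, "style": "clap"}},
]


def gaze_azimuth_deg(gaze_dir) -> float:
    """Horizontal gaze angle: 0 = straight ahead (+Z), positive toward +X."""
    x, _, z = gaze_dir
    return math.degrees(math.atan2(x, z))


def attributes_for_gaze(gaze_dir):
    """Return (section name, attribute data) for the section being looked at."""
    az = gaze_azimuth_deg(gaze_dir)
    for section in SECTIONS:
        if section["az_min"] <= az < section["az_max"]:
            return section["name"], section["attr"]
    return None, None  # not looking at the stands
```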

The process of changing the response voice or response sound is, for example, a process of applying an effect to the response voice or response sound, a process of mixing a different voice or sound into it, or a process of changing the combination, type, or number of sound data items used to generate it.

The measurement processing unit 119 also analyzes feature quantities of the user voice, and the sound processing unit 130 changes the response voice or response sound according to the result of this analysis. The feature quantities of the user voice relate, for example, to the loudness, pitch, timbre, or type of the user voice. The analysis can be implemented, for example, by frequency analysis: the user voice data is frequency-analyzed in successive intervals of a predetermined length (several tens of ms) to obtain an acoustic spectrum, which is the frequency-component information of the voice, and the feature quantities of the user voice are analyzed using this acoustic spectrum as a cue. The response voice or response sound is then changed according to the analysis result. For example, when the user's gender or age group is identified from the analysis result, the response voice or response sound is changed according to the identified gender or age group: a response voice or response sound for men is output when the user is male and one for women when the user is female, and one for adults is output when the user is an adult and one for children when the user is a child.
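
A minimal standard-library sketch of frame-wise frequency analysis: each frame of about 32 ms is transformed with a naive DFT, and the strongest non-DC bin is taken as a crude feature. Mapping the median peak frequency to a "low" or "high" voice group with a 165 Hz split is a hypothetical stand-in for the gender or age-group determination, which would need a far more robust classifier in practice. A real system would also use an FFT rather than an O(N²) DFT.

```python
import cmath
import math


def magnitude_spectrum(frame: list[float]) -> list[float]:
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) DFT, adequate for a sketch)."""
    n = len(frame)
    return [
        abs(sum(frame[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n)))
        for b in range(n // 2 + 1)
    ]


def dominant_frequencies(samples: list[float], rate_hz: int, frame_ms: int = 32) -> list[float]:
    """Frequency (Hz) of the strongest non-DC bin in each non-overlapping frame."""
    frame_len = int(rate_hz * frame_ms / 1000)
    freqs = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        spectrum = magnitude_spectrum(samples[start:start + frame_len])
        peak_bin = max(range(1, len(spectrum)), key=spectrum.__getitem__)
        freqs.append(peak_bin * rate_hz / frame_len)
    return freqs


def response_voice_group(samples: list[float], rate_hz: int, split_hz: float = 165.0) -> str:
    """Crude, hypothetical grouping by the median per-frame dominant frequency."""
    freqs = sorted(dominant_frequencies(samples, rate_hz))
    if not freqs:
        return "unknown"
    median = freqs[len(freqs) // 2]
    return "low" if median < split_hz else "high"
```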

The sound processing unit 130 also performs a process of changing the response voice or response sound, a process of outputting the response voice or response sound, or a process of accepting the user voice input that triggers the response voice or response sound. It does so according to at least one of the position and direction of the sound input device 161 held in the user's hand (or the position and direction of the hand), the position and direction of the HMD 200 worn by the user, the user's posture, and the user's gaze. For example, depending on where the sound input device 161 (or the hand holding it) is located or which direction it faces, the sound processing unit 130 changes the response voice or response sound, controls its output (for example, its output timing), or determines whether to accept user voice input. Likewise, depending on where the HMD 200 is located or which direction it faces, it changes the response voice or response sound, controls its output, or determines whether to accept user voice input. Alternatively, it detects the state of the user's posture or gaze and, according to the detection result, changes the response voice or response sound, controls its output, or determines whether to accept user voice input.
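
As one hypothetical example of using the position and direction of the sound input device 161 together with the HMD 200, the sketch below accepts user voice input only when the handheld microphone is close to an estimated mouth position and pointed roughly toward it. The mouth offset (applied in world coordinates, ignoring HMD rotation for simplicity), the 0.3 m distance, and the 45-degree angle are assumed values.

```python
import math


def _normalize(v):
    """Unit vector in the direction of v (v is returned unchanged if it is zero)."""
    n = math.hypot(*v)
    return tuple(c / n for c in v) if n else v


def accept_voice_input(mic_pos, mic_dir, hmd_pos, mouth_offset=(0.0, -0.1, 0.05),
                       max_dist=0.3, max_angle_deg=45.0) -> bool:
    """Decide whether to accept user voice input from the handheld microphone.

    Hypothetical rule: the mic must be within `max_dist` metres of an estimated
    mouth position (HMD position plus a fixed offset) and point roughly at it.
    """
    mouth = tuple(h + o for h, o in zip(hmd_pos, mouth_offset))
    to_mouth = tuple(m - p for m, p in zip(mouth, mic_pos))
    dist = math.hypot(*to_mouth)
    if dist > max_dist:
        return False
    if dist == 0.0:
        return True
    d = _normalize(mic_dir)
    cos_angle = sum(a * b for a, b in zip(d, to_mouth)) / dist
    return cos_angle >= math.cos(math.radians(max_angle_deg))
```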

The sound processing unit 130 also changes the response voice or response sound on the basis of the user's past play history information. For example, information on the user's past play history is stored in the storage unit 170. When the user inputs a user voice with the sound input device 161, this play history information is read from the storage unit 170, and the response voice or response sound is changed according to it. For example, the output response voice or response sound differs between a user who plays the game frequently and a user who rarely plays it.
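
A sketch of history-dependent response selection, assuming the play history read from the storage unit 170 is summarized as a play count and the number of days since the last play. The fields, thresholds, and response identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayHistory:
    play_count: int  # hypothetical summary fields read from storage
    days_since_last_play: int


def response_for_history(history: PlayHistory) -> str:
    """Choose a response variant from past play history (hypothetical thresholds)."""
    if history.play_count == 0:
        return "first_timer_welcome"
    if history.play_count >= 20 and history.days_since_last_play <= 7:
        return "regular_player_cheer"  # warmer response for a frequent player
    return "standard_cheer"
```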

As shown in FIG. 1, the simulation system of the present embodiment also includes a sound data storage unit 175 that stores sound data. The sound data storage unit 175 stores a plurality of sound data items (sound data files) used to output response voices or response sounds. The sound processing unit 130 outputs a response voice or response sound by selecting the sound data to use from these items, or by combining several items. Varying the response voice or response sound is therefore realized by selecting different sound data, or by using a different combination of sound data.
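
Selection and combination of stored sound data can be sketched as follows. This minimal sketch assumes that the sound data are short 16-bit PCM sample lists and that "combining" means mixing samples with clipping. The dictionary contents, the item names and the excitement levels are hypothetical.

```python
# Hypothetical contents of sound data storage unit 175 (16-bit PCM samples).
SOUND_DATA = {
    "cheer_small": [4000, 6000, 5000],
    "cheer_large": [12000, 20000, 15000],
    "applause": [9000, 9000, 9000],
    "whistle": [0, 8000, 0],
}

INT16_MIN, INT16_MAX = -32768, 32767


def combine(ids):
    """Mix several sound data items sample by sample, clipping to 16 bits."""
    length = max(len(SOUND_DATA[i]) for i in ids)
    mixed = [0] * length
    for i in ids:
        for k, sample in enumerate(SOUND_DATA[i]):
            mixed[k] += sample
    return [max(INT16_MIN, min(INT16_MAX, s)) for s in mixed]


def build_response(excitement):
    """Select one item for a mild response; combine items for stronger ones."""
    if excitement == 0:
        return list(SOUND_DATA["cheer_small"])          # selection
    if excitement == 1:
        return combine(["cheer_large", "applause"])     # combination
    return combine(["cheer_large", "applause", "whistle"])
```

Clipping keeps the mix in range when loud items overlap. A real system would more likely mix at the audio engine level.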

The input processing unit 102 accepts the user voice input that triggers a response voice or response sound only outside the evaluation period, which is the period during which the user voice is evaluated. When the user voice is a singing voice, this evaluation assesses singing ability. The evaluation period is, for example, the period during which the user sings. The periods outside it are, for example, the prelude and the interludes. A response voice or response sound should not be output in reaction to the user's singing while the user is still singing. Therefore, no response is output during the singing period. Responses are output only in other periods, such as the prelude or an interlude.
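
The time gating above amounts to an interval check. This sketch assumes that the singing (evaluation) periods of a song are known in advance as half-open time intervals in seconds. The interval values and function names are hypothetical.

```python
# Hypothetical singing (evaluation) periods of one song, in seconds: [start, end).
EVALUATION_PERIODS = [(12.0, 48.0), (60.0, 95.0), (110.0, 150.0)]


def in_evaluation_period(t, periods=EVALUATION_PERIODS):
    """True while the user's singing is being evaluated."""
    return any(start <= t < end for start, end in periods)


def accepts_call_input(t, periods=EVALUATION_PERIODS):
    """Calls that trigger audience responses are accepted only outside
    evaluation periods, e.g. during the prelude or an interlude."""
    return not in_evaluation_period(t, periods)
```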

2. Method of This Embodiment
Next, the method of the present embodiment is described in detail. The following description mainly uses the example of a music game in which the user sings songs, such as a live-stage game or a karaoke game. However, the method is not limited to this kind of game. It may also be applied, for example, to a music game in which the user plays an instrument and competes on rhythm or performance skill. Such instruments include stringed instruments (guitar, etc.), percussion instruments (drums, taiko, etc.) and keyboard instruments (keyboard, piano, etc.). The method is also applicable to many other kinds of game:
- communication games with characters of the opposite sex (relationship simulation games)
- conversation games built around talk battles (courtroom battle games, manzai comedy-duo games)
- battle games, RPGs, robot games, card games, sports games and action games

It can also be applied to the playback of video and music content.

2.1 HMD and Play Area
FIG. 2(A) shows an example of the HMD 200 used in the simulation system of the present embodiment. As shown in FIG. 2(A), the HMD 200 is provided with a plurality of light receiving elements 201, 202 and 203 (photodiodes). The light receiving elements 201 and 202 are on the front of the HMD 200, and the light receiving element 203 is on its right side. Further light receiving elements (not shown) are provided on the left side, on the top and elsewhere on the HMD.

The user PL holds input devices 160-1 and 160-2 in the left and right hands. Like the HMD 200, the input devices 160-1 and 160-2 are provided with a plurality of light receiving elements (not shown). The input device 160-1 is also provided with a microphone 162 (a sound input device in a broad sense), and in the song performance game the user PL sings into the microphone 162. The input devices 160-1 and 160-2 also function as game controllers and are provided with operation buttons, direction keys and the like (not shown). The user may also hold only one input device 160.

The HMD 200 is also provided with a headband 260 and other fittings, so that the user PL can wear the HMD 200 securely and comfortably on the head. The user PL inputs operation information and plays the game by operating the input devices 160-1 and 160-2, which function as game controllers, or by nodding or shaking the head. Nodding and head-shaking motions can be detected by the sensor unit 210 of the HMD 200 or a similar sensor.

As shown in FIG. 2(B), base stations 280 and 284 are installed in the play area of the user PL. The base station 280 is provided with light emitting elements 281 and 282, and the base station 284 with light emitting elements 285 and 286. The light emitting elements 281, 282, 285 and 286 are realized by, for example, LEDs that emit laser light (infrared laser, etc.). Using these elements, the base stations 280 and 284 emit laser light, for example radially. The light receiving elements 201 to 203 and the other light receiving elements on the HMD 200 of FIG. 2(A) receive this laser light. This enables tracking of the HMD 200, so that the position and facing direction of the head of the user PL can be detected (in a broad sense, the position and direction of the user). Likewise, the light receiving elements (not shown) on the input devices 160-1 and 160-2 receive the laser light from the base stations 280 and 284. This enables tracking of the input devices and detection of their positions and directions. As a result, an image of a microphone corresponding to the input device 160-1 can, for example, be displayed in the game image.

FIG. 3(A) shows another example of the HMD 200. In FIG. 3(A), the HMD 200 is provided with a plurality of light emitting elements 231 to 236, realized by, for example, LEDs. The light emitting elements 231 to 234 are on the front of the HMD 200, and the light emitting element 235 and the light emitting element 236 (not shown) are on the back. The light emitting elements 231 to 236 emit light in, for example, the visible band, and each emits light of a different color. The imaging unit 150 shown in FIG. 3(B) is installed in front of the user PL and captures the light from these elements, so the spots of light from the light emitting elements 231 to 236 appear in the captured image. Image processing of this captured image tracks the head (HMD) of the user PL. That is, it detects the three-dimensional position and facing direction of the head of the user PL (the position and direction of the user).

For example, as shown in FIG. 3(B), the imaging unit 150 is provided with first and second cameras 151 and 152. The first and second images captured by these cameras make it possible to detect, among other things, the position of the head of the user PL in the depth direction. The rotation angle (line of sight) of the head of the user PL can also be detected from the motion detection information of a motion sensor provided in the HMD 200. With such an HMD 200, the user PL can face any direction in the full 360 degrees. In each case, the display unit 220 of the HMD 200 can show the corresponding image of the virtual space (virtual three-dimensional space), that is, the image seen from a virtual camera that corresponds to the user's viewpoint. Infrared LEDs may be used instead of visible-light LEDs as the light emitting elements 231 to 236. The position and movement of the user's head may also be detected by other methods, for example with a depth camera.
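
The document does not explain how the two camera images yield depth. One standard technique, assumed here and not confirmed by the source, is triangulation with a rectified stereo pair under the pinhole model. In that method, depth is Z = f * B / d. Here f is the focal length in pixels, B is the baseline between the cameras and d is the horizontal disparity of the same LED spot in the two images. The function name and the parameter values are illustrative only.

```python
def depth_from_disparity(x_left_px, x_right_px, focal_length_px, baseline_m):
    """Depth Z = f * B / d of a point seen by a rectified stereo pair."""
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("point must have positive disparity (in front of both cameras)")
    return focal_length_px * baseline_m / disparity
```

For example, with an 800-pixel focal length and a 0.1 m baseline, an LED spot at x = 420 in the left image and x = 380 in the right image lies about 2.0 m from the cameras.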

The tracking method for detecting the user's viewpoint position and line-of-sight direction (the position and direction of the user) is not limited to the methods described with reference to FIGS. 2(A) to 3(B). For example, the HMD 200 alone may perform tracking using its own motion sensor or a similar sensor. In that case, no external device such as the base stations 280 and 284 of FIG. 2(B) or the imaging unit 150 of FIG. 3(B) is needed. Alternatively, viewpoint information such as the user's viewpoint position and line-of-sight direction may be detected by any of various known viewpoint tracking methods, such as eye tracking, face tracking or head tracking.

FIGS. 4 and 5 show an example of a play area in which the game of the present embodiment is realized. The play area is a box-shaped, soundproof private room. As shown in FIGS. 4 and 5, the room has walls 301, 302, 303 and 304, a ceiling 305 and a door 306. Soundproofing materials 311, 312, 313, 314 and 315 are provided on the inner surfaces of the walls 301 to 304 and the ceiling 305, and these materials also serve as cushioning. The base stations 280 and 284 described above and the lighting fixtures 290 and 292 are installed on the ceiling 305.

Front speakers 330 and 331 and a center speaker 332 are installed in front of the user PL, and rear speakers 333 and 334 and a woofer 335 are installed behind. Together these speakers provide surround sound. A winding device 50 is housed in the storage box that also houses the woofer 335. The winding device 50 has a rotating reel 62. The cable 20 passes through a cable passage port 52 in the storage box (shelf) and is wound up by the rotating reel 62.

The user PL opens the door 306 of FIG. 5, enters the room and plays the game. The space inside the room is the play area (play space, real space) of the user PL. As shown in FIG. 5, an area AR is set in the play area of the room as the assumed movement range (movable range) of the user PL. Within the area AR, the position and direction (viewpoint position and line-of-sight direction) of the user PL can be tracked using the base stations 280 and 284 and other devices. Beyond the boundary BD of the area AR, however, reliable tracking cannot be achieved. Moreover, a user PL who crosses the boundary BD may collide with the walls 301, 302, 303 and 304, which is undesirable for safety. The extent of the area AR can be set, for example, through the initialization settings of the game apparatus.

As shown in FIGS. 4 and 5, the user PL wears a waist belt 30. A storage portion 32 is attached to the waist belt 30, and a relay point RP of the cable 20 is located inside the storage portion 32. The cable 20 runs from the HMD 200 through the relay point RP and is wound up by the winding device 50. The relay point RP is the point between cable portion 21 and cable portion 22. The cable 20 is also provided with a stopper 26 that keeps the cable 20 slack while the user PL stands at the reference position.

2.2 Outline of the Game
Next, an outline of the game realized by the method of the present embodiment is described. The game is a music game in which the user becomes the vocalist of a band and sings with the sense of presence of a real live stage. Singing before a huge audience, the user feels an unprecedented exhilaration and the intense thrill of being showered with cheers from their fans. The HMD and the high-output surround speakers make the user feel as if they were performing on a real live stage, surrounded by their own fans.

The audience around the stage reacts to the user's vocal performance and stage actions, interactively returning loud cheers and various actions. The HMD renders a live stage so realistic that the user feels as if they were standing on it. There, the user sings and performs live alongside the band members in front of a packed house, including front-row fans whose facial expressions are visible, and works to meet the audience's expectations.

At a reception space in a shared area, the user reserves an entry time and configures the play settings. The user then enjoys the live-performance experience in a safe private room lined with cushioning (soundproofing) material, as shown in FIGS. 4 and 5.

In the play settings before going on stage, the user selects the concert performance mode, then the song to sing, then the stage to appear on. The user then puts on the devices described with reference to FIGS. 2(A) and 2(B), such as the HMD 200 and the input devices 160-1 and 160-2, as well as the waist belt 30. A store operator explains the precautions and helps the user put on and adjust the devices. The operator calibrates (initializes) the private room that serves as the play area in advance.

The input device 160-1 of FIG. 2(A) serves as a combined microphone and game controller. In the VR (virtual reality) space, the arms and hands of the user (virtual user) are not drawn. Instead, the system senses the position of the input device 160-1 held in the user's hand and draws a microphone image at the same position, so the microphone image moves with the user's movements.

The user adjusts the vocal key in the standby room of the VR space, a waiting space below the stage. The floor on which the user stands in the VR space is a large lift that rises onto the stage when the show begins.

As the lift rises and the stage draws nearer, the cheers and calls of the hall grow gradually louder, more powerful and more vivid, where at first they were heard only from a distance. When the user appears on stage, a spotlight is directed at the user from the front, backlighting the user. At the user's entrance, the cheering rises to a roaring climax.

During the live performance itself, the user sings on stage to their heart's content. FIGS. 6 and 7 are examples of game images (images in the VR space) displayed on the HMD 200 of the user on stage. As shown in FIGS. 6 and 7, a packed audience appears before the user's eyes. FIG. 6 is the game image when the user faces forward, and FIG. 7 is the game image when the user faces right.

As shown in FIGS. 6 and 7, in the game apparatus of the present embodiment, which uses the HMD 200, the world of the VR space (a virtual space) extends in every direction around the user. For example, when the user wearing the HMD 200 faces forward, the game image of FIG. 6 is displayed on the HMD 200, and when the user faces right, the game image of FIG. 7 is displayed. When the user faces backward, an image of the backing band and its surroundings is displayed. The user therefore experiences the virtual reality of singing in a huge concert hall before a large, cheering audience, which greatly increases immersion in the game.

In the present embodiment, the movements (actions) and cheers of the audience change with the dynamics of the song, and they also change in response to the user's actions. For example, in FIG. 6, the audience members AD1 to AD7 near the stage where the user stands are represented by polygon-model objects, each composed of many polygons. The same applies to the audience members AD8 to AD11 in FIG. 7. Audience members far from the stage, on the other hand, are represented by images drawn on billboard polygons that squarely face the user's line of sight.
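
The near/far split above is a form of distance-based level of detail. This minimal sketch assumes positions on the floor plane as (x, z) pairs and chooses the representation with a single distance threshold. It also computes the yaw that turns a billboard toward the viewer. The 15 m threshold, the coordinate convention and the function names are hypothetical.

```python
import math

NEAR_DISTANCE_M = 15.0  # hypothetical cutoff between polygon models and billboards


def representation_for(audience_pos, stage_center=(0.0, 0.0)):
    """Choose how to draw an audience member from its (x, z) floor position."""
    distance = math.dist(audience_pos, stage_center)
    return "polygon_model" if distance <= NEAR_DISTANCE_M else "billboard"


def billboard_yaw(billboard_pos, viewer_pos):
    """Yaw in radians about the vertical axis that turns a billboard's
    front (+z at yaw 0) toward the viewer."""
    dx = viewer_pos[0] - billboard_pos[0]
    dz = viewer_pos[1] - billboard_pos[1]
    return math.atan2(dx, dz)
```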

The movements of the polygon-model audience members (AD1 to AD11) are expressed by, for example, playing back motion data. These audience members perform basic actions (basic motions) to the rhythm of the song. They also react interactively to the user's voice and movements. The user may, for example, sing toward a particular direction or wave a hand. The excitement of the audience in that direction then rises. Their basic action becomes one level more energetic, or they break into a sudden burst of excitement unrelated to the rhythm of the song.

FIG. 8 is a top-down view of the stage SG on which the user PL (virtual user) stands in the VR space. The stage SG is delimited by a boundary BDS. Audience members who act as shown in FIGS. 6 and 7 are placed in the audience seats SE1, SE2 and SE3 around the stage SG. The boundary BDS of the stage SG corresponds to, for example, the boundary BD of the area AR of the real-world play area shown in FIG. 5. In one example, the boundary BDS in the VR space is set a predetermined distance inside the boundary BD of the real-world area AR of FIG. 5.
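
Placing the stage boundary BDS a predetermined distance inside the real-world boundary BD can be sketched with an inset rectangle. This sketch assumes that the area AR is an axis-aligned rectangle in metres and that real and VR coordinates map 1:1. The class name, the room size and the 0.3 m margin are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle on the floor plane (x, z), in metres."""
    x_min: float
    x_max: float
    z_min: float
    z_max: float

    def inset(self, margin):
        """Rectangle shrunk by `margin` on every side (e.g. BD -> BDS)."""
        return Rect(self.x_min + margin, self.x_max - margin,
                    self.z_min + margin, self.z_max - margin)

    def contains(self, x, z):
        return self.x_min <= x <= self.x_max and self.z_min <= z <= self.z_max


area_ar = Rect(-1.5, 1.5, -1.5, 1.5)   # hypothetical boundary BD of area AR
stage_bds = area_ar.inset(0.3)          # stage boundary BDS, 0.3 m inside BD
```

A user at x = 1.3 m is still inside the area AR but has already left the stage. The game can therefore react, for example with a warning, before the user reaches the walls.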

In the present embodiment, the user can perform to excite the audience. An audience member who makes an appealing action toward the user becomes a target of this performance. For example, in FIG. 6, audience member AD4 raises the right hand high to appeal to the user and becomes a target. In FIG. 7, audience member AD10, who is also making an appealing action, becomes a target.

When the user performs an action toward these appealing audience members, the value of their enthusiasm parameter (enthusiasm gauge) increases. When the enthusiasm parameter reaches its maximum value, these audience members perform a frenzied action expressing great delight.
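
The enthusiasm gauge can be sketched as a clamped counter that triggers the frenzied action once it is full. The maximum of 100, the per-action increments and the class and field names are hypothetical. The document only states that the parameter rises with the user's actions and that its maximum triggers the action.

```python
from dataclasses import dataclass

MAX_ENTHUSIASM = 100  # hypothetical gauge maximum


@dataclass
class TargetAudience:
    name: str
    enthusiasm: int = 0
    frenzied: bool = False

    def on_user_action(self, amount: int) -> None:
        """Raise the gauge; at the maximum, switch to the frenzied action."""
        if self.frenzied:
            return  # already cleared; further actions have no effect
        self.enthusiasm = min(MAX_ENTHUSIASM, self.enthusiasm + amount)
        if self.enthusiasm == MAX_ENTHUSIASM:
            self.frenzied = True
```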

The first action the user can perform toward an appealing audience member is to sing while looking toward that target. A song consists of singing parts and interlude parts, and this first action is performed during the singing parts.

The second action the user can perform toward an appealing audience member is to vocalize toward the audience, for example with a call. The audience answers the user's call with cheers, shouts, applause and similar responses. The user in turn performs actions such as raising an arm in time with the cheers, shouts and applause.

The audience members who become targets of these first and second actions appear, for example, at random locations in the concert hall. The number of targets per song is fixed at a predetermined number. The user achieves a full clear when every one of these targets has been excited successfully, that is, when the enthusiasm parameter of each has reached its maximum value. For example, with 10 targets, a full clear is achieved when the enthusiasm parameters of all 10 reach the maximum. The type of the final stage effect depends on the number of cleared targets. For example, clearing all 10 targets produces the most spectacular effect at the end of the stage, and clearing 8 targets produces a flashier effect than clearing 5.
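
The mapping from cleared targets to the final stage effect can be sketched as a tier lookup. The document fixes only the ordering: a full clear of 10 gives the most spectacular effect, and 8 cleared is flashier than 5. The tier boundaries and effect names below are hypothetical.

```python
def final_stage_effect(cleared, total=10):
    """Hypothetical tiers: more cleared targets give a flashier final effect."""
    if cleared >= total:
        return "grand_finale"   # full clear
    if cleared >= 8:
        return "large"
    if cleared >= 5:
        return "medium"
    return "basic"
```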

The present embodiment also evaluates the user's singing ability by detecting and scoring the pitch and rhythm of the user's singing. For rhythm, the system checks whether the user vocalizes in time with the music. For example, points are added when the user vocalizes at timings specified by the bass or drum part. The quality of long tones and the accuracy of rests are also evaluated. For pitch, the system determines the pitch of the user's singing and displays graphically the notes that were sung at the correct pitch. If the user's volume stays at zero for a certain period of time, most audience members switch to a standby motion, and the user earns no points during that time.
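
The document does not specify the scoring rules. The following is a minimal sketch of three of the checks it mentions, using common techniques in place of the undisclosed method. Pitch error is measured in cents (1200 * log2 of the frequency ratio), rhythm hits use a timing window around each bass or drum cue, and a silence monitor suspends scoring after a timeout. The 50-cent tolerance, the 0.1 s window, the 5 s timeout and all names are hypothetical.

```python
import math


def cents_off(sung_hz, target_hz):
    """Signed pitch error in cents (100 cents = one equal-tempered semitone)."""
    return 1200.0 * math.log2(sung_hz / target_hz)


def pitch_correct(sung_hz, target_hz, tolerance_cents=50.0):
    return abs(cents_off(sung_hz, target_hz)) <= tolerance_cents


def timing_hit(onset_s, cue_s, window_s=0.1):
    """True if a vocal onset falls within window_s of a bass/drum cue."""
    return abs(onset_s - cue_s) <= window_s


class SilenceMonitor:
    """Tracks how long the input level has stayed below a silence threshold."""

    def __init__(self, timeout_s=5.0, threshold=0.01):
        self.timeout_s = timeout_s
        self.threshold = threshold
        self.silent_for = 0.0

    def update(self, level, dt):
        """Feed one frame; returns True while scoring should be suspended
        (audience in standby motion)."""
        if level < self.threshold:
            self.silent_for += dt
        else:
            self.silent_for = 0.0
        return self.silent_for >= self.timeout_s
```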

When the user's performance ends and the evaluation of the whole live stage meets a certain standard, the audience calls for an encore. By answering the encore, the user can perform one additional song. At the end of the live stage, the cheers fade out, the lights dim further and the stage blacks out. A message such as "Thank you for your hard work. Please remove the HMD." is then displayed, and a guide voice plays. The user removes the equipment and leaves the room, and the game play ends.

2.3 Response to User Voice Input
The present embodiment responds to the user's voice input by outputting a response voice or a response sound. For example, the user may make a call by speaking into the microphone 162 toward the audience of FIGS. 6 and 7 (in a broad sense, a target). The audience then answers with a response such as cheers, shouts, applause, stomping or whistling. These response voices and response sounds are output from the speakers 330 to 334 of FIGS. 4 and 5. The following description takes as its example the case where the sound input device 161 is a handheld microphone 162, as shown in FIG. 2(A).

For example, in FIG. 9, the user PL directs the line of sight VLP toward the audience members ADA, ADB and ADC and calls out "Yeah!". In other words, the user voice "Yeah!" is input to the microphone 162. A given time after this input, the audience members ADA, ADB and ADC respond with cheers, applause, whistling and similar sounds, as shown in FIG. 9. This call-and-response (C&R) method lets the user perceive the audience in the virtual space as if it were a real human audience. As a result, the user's sense of virtual reality is greatly enhanced.

Here, a problem arises if the audience's response to the user's call is monotonous and lacks variety. The user then feels as if calling out to robots, and the sense of virtual reality is lost.

Conversely, the system could run speech recognition on the user voice and generate the audience's response voices and response sounds with complex algorithms. However, the processing load would then become excessive and would degrade other processing performed by the simulation system, such as image generation.

The present embodiment therefore measures the volume and length of the user voice input through the microphone 162 (a sound input device in a broad sense). It then outputs a response voice or response sound based on the results of this measurement.

Taking FIG. 9 as an example, when the user PL inputs a call ("Yeah!") through the microphone 162, the volume and length of the call are measured. The responses of the audience members ADA to ADC, such as cheers and applause, are then varied according to the volume of the call and also according to its length. For example, the type, volume (intensity), pitch or timbre of the response voice or response sound is varied. Concretely, two things depend on the volume and length of the call: which sound data items are selected from the sound data storage unit 175 of FIG. 1, and which combination of items is used. A different volume, or a different length, leads to different sound data or a different combination of sound data.
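
The document does not say how volume and length are measured. One simple possibility, assumed here, is to split the call into fixed-size frames and compute the RMS level of each. The volume is then taken as the peak frame RMS, and the length as the total duration of frames whose RMS exceeds a voicing threshold. The frame size, the threshold and the function names are hypothetical. The input is assumed to be mono floating-point samples in [-1, 1].

```python
import math


def frame_rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def measure_call(samples, sample_rate, frame_size=512, threshold=0.02):
    """Return (volume, length_s) of a call: the peak frame RMS and the total
    duration of frames whose RMS reaches `threshold`."""
    peak = 0.0
    voiced_frames = 0
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        rms = frame_rms(samples[start:start + frame_size])
        if rms >= threshold:
            voiced_frames += 1
            peak = max(peak, rms)
    return peak, voiced_frames * frame_size / sample_rate
```

Only two numbers per call are computed, which keeps the cost far below that of the speech recognition approach the document rejects.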

FIG. 10 is a diagram explaining a method of varying the response voice and response sound according to the volume and length of the user voice. Each of the response voices and response sounds RS11 to RS35 in FIG. 10 is realized by a piece of sound data selected from the plurality of sound data in the sound data storage unit 175, or by a combination of such sound data.

For example, when the volume of the user voice input to the microphone 162 is at level LV1 (that is, either exceeds LV1 or lies in the range 0 to LV1), one of the response voices or response sounds RS11, RS12, RS13, RS14, and RS15 (a piece of sound data or a combination of sound data) is selected and output according to the length (duration) of the user voice. For example, when the length of the user voice is T1 (exceeds T1, or lies in the range 0 to T1), the response voice or response sound RS11 is output. When the length is T2 (exceeds T2, or lies in the range T1 to T2), RS12 is output. Similarly, when the length is T3, T4, or T5, RS13, RS14, or RS15 is output, respectively.

When the volume of the user voice input to the microphone 162 is at level LV2 (exceeds LV2, or lies in the range LV1 to LV2), one of the response voices or response sounds RS21, RS22, RS23, RS24, and RS25 is selected and output according to the length of the user voice. For example, when the length of the user voice is T1, T2, T3, T4, or T5, RS21, RS22, RS23, RS24, or RS25 is output, respectively.

When the volume of the user voice input to the microphone 162 is at level LV3 (exceeds LV3, or lies in the range LV2 to LV3), one of the response voices or response sounds RS31, RS32, RS33, RS34, and RS35 is selected and output according to the length of the user voice. For example, when the length of the user voice is T1, T2, T3, T4, or T5, RS31, RS32, RS33, RS34, or RS35 is output, respectively.
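The 3 x 5 selection of FIG. 10 amounts to a two-key table lookup. The sketch below is a minimal illustration, assuming the "band" reading of the text (volume between LV(i-1) and LVi selects row i, length between T(j-1) and Tj selects column j); the threshold values and the name `select_response` are hypothetical, since the text gives no numbers.

```python
from bisect import bisect_left

# Hypothetical thresholds: the text names LV1-LV3 and T1-T5 but gives no values.
VOLUME_LEVELS = [0.2, 0.5, 0.8]            # LV1, LV2, LV3 (normalized amplitude)
LENGTH_LEVELS = [0.2, 0.5, 1.0, 2.0, 3.0]  # T1..T5 in seconds


def select_response(volume: float, length: float) -> str:
    """Return the ID "RSij" for a measured volume and length.

    Banded reading of FIG. 10: volume in (LV(i-1), LVi] selects row i and
    length in (T(j-1), Tj] selects column j. Values above the top
    threshold are clamped to the last row or column.
    """
    row = min(bisect_left(VOLUME_LEVELS, volume), len(VOLUME_LEVELS) - 1) + 1
    col = min(bisect_left(LENGTH_LEVELS, length), len(LENGTH_LEVELS) - 1) + 1
    return f"RS{row}{col}"


# Each ID would then map to a sound-data file or a combination of files.
print(select_response(0.3, 0.6))  # RS23
```

Using `bisect_left` keeps the boundaries inclusive on the upper side (a volume exactly equal to LV1 still selects row 1), which matches the "0 to LV1" wording.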

With this method, in FIG. 9 for example, when the volume and length of the call input by the user PL differ, the cheers, shouts, applause, whistles, and so on of the audience ADA to ADC sound different accordingly.

For example, when the call is quiet, the cheers and shouts become lower-pitched or quieter, and less applause and whistling is mixed in. Conversely, when the call is loud, the cheers and shouts become higher-pitched (containing more high-frequency components) or louder, and more applause and whistling is mixed in.

When the call is short, the cheers and shouts are also short. Applause and whistling are either not mixed in at all or heard only briefly toward the end. Conversely, when the call is long, the cheers and shouts are also long, and their pitch or volume may also be raised. A long call also causes much applause and whistling to be mixed in, along with foot-stomping and calls for an encore.

Thus, in this embodiment, the response voice and response sound change according to the volume and length of the user voice. This gives the user the sensation that a real human audience is cheering and applauding, improving the user's sense of virtual reality. Because the audience's cheering and applause are varied rather than following a single fixed pattern, the user does not feel as though they are dealing with a robot, and the user's immersion in the virtual reality can be deepened.

FIGS. 11(A) and 11(B) show a specific example of the processing that varies the response voice and response sound according to the volume and length of the user voice.

In FIG. 11(A), A1 is the waveform of the user voice. The length TL1 from the time the volume of the user voice rises above volume level LV1 (the i-th of the first to N-th volume levels) until it falls below LV1 is measured. The response voice and response sound are then varied according to the measured length TL1, for example so that they differ depending on whether TL1 is long or short.

Similarly, the length TL2 from the time the volume rises above volume level LV2 (the j-th of the first to N-th volume levels) until it falls below LV2 is measured, and the response voice and response sound are varied according to TL2, for example so that they differ depending on whether TL2 is long or short. In this case, the response voice and response sound also differ depending on whether the volume of the user voice exceeded volume level LV1 (the i-th level) or volume level LV2 (the j-th level).

Likewise, the length TL3 from the time the volume rises above volume level LV3 until it falls below LV3 is measured, and the response voice and response sound are varied according to TL3, for example depending on whether TL3 is long or short. In this case, the response voice and response sound differ depending on whether the volume exceeded volume levels LV1 and LV2 or exceeded volume level LV3.
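The measurement of TL1 to TL3 can be sketched as a threshold-crossing scan. This is a minimal sketch, assuming the user voice has already been reduced to a volume envelope sampled every `dt` seconds; the function name `measure_lengths` and that envelope representation are hypothetical. Each length runs from the first sample above a level to the next sample at or below it, which covers the single-burst waveforms drawn in FIG. 11.

```python
from typing import List, Sequence


def measure_lengths(envelope: Sequence[float], levels: Sequence[float],
                    dt: float) -> List[float]:
    """Return [TL1, TL2, ...]: for each level, the time from the first sample
    above it until the next sample at or below it (or the end of the input)."""
    lengths = []
    for level in levels:
        start = None
        length = 0.0
        for k, v in enumerate(envelope):
            if start is None and v > level:
                start = k                      # volume rose above this level
            elif start is not None and v <= level:
                length = (k - start) * dt      # volume fell back below it
                break
        else:
            if start is not None:              # still above the level at the end
                length = (len(envelope) - start) * dt
        lengths.append(length)
    return lengths


env = [0.0, 0.3, 0.6, 0.9, 0.9, 0.6, 0.3, 0.0]
print(measure_lengths(env, [0.2, 0.5, 0.8], 0.1))  # roughly [0.6, 0.4, 0.2]
```

A voice that swells toward its end, as in FIG. 11(B), crosses the top level late and briefly, so it yields a small TL3.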

In FIG. 11(B), the signal waveform of the user voice differs from that of FIG. 11(A). For example, the lengths TL1, TL2, and TL3 at volume levels LV1, LV2, and LV3 differ between FIG. 11(A) and FIG. 11(B); in particular, the length TL3 at volume level LV3 in FIG. 11(B) is considerably shorter than in FIG. 11(A). Accordingly, the output response voice and response sound also differ. When a user voice that swells toward its end is input, as in FIG. 11(B), it takes longer for the volume to reach level LV3, so the corresponding response voice and response sound are delayed as well.

For example, in FIGS. 11(A) and 11(B), when the volume of the user voice is at level LV1, the cheers and shouts are low-pitched and relatively quiet. As length TL1 increases, applause and the like begin to be mixed in, for example in the latter half.

When the volume of the user voice is at level LV2, the cheers and shouts are higher-pitched (containing more high-frequency components) and louder than at level LV1. As length TL2 increases, whistling, for example, is mixed in along with applause in the latter half.

When the volume of the user voice is at level LV3, the cheers and shouts are higher-pitched still (with even more high-frequency content) and louder than at levels LV1 and LV2. As length TL3 increases, foot-stomping and calls for an encore are mixed in, for example in the latter half, in addition to applause and whistling.

For example, when the user voice is loud, the user, as the vocalist, is calling out loudly to excite the audience, and the audience's cheers answer that enthusiasm by becoming higher-pitched (richer in high-frequency components) and louder. When the user voice is long, the vocalist is presumably trying to convey some message, and the audience responds by applauding, whistling, or stomping. This creates a vocal give-and-take between the user and the audience, making it possible to give the user the virtual-reality sensation of performing in front of a real human audience.

When the volume exceeds level LV2, both the response voices and response sounds for LV1 and those for LV2 may be output. Similarly, when the volume exceeds level LV3, all of the response voices and response sounds for levels LV1, LV2, and LV3 may be output.
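The cumulative variant above, where exceeding a higher level also keeps the lower levels' responses, can be sketched as a filter over the thresholds. The threshold values and the name `layered_tiers` are hypothetical; the caller is assumed to play the sound data for every tier returned.

```python
from typing import List

LEVELS = (0.2, 0.5, 0.8)  # hypothetical values for LV1, LV2, LV3


def layered_tiers(volume: float, levels=LEVELS) -> List[str]:
    """Names of every volume tier whose response should play together.

    A tier LVn is included once the volume exceeds LVn, so exceeding LV3
    plays the LV1, LV2 and LV3 responses simultaneously.
    """
    return [f"LV{n}" for n, level in enumerate(levels, start=1) if volume > level]


print(layered_tiers(0.6))  # ['LV1', 'LV2']
```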

FIG. 12 is a diagram explaining the output timing of the response voice and response sound. In FIG. 12, the volume of the user voice exceeds volume level LV1. In this case, after a given time TD1 has elapsed, the response voice or response sound RS = RS(LV1, TL1) corresponding to volume level LV1 and length TL1 is output. The time TD1 is, for example, about one second.

If the audience returned cheers or other response voices and response sounds immediately after the user's call, it would sound as if robots were cheering. Therefore, as shown in FIG. 12, the audience responds only after the delay time TD1 has elapsed from the user's call. Leaving this pause before the response gives the user the sensation that a real human audience is responding, improving the sense of virtual reality.
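The delayed output of FIG. 12 can be sketched as a small time-ordered queue. This is a minimal sketch, assuming the game loop calls `due()` every frame with the current time and calls `trigger()` at the moment the voice crosses LV1; the class and method names are hypothetical, and how the response ID RS(LV1, TL1) is chosen is left to the caller.

```python
import heapq
from typing import List


class ResponseScheduler:
    """Hypothetical helper that releases each response `delay` seconds after its trigger."""

    def __init__(self, delay: float = 1.0):  # TD1: "about one second" in the text
        self.delay = delay
        self._queue = []  # heap of (due_time, sequence_no, response_id)
        self._seq = 0

    def trigger(self, now: float, response_id: str) -> None:
        """Register a response at the moment the user voice crossed the level."""
        heapq.heappush(self._queue, (now + self.delay, self._seq, response_id))
        self._seq += 1

    def due(self, now: float) -> List[str]:
        """Pop every response whose due time has been reached, oldest first."""
        ready = []
        while self._queue and self._queue[0][0] <= now:
            ready.append(heapq.heappop(self._queue)[2])
        return ready
```

The sequence number breaks ties between responses due at the same instant, so they come out in trigger order and response IDs never need to be compared.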

Various modifications are possible for the processing that varies the response voice and response sound according to the volume and length of the user voice.

For example, when the volume of the user voice is at level LV1 in FIG. 11(A), a sound data file corresponding to LV1 is selected, and the playback time of that sound data may be varied according to length TL1 in FIG. 11(A). For example, the sound played back from the sound data is faded out once a time corresponding to TL1 has elapsed: when TL1 is short, the fade-out occurs earlier, and when TL1 is long, it occurs later. Similarly, when the volume is at level LV2 or LV3, a sound data file corresponding to LV2 or LV3 is selected and its playback time is varied according to TL2 or TL3; that is, the sound fades out after a time corresponding to TL2 or TL3.
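The length-dependent fade-out can be sketched as a gain curve applied to the selected sound data. The text only says the fade comes earlier for a short TL and later for a long one, so the proportional rule (fade starts at `scale * tl`), the linear ramp, and both parameter values below are assumptions.

```python
def playback_gain(t: float, tl: float, scale: float = 2.0, fade_s: float = 0.5) -> float:
    """Gain (0..1) of the selected sound data t seconds after playback starts.

    The fade-out begins at scale * tl, so a longer user voice (larger tl)
    delays the fade; the fade then ramps linearly to silence over fade_s.
    Both scale and fade_s are hypothetical tuning values.
    """
    fade_start = scale * tl
    if t <= fade_start:
        return 1.0
    return max(0.0, 1.0 - (t - fade_start) / fade_s)


print(playback_gain(1.25, 0.5))  # 0.5, halfway through the fade
```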

Alternatively, what the user is saying into the microphone 162 may be inferred from the length of the user voice (TL1, TL2, TL3), and the sound data file for the response voice or response sound may be swapped accordingly. For example, a plurality of (e.g., five) sound data files corresponding to different lengths of user voice are prepared, and the file used is swapped according to the length of the user voice.

For example, when the user voice is extremely short (e.g., about 0.1 seconds), it is inferred that the user is making a call without much meaning (a simple shout), and the audience responds accordingly, for example with just a short cheer. Conversely, when the user voice is long (e.g., 1 to 2 seconds or more), it is inferred that the user is saying something meaningful to the audience, so laughter or voices agreeing with what the user said are mixed into the audience response. For example, when the user speaks in a quiet tone for a long time (the user voice is quiet and long), it is inferred that the user is making an announcement such as "Our next album is coming out," and the audience responds with delight. This makes the audience's responses feel more human.
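The file-swapping rule above can be sketched as a classifier over length and volume. The 0.1 s and 1 s figures come from the examples in the text; the quiet threshold, the middle "standard" band, and the category names (which would each map to a prepared sound data file) are hypothetical. The text mentions, for example, five files; this sketch uses four categories for brevity.

```python
def classify_utterance(length: float, volume: float,
                       short_s: float = 0.1, long_s: float = 1.0,
                       quiet: float = 0.3) -> str:
    """Map a measured utterance to a hypothetical response-file category."""
    if length <= short_s:
        return "short_cheer"          # shout without much meaning -> brief cheer
    if length >= long_s:
        if volume < quiet:
            return "delighted"        # long, quiet talk (e.g. an announcement)
        return "laughter_agreement"   # long talk -> laughter / agreeing voices
    return "standard_cheer"           # everything in between
```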

In this embodiment, the response voice and response sound to the user voice input are output as the voice and sound of the user's target in the game, specifically as the voice and sound of the audience characters appearing in the game. In this case, the response voice and response sound are varied according to the type of audience member (target), the positional relationship between the audience and the user, or the user's gaze direction relative to the audience.

For example, in FIG. 13(A), the user PL is calling out to a male audience member ADX. In this case, ADX responds with a response voice and response sound for a male audience member: for example, a cheer of the kind men typically use (e.g., "Whoa!") comes back in a male voice. In FIG. 13(B), on the other hand, the user PL is calling out to a female audience member ADY, who responds with a response voice and response sound for a female audience member: for example, a cheer of the kind women typically use (e.g., "Kyaa!") comes back in a female voice.

Thus, in this embodiment, the response voice and response sound change according to the type of audience member (target), for example according to whether the audience member is male or female. Various types of target can be assumed; for example, the response voice and response sound may be varied according to whether the audience member is an adult or a child.

In FIG. 14(A), the distance between the user PL (the virtual user) and the audience ADA to ADC is large, so the volume of their cheers, applause, and other response voices and response sounds is reduced. In FIG. 14(B), on the other hand, the distance between the user PL and the audience ADA to ADC is small, so the volume of these response voices and response sounds is increased.

Thus, in this embodiment, the response voice and response sound are varied according to the positional relationship between the audience (target) and the user. Because the positional relationship between the audience ADA to ADC and the user PL differs between FIG. 14(A) and FIG. 14(B), the volume of the response voice and response sound changes. The type, pitch, timbre, or the like of the response voice and response sound may also be varied according to the positional relationship.
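One way to realize the distance-dependent volume of FIGS. 14(A) and 14(B) is a clamped inverse-distance gain. The falloff law, the reference distance, and the floor value are assumptions for illustration; the text only says the response is quieter when the audience is far and louder when it is near.

```python
import math


def distance_gain(user_pos, target_pos, ref_dist=1.0, min_gain=0.1):
    """Volume multiplier for an audience member's response.

    Full volume within ref_dist, then inverse-distance falloff clamped at
    min_gain. The falloff law and both parameters are assumptions.
    """
    d = math.dist(user_pos, target_pos)
    if d <= ref_dist:
        return 1.0
    return max(min_gain, ref_dist / d)


print(distance_gain((0, 0, 0), (0, 0, 4)))  # 0.25
```

The floor keeps distant audience members faintly audible rather than silent, which suits a crowd that should never drop out entirely.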

In FIG. 15, the line of sight VLP of the user PL is directed at the audience ADA to ADC in front, not at the audience ADD to ADF on the right. If the user PL inputs a call in this state, the response voices and response sounds of the audience ADA to ADC, at whom the user PL is looking, become louder, while those of the audience ADD to ADF, at whom the user PL is not looking, become quieter.

Thus, in this embodiment, the response voice and response sound are varied according to the user's gaze direction relative to the audience (target).

Although FIG. 15 varies the volume of the response voice and response sound according to the user's gaze direction relative to the audience, the type, pitch, timbre, or the like of the response voice and response sound may be varied instead.
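The gaze test of FIG. 15 can be sketched as an angle check between the gaze vector (for example, the HMD's forward direction) and the direction to an audience member. The 30-degree cone and the 0.3 off-gaze gain are hypothetical values, and the name `gaze_gain` is illustrative only.

```python
import math


def gaze_gain(eye, gaze_dir, target, cone_deg=30.0, off_gain=0.3):
    """Return 1.0 if target lies within cone_deg of the gaze direction, else off_gain."""
    to_target = [t - e for t, e in zip(target, eye)]
    norm_g = math.hypot(*gaze_dir)
    norm_t = math.hypot(*to_target)
    if norm_g == 0.0 or norm_t == 0.0:
        return 1.0  # degenerate input: leave the volume unchanged
    cos_angle = sum(g * v for g, v in zip(gaze_dir, to_target)) / (norm_g * norm_t)
    return 1.0 if cos_angle >= math.cos(math.radians(cone_deg)) else off_gain
```

Comparing cosines avoids an `acos` call; a smooth falloff with angle would be an equally valid design choice.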

The target may also represent a location in the virtual space (object space). For example, the target may be a location (spot) such as an individual seat in the audience seating of a music game or sports game. In this case, attribute data (control data) is stored in association with each location, such as a seat. For example, first attribute data is associated with seats for enthusiastic fans (e.g., front-row seats), second attribute data with seats for lively young audience members such as teenagers (e.g., second-floor seats), and third attribute data with seats for industry insiders, from whom little reaction can be expected. When it is determined that the user's gaze is directed toward one of these seats, the response voice and response sound are varied and output based on the attribute data associated with that seat. For example, when the user's gaze is directed toward the enthusiastic fans' seats (first attribute data), wildly enthusiastic cheers, applause, and similar response voices and response sounds are output; when it is directed toward the lively audience's seats (second attribute data), upbeat cheers, applause, and similar response voices and response sounds are output.
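The seat-attribute lookup can be sketched as "pick the seat area best aligned with the gaze, then read its attribute." The seat table, attribute names, and response descriptions are hypothetical; a real system would likely also require the best match to fall inside a gaze cone before treating the user as looking at it.

```python
import math

# Hypothetical attribute-to-response table for the three seat types in the text.
RESPONSES = {
    "enthusiastic": "wild cheers and applause",
    "lively": "upbeat cheers and applause",
    "industry": "polite, minimal reaction",
}


def seat_in_view(eye, gaze_dir, seats):
    """Return the name of the seat area whose direction best matches the gaze.

    seats maps name -> (center_position, attribute). Assumes no seat center
    coincides with the eye position.
    """
    def cos_to(pos):
        v = [p - e for p, e in zip(pos, eye)]
        dot = sum(g * x for g, x in zip(gaze_dir, v))
        return dot / (math.hypot(*gaze_dir) * math.hypot(*v))

    return max(seats, key=lambda name: cos_to(seats[name][0]))


def response_for_gaze(eye, gaze_dir, seats):
    """Look up the response style attached to the seat the user is facing."""
    _, attribute = seats[seat_in_view(eye, gaze_dir, seats)]
    return RESPONSES[attribute]
```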

In FIG. 16(A), the feature quantities of the user voice input to the microphone 162 are analyzed; that is, the characteristics of the user voice are detected. In this embodiment, the response voice and response sound are varied according to the result of this feature analysis, that is, according to characteristics of the user voice such as the speaker's gender and age group. The feature analysis extracts and analyzes feature quantities relating to the loudness, pitch, timbre, or type of the user voice. It is realized, for example, by obtaining an acoustic spectrum or the like through frequency analysis.
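As a deliberately crude stand-in for the feature analysis, the sketch below estimates the fundamental frequency by autocorrelation and splits on a single pitch boundary. The text names frequency analysis (an acoustic spectrum), not autocorrelation, so this is a swapped-in technique; the 165 Hz boundary and all function names are hypothetical, and real gender or age estimation from voice is far less reliable than a one-feature threshold.

```python
import math


def estimate_f0(samples, rate, fmin=60.0, fmax=400.0):
    """Crude fundamental-frequency estimate (Hz) from the autocorrelation peak.

    Only lags corresponding to fmin..fmax Hz are searched. Returns 0.0 if no
    positive correlation is found (e.g. silence).
    """
    lag_min = max(1, int(rate / fmax))
    lag_max = min(int(rate / fmin), len(samples) - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(samples[k] * samples[k + lag] for k in range(len(samples) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return rate / best_lag if best_lag else 0.0


def guess_gender(f0, boundary_hz=165.0):
    """Very rough split on pitch; the boundary value is a hypothetical choice."""
    if f0 <= 0.0:
        return "unknown"
    return "male" if f0 < boundary_hz else "female"


def sine_tone(freq_hz, rate, n):
    """Pure sine tone, used here only to exercise the estimator."""
    return [math.sin(2 * math.pi * freq_hz * k / rate) for k in range(n)]


f0 = estimate_f0(sine_tone(120.0, 8000, 800), 8000)
print(round(f0), guess_gender(f0))  # close to 120 Hz -> male
```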

Suppose, for example, that the feature analysis of FIG. 16(A) determines that the user is male. In this case, as shown in FIG. 16(B), a cheer mixing male and female voices, with a male-cheer ratio of RA and a female-cheer ratio of RC, is output as the audience's response voice and response sound. Because the user is male, the male-cheer ratio RA is set low and the female-cheer ratio RC high, for example.

Suppose instead that the feature analysis of FIG. 16(A) determines that the user is female. In this case, as shown in FIG. 16(B), a cheer mixing male and female voices, with a male-cheer ratio of RB and a female-cheer ratio of RD, is output as the audience's response voice and response sound. Because the user is female, the male-cheer ratio RB is set high and the female-cheer ratio RD low, for example.
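The ratio-based mixing of FIG. 16(B) can be sketched as a weighted sum of a male-cheer track and a female-cheer track. The concrete ratios (0.3 and 0.7) are hypothetical; the text only says the opposite gender's share is raised. Tracks are assumed to be mono sample lists at the same sample rate.

```python
from typing import List, Sequence, Tuple


def cheer_mix_ratios(user_gender: str) -> Tuple[float, float]:
    """(male_ratio, female_ratio) for the crowd cheer; the numbers are hypothetical."""
    if user_gender == "male":
        return (0.3, 0.7)  # RA low, RC high
    return (0.7, 0.3)      # RB high, RD low


def mix_cheers(male: Sequence[float], female: Sequence[float],
               ratios: Tuple[float, float]) -> List[float]:
    """Weighted sum of two mono sample sequences, zero-padding the shorter one."""
    rm, rf = ratios
    n = max(len(male), len(female))

    def padded(s):
        return list(s) + [0.0] * (n - len(s))

    return [rm * a + rf * b for a, b in zip(padded(male), padded(female))]
```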

The user's age group may also be determined, for example, by the feature analysis of FIG. 16(A). When the user is young, cheers for a young audience are output as the audience's response voice and response sound; when the user is older, cheers for an older audience are output instead.

By analyzing the characteristics of the user's voice and outputting a response voice and response sound accordingly, characteristics of the user such as gender and age group are reflected in the audience's cheers and the like, further improving the user's sense of virtual reality. Characteristics of the user voice may also be extracted and used to process the response voice and response sound. For example, only the sequence of vowels in the user voice is extracted, and the cheers and other response voices and response sounds are processed so that they follow a similar vowel sequence; that is, vocoder-like processing is performed. This makes it possible to reflect the characteristics of the user voice more directly in the cheers and the like.

In this embodiment, the processing that varies the response voice and response sound, the processing that outputs them, or the processing that accepts the user voice input that triggers them may also be performed according to the position and direction of the microphone 162 held in the user's hand (or the position and direction of the hand), the position and direction of the HMD 200 worn by the user, the user's posture, or the user's line of sight.

In the game of this embodiment, as described with reference to FIGS. 6 and 7, the user PL performs various actions toward the audience, and the results of those actions are evaluated. In FIG. 17, for example, the user PL directs the line of sight VLP toward the audience AD and also points the microphone 162 at the audience AD, urging them to cheer and shout into the microphone 162. When the user PL performs such an action, the enthusiasm parameter (excitement parameter) of the audience AD rises, and the performance of the user PL is rated highly.

In FIG. 17, the position and direction (at least one of the position and direction) of the microphone 162 and of the HMD 200 can be detected by the tracking processing of the microphone 162 and HMD 200 described with reference to FIGS. 2(A), 2(B), and so on. For example, the position and direction of the microphone 162 can be detected by detecting the position and direction of the input device 160-1 in FIG. 2(A). Alternatively, the position and direction of the microphone 162 may be detected by detecting the position and direction of the hand of the user PL holding it. Further, by detecting the position and direction of the HMD 200, the direction of the line of sight VLP in FIG. 17 and the object the user is gazing at along VLP can be detected.

The posture of the user PL can be detected based on a first position of the HMD 200 in FIG. 2(A), a second position of the input device 160-1, and a third position of the input device 160-2; that is, the user's approximate posture can be detected from these first, second, and third positions. Alternatively, a motion sensor that detects the user's posture may be provided and the posture detected based on that sensor. For example, the user's posture may be detected by estimating the shape of the user's skeleton from a color image and depth information acquired by an imaging sensor serving as the motion sensor (a color image sensor, a depth sensor, or the like).
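A rough posture estimate from the three tracked positions might look like the sketch below. The rules (crouching when the head drops below 75% of the calibrated standing height, arm raised when either hand is above the head), the y-up coordinate convention, and the name `classify_posture` are all assumptions; the text does not specify how the first to third positions are combined.

```python
def classify_posture(hmd, left, right, standing_head_y, crouch_ratio=0.75):
    """Coarse posture label from HMD and two controller positions (x, y, z), y up.

    standing_head_y is the user's head height when standing, e.g. from a
    calibration step. Thresholds are hypothetical.
    """
    head_y = hmd[1]
    if head_y < crouch_ratio * standing_head_y:
        return "crouch"              # head well below standing height
    if max(left[1], right[1]) > head_y:
        return "arm_raised"          # a hand (or the microphone) above the head
    return "standing"
```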

In FIG. 17, the cheers, applause, and other response voices and response sounds of the audience AD are varied according to whether the user PL is gazing at the audience AD along the line of sight VLP (the line of sight of the HMD 200). They are also varied according to whether the user PL points the microphone 162 (its directional axis) at the audience AD, or according to whether the user PL has taken a performance posture such as the crouching posture shown in FIG. 17.

For example, if, after inputting a voice such as a call into the microphone 162, the user PL directs the line of sight VLP toward the spectators AD, points the microphone 162 toward the spectators AD, or takes a performance posture, the response voice and response sound of the spectators AD (cheers, chants, foot stomping, applause, and the like) are changed to ones that express a more enthusiastic, excited state. In this way, the spectators' response voice and response sound change interactively according to the direction of the line of sight VLP of the user PL, the direction in which the microphone 162 is pointed, the user's posture, and so on, which further improves the user PL's sense of virtual reality.
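Below is a minimal Python sketch of the cue test described above. It assumes that "directed toward the spectators" means the gaze or microphone axis falls within a fixed cone around the spectators' position; the 20-degree cone is an arbitrary choice. The function names and the clip names "cheer_normal", "cheer_excited", and "cheer_frenzy" are hypothetical.

```python
import math
from typing import Sequence


def angle_deg(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle between two 3D vectors in degrees (180 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    if norm == 0.0:
        return 180.0
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def is_directed_at(origin, direction, target, max_angle_deg: float = 20.0) -> bool:
    """True if `direction`, cast from `origin`, points at `target` within a cone."""
    to_target = [t - o for t, o in zip(target, origin)]
    return angle_deg(direction, to_target) <= max_angle_deg


def choose_crowd_response(gaze_on_target: bool, mic_on_target: bool,
                          performance_pose: bool) -> str:
    """Pick a (hypothetical) response clip: more cues give a more excited crowd."""
    cues = sum([gaze_on_target, mic_on_target, performance_pose])
    if cues == 0:
        return "cheer_normal"
    if cues < 3:
        return "cheer_excited"
    return "cheer_frenzy"
```

The same `is_directed_at` test can take either the HMD forward vector (line of sight VLP) or the microphone's pointing axis as `direction`.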

Further, although FIG. 12 illustrates control in which the response voice or response sound is output after a given time TD1 has elapsed since the volume of the user voice exceeded the volume level LV1, the present embodiment is not limited to this. For example, output control of the response voice or response sound (output timing control and the like) may be performed based on the position or direction of the microphone 162 or the HMD 200, the posture of the user, or the line of sight of the user. For example, after the user PL inputs a voice such as a call into the microphone 162 in FIG. 17, a response voice or response sound such as cheers and applause by the spectators AD may be output when the user PL points the microphone 162 toward the spectators AD, turns the HMD 200 (the user's line of sight) toward the spectators AD, or takes a predetermined posture.
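Instead of the fixed delay TD1, output timing can be driven by a cue: scan time-stamped cue samples after the call ends and respond at the first sample where the cue is present. This is a hypothetical illustration. The sample format, the 3-second wait window, and returning `None` (no response) when the cue never occurs are assumptions, not details from the specification.

```python
from typing import Iterable, Optional, Tuple


def response_time(call_end: float,
                  samples: Iterable[Tuple[float, bool]],
                  max_wait: float = 3.0) -> Optional[float]:
    """Return the time at which the response should start, or None.

    `samples` are (timestamp, cue) pairs in time order, where `cue` is True
    while the microphone or HMD points at the audience, or while the user
    holds the predetermined posture.
    """
    for t, cue in samples:
        if t < call_end:
            continue          # cue before the call ended: ignore
        if t > call_end + max_wait:
            break             # waited too long: give up
        if cue:
            return t
    return None
```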

Alternatively, whether or not to accept input of the user voice may be determined based on the position and direction of the microphone 162 and the HMD 200, the posture of the user, the line of sight of the user, or the like.
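The sketch below shows one form such an acceptance check could take. Following the next paragraph, it uses the HMD position as a proxy for the user's mouth. The 0.3 m distance and the 45-degree cone are assumed values. The text accepts input when the microphone is near or pointed at the head, but rejects it when the microphone is far or not pointed. Because those two rules differ, the sketch exposes a `require_both` flag that selects between the two readings.

```python
import math


def accept_voice_input(mic_pos, mic_dir, hmd_pos,
                       max_distance: float = 0.3,
                       max_angle_deg: float = 45.0,
                       require_both: bool = True) -> bool:
    """Decide whether the microphone input channel should be on.

    The HMD position stands in for the user's mouth. Thresholds are assumptions.
    """
    to_head = [h - m for h, m in zip(hmd_pos, mic_pos)]
    distance = math.hypot(*to_head)
    near = distance <= max_distance
    if distance == 0.0:
        pointed = True  # microphone at the mouth position: treat as pointed
    else:
        dot = sum(d * v for d, v in zip(mic_dir, to_head))
        norm = math.hypot(*mic_dir) * distance
        pointed = norm > 0.0 and dot / norm >= math.cos(math.radians(max_angle_deg))
    return (near and pointed) if require_both else (near or pointed)
```

Keeping the channel off whenever this returns False is also what guards against feedback when the microphone swings toward a real-world speaker.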

For example, in FIG. 2(A), the position of the HMD 200 can be regarded as corresponding to the position of the head (mouth) of the user PL. Therefore, when it is determined that the microphone 162 is close to the HMD 200 (head, mouth), or that the microphone 162 (its directional axis) is pointed toward the HMD 200 (head, mouth), it can be determined that the user PL is about to input a voice such as a call into the microphone 162. In this case, input of the user voice is accepted; that is, the microphone input (the input channel of the microphone) is turned on. On the other hand, when the microphone 162 is far from the HMD 200, or the microphone 162 is not pointed toward the HMD 200, it can be determined that the user PL has no intention of inputting a voice such as a call. In this case, input of the user voice is not accepted, and the microphone input is turned off. Turning off the microphone input in this way also prevents howling (acoustic feedback) from occurring when the microphone 162 ends up pointed toward a speaker in the real world.

Also, in the present embodiment, the response voice and response sound may be changed based on information on the user's past play history. FIG. 18(A) shows an example of the play history information. In this play history, the dates and times at which the user played the game of FIGS. 6 and 7 are recorded. The play history information is preferably stored in the storage unit 170 of FIG. 1 and then transmitted to a server system via a network.

Then, when the user, carrying an IC card (the portable information storage medium 195 in FIG. 1) in which user information and the like are stored, touches it to a card reader or the like of the simulation system, the user's play history information is transferred from the server system and stored in the storage unit 170 of FIG. 1. The play history information is then read from the storage unit 170, and processing to change the response voice and response sound is performed based on the read information.

For example, when it is determined from the play history information that the user has not played the game on the simulation system for a long time, a response voice file such as that shown in FIG. 18(B) is selected. The response voice and response sound are then processed so that voices such as "Long time no see!" and "We've been waiting for you!" are mixed into the spectators' cheers. In this way, the spectators' response voice and response sound vary according to the user's past play history, making it possible to output a response voice and response sound that feel more human.
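Here is a minimal sketch of the play-history rule. It assumes the history is a list of play timestamps, as in FIG. 18(A), and that "a long time" means 30 days or more. The threshold and the file names are hypothetical placeholders.

```python
from datetime import datetime, timedelta
from typing import List


def select_greeting_file(play_history: List[datetime], now: datetime,
                         long_absence: timedelta = timedelta(days=30)) -> str:
    """Choose a response voice file from the play-history timestamps.

    File names and the 30-day threshold are hypothetical placeholders.
    """
    if not play_history:
        return "cheer_default.wav"  # no history: nothing to personalise
    last_played = max(play_history)
    if now - last_played >= long_absence:
        # e.g. a file with "Long time no see!" mixed into the cheers
        return "cheer_long_time_no_see.wav"
    return "cheer_default.wav"
```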

FIG. 19(A) is an explanatory diagram of the sound data storage unit 175. As shown in FIG. 19(A), the sound data storage unit 175 stores a plurality of pieces of sound data SD1, SD2, SD3, SD4, and so on. In the present embodiment, output processing of the response voice or response sound is realized by selecting the sound data to be used from among the plurality of pieces of sound data (sound data files) stored in the sound data storage unit 175. For example, sound data corresponding to the measurement result of the user voice is selected from among the plurality of pieces of sound data. For example, when the volume and length of the user voice have been measured, sound data corresponding to them is selected, and reproduction processing based on the selected sound data is performed to output the response voice or response sound. Alternatively, the sound data to be selected may be varied according to the type of target such as a spectator, the positional relationship between the target and the user, the user's gaze direction relative to the target, the result of analyzing the feature amount of the user voice, the position and direction of the microphone 162, the posture of the user, the line of sight of the user, or the like.
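The selection step can be pictured as a lookup keyed on the measured volume and length. In this hypothetical sketch, the thresholds (in dBFS and seconds) and the mapping of the four classes onto SD1 to SD4 are invented for illustration. Only the names SD1 to SD4 come from FIG. 19(A).

```python
# Hypothetical mapping from (volume class, length class) to stored sound data.
SOUND_TABLE = {
    ("low", "short"): "SD1",
    ("low", "long"): "SD2",
    ("high", "short"): "SD3",
    ("high", "long"): "SD4",
}


def select_sound_data(volume_db: float, length_s: float,
                      loud_threshold_db: float = -20.0,
                      long_threshold_s: float = 1.0) -> str:
    """Pick a sound-data ID from the measured volume and length of the call."""
    volume_class = "high" if volume_db >= loud_threshold_db else "low"
    length_class = "long" if length_s >= long_threshold_s else "short"
    return SOUND_TABLE[(volume_class, length_class)]
```

Adding further keys, such as target type or gaze state, to the lookup would cover the alternative selection criteria listed above.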

Further, in the present embodiment, as shown in FIG. 19(B), the response voice and response sound are output by combining a plurality of pieces of sound data read from the sound data storage unit 175. For example, mixing processing that combines a plurality of pieces of sound data is performed, and the result is output as the response voice or response sound. In this case, the combination of sound data is changed based on the measurement result of the user voice. For example, when the volume and length of the user voice have been measured, the sound data are combined (mixed) according to the measurement, and the result is output as the response voice or response sound. Alternatively, the combination of sound data may be changed according to the type of target such as a spectator, the positional relationship between the target and the user, the user's gaze direction relative to the target, the result of analyzing the feature amount of the user voice, the position and direction of the microphone 162, the posture of the user, the line of sight of the user, or the like, and the result output as the response voice or response sound.
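Below is a minimal mixing sketch. It assumes each piece of sound data is a list of floating-point samples in [-1, 1] at a common sample rate. The gain rule in `gains_for_call` raises a chant layer with the normalised call volume; it is a hypothetical example of changing the combination based on the measurement result. Hard clipping stands in for whatever limiter a real mixer would use.

```python
from typing import List, Sequence


def mix(tracks: Sequence[Sequence[float]], gains: Sequence[float]) -> List[float]:
    """Sum gain-weighted sample buffers and hard-clip the result to [-1, 1]."""
    if len(tracks) != len(gains):
        raise ValueError("one gain per track is required")
    length = max((len(t) for t in tracks), default=0)
    out = [0.0] * length
    for track, gain in zip(tracks, gains):
        for i, sample in enumerate(track):
            out[i] += gain * sample
    return [max(-1.0, min(1.0, s)) for s in out]


def gains_for_call(volume_norm: float) -> List[float]:
    """Hypothetical rule: base cheer at full level, chant layer follows call volume."""
    v = max(0.0, min(1.0, volume_norm))
    return [1.0, v]
```

Usage: `mix([cheer_samples, chant_samples], gains_for_call(0.8))`, where the two sample lists are hypothetical buffers loaded from the sound data storage unit.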

2.4 Reception Timing of User Voice
When a response voice or response sound is output in reply to a user voice input by the user, the timing at which that user voice is accepted must also be considered.

For example, in the present embodiment, evaluation processing is performed on the user voice input through the microphone 162. Specifically, when the user sings a song using the microphone 162, evaluation processing (scoring processing) of the singing is performed.

For example, as shown in FIG. 20(A), the lyrics and a pitch display for the song are shown to the user. This pitch display allows the user to visually grasp the pitch that serves as the reference (model) for the evaluation processing. It also lets the user grasp the rhythm that serves as the reference (model) for the evaluation processing. Note that only the lyrics may be displayed, without such a pitch display.

When the user sings, the singing is compared with reference data for pitch and rhythm as shown in FIG. 20(B). That is, pitch and rhythm data of the user's singing are extracted from the singing-voice information input through the microphone 162. The user's singing (vocal performance) is then evaluated by comparing the extracted pitch and rhythm data with the pitch and rhythm reference data. In this way, the user's singing ability can be properly evaluated. In addition to the pitch and rhythm reference data, volume reference data may be prepared so that the user's control of vocal volume (dynamics) is also evaluated.
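A simplified sketch of the comparison against reference data follows. It assumes both the reference and the extracted singing are lists of (onset in seconds, MIDI pitch) note events. The tolerances (150 ms and half a semitone) and the two percentage scores are illustrative assumptions. Pitch extraction from the microphone signal itself is not covered here.

```python
from typing import List, Tuple

Note = Tuple[float, float]  # (onset in seconds, MIDI pitch); hypothetical format


def evaluate_singing(reference: List[Note], sung: List[Note],
                     timing_tol: float = 0.15,
                     pitch_tol: float = 0.5) -> Tuple[float, float]:
    """Return (rhythm score, pitch score) as percentages of reference notes.

    A reference note counts as a rhythm hit if any sung note starts within
    `timing_tol` seconds of it, and as a pitch hit if one of those nearby
    sung notes is also within `pitch_tol` semitones.
    """
    if not reference:
        return 0.0, 0.0
    rhythm_hits = pitch_hits = 0
    for ref_onset, ref_pitch in reference:
        nearby = [p for t, p in sung if abs(t - ref_onset) <= timing_tol]
        if nearby:
            rhythm_hits += 1
            if min(abs(p - ref_pitch) for p in nearby) <= pitch_tol:
                pitch_hits += 1
    n = len(reference)
    return 100.0 * rhythm_hits / n, 100.0 * pitch_hits / n
```

A volume sub-score for dynamics could be added the same way by comparing per-note levels against volume reference data.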

In the present embodiment, during a period in which evaluation processing is performed on the user voice, such as the user's singing voice, input of the user voice for outputting a response voice or response sound (that is, user voice input for call and response) is not accepted. Specifically, as shown in FIG. 21, while the user is singing and the user's singing is being evaluated, user voice input for C&R (call and response) is not accepted. User voice input for C&R is accepted only in periods other than this singing evaluation period; specifically, as shown in FIG. 21, it is accepted during the prelude period and the interlude period. In response to such user voice input, the response voice or response sound is output by the methods described with reference to FIGS. 9 to 19(B).

For example, during the evaluation period in which the user is singing, the user inputs the words of the song's lyrics into the microphone 162. If a response voice or response sound such as the cheers shown in FIG. 9 were output in reaction to these words, it could interfere with the evaluation processing of the singing or make the experience feel unnatural to the user.

In this respect, in FIG. 21, during the evaluation period in which the user is singing, user voice input for C&R is not accepted, and no corresponding response voice or response sound is output. User voice input for C&R is accepted, and the response voice or response sound is output, only in periods other than the evaluation period. This prevents the response voice or response sound from interfering with the evaluation processing of the singing or making the experience feel unnatural to the user.
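The acceptance rule of FIG. 21 can be sketched as a lookup over song sections, each flagged by whether singing is evaluated in it. Section names and timings are hypothetical. Treating times outside every section as "not accepted" is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Section:
    name: str         # e.g. "prelude", "verse", "interlude" (hypothetical)
    start: float      # seconds from song start, inclusive
    end: float        # seconds from song start, exclusive
    evaluated: bool   # True while the user's singing is being scored


def accepts_call_and_response(sections: List[Section], t: float) -> bool:
    """C&R input is accepted only inside sections where singing is not scored."""
    for s in sections:
        if s.start <= t < s.end:
            return not s.evaluated
    return False  # outside the song: not accepted (assumption)
```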

3. Detailed Processing
Next, a detailed processing example of the present embodiment will be described using the flowchart of FIG. 22.

First, it is determined whether or not the current time falls within the reception period for user voice input for C&R (call and response) (step S1). As described with reference to FIG. 21, outside the reception period (prelude period, interlude period, etc.), user voice input for C&R is not accepted. Within the reception period, the user voice input is accepted, and the volume and length of the user voice are measured (step S2). Based on the measurement result, sound data for the response voice or response sound is selected (step S3). That is, as described with reference to FIGS. 10 to 12, the response voice or response sound is varied according to the volume and length of the user voice.
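Step S2 can be sketched from a frame-by-frame volume envelope, following the level-crossing measurement recited in the claims: the length is counted from the frame where the volume exceeds a level LVi until it next falls below that level. The frame size, the dB values, and the function names are assumptions for illustration.

```python
from typing import Optional, Sequence


def length_above(levels_db: Sequence[float], frame_s: float,
                 level_db: float) -> Optional[float]:
    """Seconds from the first frame above `level_db` until the level falls below it.

    Returns None if the level is never exceeded. If the input ends while still
    above the level, the length is measured up to the end of the input.
    """
    start = None
    for i, v in enumerate(levels_db):
        if start is None and v > level_db:
            start = i
        elif start is not None and v < level_db:
            return (i - start) * frame_s
    if start is None:
        return None
    return (len(levels_db) - start) * frame_s


def highest_level_exceeded(levels_db: Sequence[float],
                           thresholds_db: Sequence[float]) -> int:
    """1-based index of the highest volume level LVi exceeded (0 if none).

    `thresholds_db` must be sorted ascending (LV1 < LV2 < ... < LVN).
    """
    peak = max(levels_db, default=float("-inf"))
    return sum(1 for th in thresholds_db if peak > th)
```

The pair (highest level exceeded, length above it) is one possible key for the sound-data selection in step S3.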

Next, change processing (modification processing) of the response voice or response sound is executed based on information on the target such as a spectator (positional relationship, etc.), the feature amount of the user voice (gender, age group, etc.), the position of the microphone 162 or the HMD 200, the posture of the user, the line of sight of the user, the user's play history, and the like (step S4). That is, the change processing of the response voice or response sound described with reference to FIGS. 13(A) to 17 is executed. The response voice or response sound after the change processing is then output (step S5).
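The sketch below gives a hypothetical example of the step-S4 change processing: it scales the selected clip's gain by the distance to the target and boosts it when the user's gaze is on the target. The inverse-distance rule, the 1.5x boost, and the clamp to 1.0 are assumptions, not the embodiment's actual processing.

```python
def adjust_response(base_gain: float, distance_m: float, gazed_at: bool,
                    ref_distance: float = 1.0,
                    excited_boost: float = 1.5) -> float:
    """Return a playback gain in [0, 1] for the selected response clip.

    Targets farther than `ref_distance` are attenuated as 1/distance; targets
    the user is looking at are boosted. All constants are hypothetical.
    """
    gain = base_gain * min(1.0, ref_distance / max(distance_m, 1e-6))
    if gazed_at:
        gain *= excited_boost
    return min(gain, 1.0)
```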

Although the present embodiment has been described in detail above, those skilled in the art will readily understand that many modifications are possible without materially departing from the novel matters and effects of the present invention. Accordingly, all such modifications are included in the scope of the present invention. For example, a term (microphone, spectator, play area, etc.) that is described at least once in the specification or drawings together with a different term having a broader or identical meaning (sound input device, target, movable range, etc.) can be replaced with that different term anywhere in the specification or drawings. Further, the user voice acquisition processing, the user voice measurement processing, the output processing and change processing of the response voice and response sound, the acceptance processing of user voice input, the configuration of the simulation system, and the like are not limited to those described in the present embodiment, and equivalent methods, processes, and configurations are also included in the scope of the present invention. The present invention is also applicable to various games, and to various simulation systems such as arcade game machines, home game machines, and large attraction systems in which many users participate.

100 processing unit, 102 input processing unit, 110 arithmetic processing unit, 111 game processing unit,
112 game progress processing unit, 113 evaluation processing unit, 114 character processing unit,
115 parameter processing unit, 116 object space setting unit,
117 virtual camera control unit, 118 game result calculation unit, 119 measurement processing unit,
120 display processing unit, 130 sound processing unit, 132 response sound processing unit,
140 output processing unit, 150 imaging unit, 151, 152 cameras,
160 (160-1, 160-2) input device, 161 sound input device,
162 microphone, 170 storage unit, 172 space information storage unit, 174 music information storage unit,
175 sound data storage unit, 176 parameter storage unit, 178 drawing buffer,
180 information storage medium, 192 sound output unit, 194 I/F unit,
195 portable information storage medium, 196 communication unit,
200 HMD (head mounted display), 201 to 203 light receiving elements, 210 sensor unit,
220 display unit, 231 to 236 light emitting elements, 240 processing unit, 260 head band,
280, 284 stations, 281, 282, 285, 286 light emitting elements,
290, 292 lighting fixtures, 301 to 304 walls, 305 ceiling, 306 door,
311 to 315 soundproofing materials, 330, 331 front speakers, 332 rear speakers,
333, 334 rear speakers, 335 woofer

Claims

An input processing unit that acquires information on user voice input by a user using a sound input device, and information on a viewpoint position and a gaze direction of a virtual user in a virtual space corresponding to the user;
A measurement processing unit that performs measurement processing of the user voice;
A sound processing unit that performs an output process of a response voice or a response sound different from the user voice in response to the input of the user voice;
A virtual space setting unit that performs setting processing of the virtual space;
A display processing unit that generates an image viewed from the viewpoint of the virtual user in the virtual space and displays the image on a display unit of a head mounted display worn by the user;
wherein
The measurement processing unit
performs measurement processing of the volume and length of the user voice;
The sound processing unit
The response voice or the response tone based on the result of the measurement process of the volume and length of the user voice is output as the voice or sound of the target of the user in the virtual space,
The sound processing unit
A simulation system characterized by performing processing of changing the response voice or the response sound according to at least one of the type of the target toward which the virtual user directs the gaze direction in the virtual space and the gaze direction of the virtual user relative to the target in the virtual space.

In claim 1,
The sound processing unit
A simulation system characterized by determining whether or not to accept input of the user voice according to the distance between the sound input device and the head mounted display, or according to whether or not the sound input device faces the head mounted display.

In claim 1 or 2,
The sound processing unit
A simulation system characterized by performing processing of changing the response voice or the response sound according to whether or not the virtual user directs the gaze direction toward the target in the virtual space.

In any one of claims 1 to 3,
The sound processing unit
A simulation system characterized by detecting the posture of the user based on the position of the head mounted display and the position of the sound input device, and performing processing of changing the response voice or the response sound according to the detected posture of the user.

In any one of claims 1 to 4,
The sound processing unit
A simulation system characterized by making the response voice or the response sound different according to the volume and length of the user voice.

In any one of claims 1 to 5,
The measurement processing unit
measures the length from the timing at which the volume of the user voice exceeds the i-th volume level (1 ≦ i ≦ N) among first to N-th volume levels to the timing at which the volume falls below the i-th volume level;
The sound processing unit
A simulation system characterized by performing processing to make the response voice or the response sound different according to the measured length.

In claim 6,
The sound processing unit
A simulation system characterized by performing processing of making the response voice or the response sound different between a case where the volume of the user voice exceeds the i-th volume level and a case where it exceeds the j-th volume level (1 ≦ i < j ≦ N) among the first to N-th volume levels.

In any one of claims 1 to 7,
The simulation system characterized in that the target is a character of a spectator appearing in a game.

In any one of claims 1 to 8,
The measurement processing unit
performs analysis processing of a feature amount of the user voice, and the sound processing unit performs processing of changing the response voice or the response sound according to the result of the analysis processing of the feature amount of the user voice. A simulation system characterized by the above.

In any one of claims 1 to 9,
The sound processing unit
A simulation system characterized by performing, according to at least one of the position and direction of the sound input device held by the user, the position and direction of the user's hand, the position and direction of the head mounted display worn by the user, the posture of the user, and the line of sight of the user, processing of changing the response voice or the response sound, processing of outputting the response voice or the response sound, or processing of accepting input of the user voice for outputting the response voice or the response sound.

In any one of claims 1 to 10,
The sound processing unit
A simulation system characterized by performing processing of changing the response voice or the response sound based on information on the user's past play history.

In any one of claims 1 to 11,
Further including a sound data storage unit that stores sound data, wherein
The sound processing unit
A simulation system characterized by outputting the response voice or the response sound by performing processing of selecting the sound data to be used from among a plurality of pieces of sound data stored in the sound data storage unit, or processing of combining a plurality of pieces of sound data.

In any one of claims 1 to 12,
The input processing unit
A simulation system characterized by not accepting, during an evaluation period in which evaluation processing is performed on the user voice, input of the user voice for outputting the response voice or the response sound, and by accepting input of the user voice for outputting the response voice or the response sound in periods other than the evaluation period.

A simulation apparatus comprising:
the simulation system according to any one of claims 1 to 13;
the sound input device; and
the head mounted display.

An input processing unit that acquires information on user voice input by a user using a sound input device, and information on a viewpoint position and a gaze direction of a virtual user in a virtual space corresponding to the user;
A measurement processing unit that performs measurement processing of the user voice;
A sound processing unit that performs an output process of a response voice or a response sound different from the user voice in response to the input of the user voice;
A virtual space setting unit that performs setting processing of the virtual space;
A display processing unit that generates an image viewed from the viewpoint of the virtual user in the virtual space and displays the image on a display unit of a head mounted display worn by the user,
the program causing a computer to function as the above units, wherein
The measurement processing unit
performs measurement processing of the volume and length of the user voice;
The sound processing unit
The response voice or the response tone based on the result of the measurement process of the volume and length of the user voice is output as the voice or sound of the target of the user in the virtual space,
The sound processing unit
A program characterized by performing processing of changing the response voice or the response sound according to at least one of the type of the target toward which the virtual user directs the gaze direction in the virtual space and the gaze direction of the virtual user relative to the target in the virtual space.