JP2018157314A - Information processing system, information processing method and program

Info

Publication number: JP2018157314A
Application number: JP2017051242A
Authority: JP
Inventors: 誠庄原; Makoto Shohara
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2017-03-16
Filing date: 2017-03-16
Publication date: 2018-10-04

Abstract

PROBLEM TO BE SOLVED: To output sound that matches an image and thereby enhance realism.
SOLUTION: An information processing system that has an information processing device connected to an imaging device for capturing a plurality of images comprises: a sound input unit to which a plurality of sounds are input; an image output unit that outputs a display image based on the plurality of images; a setting unit that sets a predetermined region to be output in the display image; a first conversion unit that converts the plurality of sounds input to the sound input unit based on the predetermined region; and a sound output unit that outputs the plurality of sounds converted by the first conversion unit.
SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing system, an information processing method, and a program.

Conventionally, methods of displaying an image showing a wide range, a so-called panoramic image, are known.

For example, an image processing system first pastes a target image onto a three-dimensional shape to generate a three-dimensional model. Next, the image processing system determines the position of the viewpoint and the viewing angle based on an input value. In making this determination, the image processing system decides, based on the input value, whether to change the viewing angle preferentially or to change the viewpoint position preferentially. A method is known that in this way reduces the display of unnatural images, such as a display in which a subject appears stretched, in a wide field of view (for example, Patent Document 1).

However, the conventional method does not output sound that matches the image, so the sense of presence is insufficient.

An object of the present invention is to enhance the sense of presence by outputting sound that matches the image.

In order to solve the above-described problem, an information processing system according to one aspect of the present invention, which includes an information processing device connected to an imaging device that captures a plurality of images, comprises:
a sound input unit to which a plurality of sounds are input;
an image output unit that outputs a display image based on the plurality of images;
a setting unit that sets a predetermined region to be output in the display image;
a first conversion unit that converts the plurality of sounds input to the sound input unit based on the predetermined region; and
a sound output unit that outputs the plurality of sounds converted by the first conversion unit.

Sound that matches the image can be output to enhance the sense of presence.

A diagram illustrating an example of the overall configuration of an information processing system according to an embodiment of the present invention.
A diagram illustrating an example of an imaging device according to an embodiment of the present invention.
A diagram illustrating an example of an image captured by the imaging device according to an embodiment of the present invention.
A block diagram illustrating an example of the hardware configuration of the imaging device according to an embodiment of the present invention.
A block diagram illustrating an example of the hardware configuration of an information processing device according to an embodiment of the present invention.
A sequence diagram illustrating an example of the overall processing by the information processing system according to an embodiment of the present invention.
A diagram illustrating an example of an omnidirectional image according to an embodiment of the present invention.
A diagram illustrating an example of an omnidirectional panoramic image according to an embodiment of the present invention.
A diagram for explaining an example of an initial image according to an embodiment of the present invention.
A diagram for explaining an example of another zoom process according to an embodiment of the present invention.
A table for explaining an example of another zoom process according to an embodiment of the present invention.
A diagram for explaining an example of the "range" of another zoom process according to an embodiment of the present invention.
A schematic diagram showing an example arrangement of virtual speakers according to an embodiment of the present invention.
A schematic diagram showing a first example in which the arrangement of the virtual speakers is changed according to an embodiment of the present invention.
A schematic diagram showing a second example of the arrangement of the virtual speakers according to an embodiment of the present invention.
A schematic diagram showing a third example of the arrangement of the virtual speakers according to an embodiment of the present invention.
A schematic diagram showing a fourth example of the arrangement of the virtual speakers according to an embodiment of the present invention.
A schematic diagram showing a fifth example of the arrangement of the virtual speakers according to an embodiment of the present invention.
A functional block diagram showing an example of the functional configuration of the information processing system according to an embodiment of the present invention.

Embodiments of the present invention will be described below. Note that "sound" in the embodiments is not limited to a voice uttered by a person; it refers collectively to music, mechanical sounds, operating sounds, and other sounds propagated by vibration of the air.

<Example of overall configuration of information processing system>
FIG. 1 is a diagram illustrating an example of the overall configuration of an information processing system according to an embodiment of the present invention. The information processing system 10 includes an imaging device 1 and a smartphone 2, which is an example of the information processing device.

The imaging device 1 is a camera or the like having a plurality of optical systems. For example, the imaging device 1 generates an image showing a wide range, such as all directions (hereinafter referred to as an "omnidirectional image"), based on a plurality of images captured using the plurality of optical systems. Next, the imaging device 1 transmits the omnidirectional image or the like to the smartphone 2. The smartphone 2 then performs image processing on the transmitted image and outputs a display image. In the following, the input image, that is, the first image, is described using the example of an omnidirectional image. Note that the panoramic image is, for example, an omnidirectional image.

In this example, the imaging device 1 and the smartphone 2 are connected by wire or wirelessly. The smartphone 2 downloads data such as an omnidirectional image from the imaging device 1. The connection may be made via a network or the like. Alternatively, the information processing system 10 may transmit the plurality of images captured using the plurality of optical systems from the imaging device 1 to the smartphone 2, and the smartphone 2 may combine the plurality of images to generate the omnidirectional image.

Furthermore, the overall configuration is not limited to the configuration shown in FIG. 1. For example, the imaging device 1 and the smartphone 2 may be an integrated device. In addition, the information processing device may be a PC (Personal Computer), a tablet, or the like. The information processing system 10 may further include another imaging device, another information processing device, or the like in addition to the imaging device 1 and the smartphone 2.

<Example of imaging device>
FIG. 2 is a diagram illustrating an example of an imaging device according to an embodiment of the present invention. Specifically, FIG. 2A is an example of a front view of the imaging device 1. FIG. 2B is an example of a left side view of the imaging device 1. Further, FIG. 2C is an example of a plan view of the imaging device 1.

The imaging device 1 includes a front imaging element 1H1, a rear imaging element 1H2, and a switch 1H3. In this example, optical systems such as the front imaging element 1H1 and the rear imaging element 1H2 are used for capturing images. The imaging device 1 generates an omnidirectional image based on the images captured using the respective optical systems.

Furthermore, the imaging device 1 has microphones at a plurality of locations. For example, four microphones are arranged on the imaging device 1. The imaging device 1 picks up the sound around it at four locations, and the resulting sound signals are input to the imaging device 1.

Specifically, in FIG. 2, a microphone 1HM1, a microphone 1HM2, and a microphone 1HM3 are arranged on the front side of the imaging device 1. Further, a microphone 1HM4 is arranged on the rear side of the imaging device 1. The number of microphones and the positions at which they are arranged are not limited to the arrangement shown in FIG. 2. With a plurality of microphones, sound with a greater sense of presence can be output. However, the number of microphones is desirably four or more; that is, sound is desirably input at four or more locations.

Each microphone may be a directional microphone or an omnidirectional microphone. When directional microphones are used, the imaging device 1 can acquire sound from a specific direction with each microphone. When omnidirectional microphones are used, on the other hand, the imaging device 1 can easily calibrate each microphone. The microphones may also be a combination of directional and omnidirectional microphones. When at least one of the microphones is omnidirectional, calibration can be performed easily, the cost is low, and individual variation is small.

The switch 1H3 is a shutter button. The switch 1H3 is an example of an input device with which the user instructs the imaging device 1 to capture an image.

As shown in FIG. 2A, when the user presses the switch 1H3, the shutter is released and the imaging device 1 captures an image. Alternatively, the information processing system 10 may be configured so that an operation to release the shutter is input remotely from an information processing device such as the smartphone 2. All directions around the imaging device 1 are captured by the front imaging element 1H1 and the rear imaging element 1H2.

FIG. 3 is a diagram illustrating an example of an image captured by the imaging device according to an embodiment of the present invention. Specifically, FIG. 3A is an example of an image captured by the front imaging element 1H1. FIG. 3B is an example of an image captured by the rear imaging element 1H2.

FIG. 3C is an example of an image generated based on the image captured by the front imaging element 1H1 in FIG. 3A and the image captured by the rear imaging element 1H2 in FIG. 3B.

First, the image captured by the front imaging element 1H1 covers a wide range on the front side of the imaging device 1, for example, a range with an angle of view of 180° or more. Similarly, the image captured by the rear imaging element 1H2 covers a wide range on the rear side of the range captured by the imaging device 1, for example, a range with an angle of view of 180°. When a fisheye lens is used as the optical system, the image often has distortion. That is, the images in FIG. 3A and FIG. 3B each show a wide range on one side (the front side in this example) and on the other side (the rear side in this example) of the range captured by the imaging device 1, and each is a hemispherical image (hereinafter referred to as a "hemispherical image").

The angle of view of each optical system is desirably in the range of 180° to 200°. In particular, when each angle of view exceeds 180°, the two hemispherical images share an overlapping image region when they are combined, so the imaging device can generate an omnidirectional image even when there is parallax.

Next, the imaging device 1 performs processing such as distortion correction and composition, and generates the image of FIG. 3C based on the front hemispherical image shown in FIG. 3A and the rear hemispherical image shown in FIG. 3B. That is, the image of FIG. 3C is an image generated by a method such as the so-called Mercator projection or equirectangular projection, and is an example of an omnidirectional image. The imaging device 1 may instead transmit the hemispherical images to the information processing device, and the information processing device may generate the omnidirectional image.
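As a non-limiting illustration of the relationship between an equirectangular omnidirectional image and the two hemispherical images, the following minimal sketch maps a pixel of the equirectangular image to a viewing direction and reports which lens covers that direction. The function names, the axis convention (+z toward the front imaging element), and the default 190° angle of view are assumptions made for this example only; the actual distortion correction and composition processing of the imaging device 1 is not specified here.

```python
import math

def equirect_pixel_to_direction(u, v, width, height):
    """Map pixel (u, v) of an equirectangular image to a unit vector.

    Assumed convention (hypothetical): +z points toward the front lens,
    +x to the right, +y up; the image centre looks straight ahead.
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi    # longitude, -pi..pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi   # latitude, +pi/2..-pi/2
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)

def covering_lenses(direction, fov_deg=190.0):
    """Return the lenses whose angle of view contains the given direction."""
    z = max(-1.0, min(1.0, direction[2]))
    half_fov = math.radians(fov_deg) / 2.0
    angle_from_front = math.acos(z)          # 0 = straight ahead, pi = behind
    lenses = []
    if angle_from_front <= half_fov:
        lenses.append("front")
    if math.pi - angle_from_front <= half_fov:
        lenses.append("rear")
    return lenses
```

With an angle of view above 180°, directions near the seam between the two hemispheres are reported for both lenses, which corresponds to the overlapping image region used when the hemispherical images are combined.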

Note that the first image is not limited to an image generated by the imaging device 1. For example, the first image may be an image captured by another camera or the like, or an image generated based on an image captured by another camera. The first image is desirably an image that captures a wide viewing angle, taken with an omnidirectional camera, a camera with a so-called wide-angle lens, or the like.

In the following description, the first image is described using the example of an omnidirectional image, but the first image is not limited to an omnidirectional image. For example, the first image may be an image captured with a compact camera, a single-lens reflex camera, a smartphone, or the like. The image may also be a panoramic image that extends horizontally or vertically.

<Example of hardware configuration of imaging device>
FIG. 4 is a block diagram illustrating an example of the hardware configuration of the imaging device according to an embodiment of the present invention. The imaging device 1 includes an imaging unit 1H4, an image processing unit 1H7, an imaging control unit 1H8, a CPU (Central Processing Unit) 1H9, and a ROM (Read-Only Memory) 1H10. The imaging device 1 also includes an SRAM (Static Random Access Memory) 1H11, a DRAM (Dynamic Random Access Memory) 1H12, and an operation I/F (Interface) 1H13. Furthermore, the imaging device 1 includes a network I/F 1H14, a wireless I/F 1H15, and an antenna 1H16.

The imaging device 1 further includes the microphones 1HM1, 1HM2, 1HM3, and 1HM4. With these microphones, the imaging device 1 inputs sound at a plurality of locations.

The imaging device 1 desirably has an attitude sensor 1H18.

The hardware components of the imaging device 1 are connected by a bus 1H17 and input and output data or signals via the bus 1H17.

The imaging unit 1H4 includes the front imaging element 1H1 and the rear imaging element 1H2. A lens 1H5 is provided for the front imaging element 1H1, and a lens 1H6 is provided for the rear imaging element 1H2. The lens 1H5 and the lens 1H6 are preferably fisheye lenses or wide-angle lenses.

The front imaging element 1H1 and the rear imaging element 1H2 are so-called camera units. Specifically, the front imaging element 1H1 and the rear imaging element 1H2 each have an optical sensor such as a CMOS (Complementary Metal Oxide Semiconductor) or a CCD (Charge Coupled Device) sensor. The front imaging element 1H1 converts light incident through the lens 1H5 and generates image data representing a hemispherical image or the like. Similarly, the rear imaging element 1H2 converts light incident through the lens 1H6 and generates image data representing a hemispherical image or the like.

Next, the imaging unit 1H4 outputs the image data generated by the front imaging element 1H1 and by the rear imaging element 1H2 to the image processing unit 1H7. The output image data is, for example, the front hemispherical image of FIG. 3A and the rear hemispherical image of FIG. 3B.

Furthermore, the front imaging element 1H1 and the rear imaging element 1H2 may include other optical elements, such as a diaphragm or a low-pass filter, in order to capture high-quality images. The front imaging element 1H1 and the rear imaging element 1H2 may also perform defective pixel correction, camera shake correction, or the like in order to capture high-quality images.

The image processing unit 1H7 generates the omnidirectional image of FIG. 3C based on the image data input from the imaging unit 1H4.

The imaging control unit 1H8 is a control device that controls the hardware of the imaging device 1.

The CPU 1H9 is both an arithmetic device that performs the computation and data processing needed to realize each process and a control device that controls the hardware. For example, the CPU 1H9 executes each process based on a program installed in advance.

The ROM 1H10, the SRAM 1H11, and the DRAM 1H12 are examples of storage devices. For example, the ROM 1H10 stores programs, data, parameters, and the like for causing the CPU 1H9 to execute processing. The SRAM 1H11 and the DRAM 1H12 store the programs that the CPU 1H9 uses to execute processing, the data those programs use, the data those programs generate, and the like. Note that the imaging device 1 may further include an auxiliary storage device such as a hard disk.

The operation I/F 1H13 is an interface that is connected to an input device such as the switch 1H3 and that performs processing for inputting user operations on the imaging device 1. For example, the operation I/F 1H13 comprises an input device such as a switch, a connector and a cable for connecting the input device, a circuit that processes signals input from the input device, a driver, a control device, and the like. The operation I/F 1H13 may further include an output device such as a display. The operation I/F 1H13 may also be a so-called touch panel or the like in which an input device and an output device are integrated. Further, the operation I/F 1H13 may have an interface such as USB (Universal Serial Bus) and connect a recording medium such as a flash memory to the imaging device 1. The operation I/F 1H13 may thereby input and output data between the imaging device 1 and the recording medium.

Note that the switch 1H3 may also be a power switch, a parameter input switch, or the like for performing operations other than those related to the shutter.

The network I/F 1H14, the wireless I/F 1H15, and the antenna 1H16 connect the imaging device 1 to external devices wirelessly or by wire. For example, the imaging device 1 connects to a network through the network I/F 1H14 and transmits data to the smartphone 2. Note that the network I/F 1H14, the wireless I/F 1H15, and the antenna 1H16 may be hardware that connects to other external devices by wire, such as USB. That is, the network I/F 1H14, the wireless I/F 1H15, and the antenna 1H16 may be connectors, cables, and the like.

The bus 1H17 is used to input and output data and the like between the hardware components of the imaging device 1. That is, the bus 1H17 is a so-called internal bus. For example, the bus 1H17 is PCI Express (Peripheral Component Interconnect Bus Express) or the like.

The attitude sensor 1H18 detects the attitude of the imaging device 1. For example, the attitude sensor 1H18 is a three-axis acceleration sensor, an angular velocity sensor, or the like, and may be a combination of a plurality of sensors.
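As a non-limiting illustration of how a three-axis acceleration sensor could yield an attitude, the following minimal sketch estimates roll and pitch from a single reading taken while the device is at rest. The function name, the axis convention (a level device reports gravity on +z), and the formulas are assumptions chosen for this example; how the attitude sensor 1H18 actually computes the attitude is not specified here.

```python
import math

def roll_pitch_from_gravity(ax, ay, az):
    """Estimate (roll, pitch) in degrees from one accelerometer reading.

    Assumes (hypothetically) that the device is at rest, so the reading
    contains gravity only, and that a level device reports (0, 0, +g).
    """
    roll = math.atan2(ay, az)                     # rotation about the x axis
    pitch = math.atan2(-ax, math.hypot(ay, az))   # rotation about the y axis
    return math.degrees(roll), math.degrees(pitch)
```

Gravity alone cannot reveal rotation about the vertical axis, which is one reason an acceleration sensor may be combined with an angular velocity sensor.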

Note that the imaging device 1 is not limited to having two imaging elements. For example, it may have three or more imaging elements. Further, the imaging device 1 may capture a plurality of partial images by changing the imaging angle of a single imaging element.

Note that the processing performed by the imaging device 1 may be performed by another device. For example, the imaging device 1 may transmit data, parameters, and the like, and the smartphone 2 or another information processing device connected via a network may perform part or all of the processing. The information processing system 10 may also include a plurality of information processing devices and perform the processing in a distributed, redundant, or parallel manner.

<Example of hardware configuration of information processing device>
FIG. 5 is a block diagram illustrating an example of the hardware configuration of the information processing device according to an embodiment of the present invention. The smartphone 2, which is an example of the information processing device, includes an auxiliary storage device 2H1, a main storage device 2H2, an input/output device 2H3, a state sensor 2H4, a CPU 2H5, and a network I/F 2H6. Furthermore, the smartphone 2 includes a speaker 2HS1 and a speaker 2HS2.

The hardware components of the smartphone 2 are connected by a bus 2H7 and input and output data or signals via the bus 2H7.

The auxiliary storage device 2H1 stores data, parameters, programs, and the like. Specifically, the auxiliary storage device 2H1 is, for example, a hard disk or a flash SSD (Solid State Drive). A file server or the like connected via the network I/F 2H6 may store some or all of the data held in the auxiliary storage device 2H1, either redundantly or in its place.

The main storage device 2H2 is a so-called memory or the like that serves as the storage area used by the programs that execute processing. That is, the main storage device 2H2 stores data, programs, parameters, and the like. For example, the main storage device 2H2 is an SRAM (Static Random Access Memory), a DRAM, or the like. The main storage device 2H2 may further include a control device that performs storage and retrieval.

The input/output device 2H3 is both an output device that displays images, processing results, and the like, and an input device that receives user operations. Specifically, the input/output device 2H3 is a so-called touch panel together with its peripheral circuits, drivers, and the like. The input/output device 2H3 displays to the user, for example, a predetermined GUI (Graphical User Interface) and images that have undergone image processing. When the user operates the displayed GUI or image, the input/output device 2H3 receives that operation as input.

The state sensor 2H4 is a sensor that detects the state of the smartphone 2. Specifically, the state sensor 2H4 is a gyro sensor, a three-axis acceleration sensor, or the like. For example, the state sensor 2H4 determines whether one of the sides of the smartphone 2 is inclined at a predetermined angle or more with respect to the horizontal. That is, the state sensor 2H4 detects whether the smartphone 2 is in a portrait orientation or a landscape orientation.
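As a non-limiting illustration of the orientation check described above, the following minimal sketch decides between portrait and landscape by comparing the inclination of the short side of the device against a threshold. The function name, the axis assignment (x along the short side, y along the long side), and the 45° threshold are assumptions for this example; the predetermined angle actually used by the state sensor 2H4 is not specified here.

```python
import math

def is_landscape(ax, ay, threshold_deg=45.0):
    """Return True when the short side is tilted past the threshold.

    ax, ay: gravity components measured along the short (x) and long (y)
    sides of the device (hypothetical axis assignment). When the device is
    held upright in portrait, gravity lies along y and the short side is
    horizontal, so the inclination is 0 degrees.
    """
    inclination = math.degrees(math.atan2(abs(ax), abs(ay)))
    return inclination >= threshold_deg
```

When the device lies flat, both components are close to zero and the result is not meaningful, so a practical implementation would likely keep the previous orientation in that case.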

The CPU 2H5 is both an arithmetic device that performs the computation and data processing needed to realize each process and a control device that controls the hardware. The CPU 2H5 may consist of a plurality of CPUs, devices, or cores in order to perform processing in a parallel, redundant, or distributed manner. The smartphone 2 may also have a GPU (Graphics Processing Unit) or the like, internally or externally, in order to perform image processing.

ネットワークＩ／Ｆ２Ｈ６は、無線又は有線で、ネットワークを介して外部装置と接続する。具体的には、ネットワークＩ／Ｆ２Ｈ６は、データ等を入出力するためのアンテナ、周辺回路及びドライバ等である。例えば、スマートフォン２は、ＣＰＵ２Ｈ５及びネットワークＩ／Ｆ２Ｈ６によって、撮影装置１等から画像データを入力する。一方で、スマートフォン２は、ＣＰＵ２Ｈ５及びネットワークＩ／Ｆ２Ｈ６によって、撮影装置１等へデータ等を出力する。 The network I / F 2H6 is connected to an external device via a network in a wireless or wired manner. Specifically, the network I / F 2H6 includes an antenna, a peripheral circuit, a driver, and the like for inputting and outputting data and the like. For example, the smartphone 2 inputs image data from the imaging device 1 or the like by the CPU 2H5 and the network I / F 2H6. On the other hand, the smartphone 2 outputs data and the like to the photographing apparatus 1 and the like by the CPU 2H5 and the network I / F 2H6.

スピーカ２ＨＳ１及びスピーカ２ＨＳ２は、音声を出力する。スピーカ２ＨＳ１及びスピーカ２ＨＳ２は、ステレオ出力を行う。なお、スピーカ２ＨＳ１及びスピーカ２ＨＳ２には、イヤホン又は外部のスピーカ等の外部装置が接続され、接続された外部装置から音声が出力されてもよい。これらのスピーカによって、スマートフォン２は、複数箇所から音声を出力する。また、スピーカの数は、２個以上、すなわち、音声を出力する箇所は、２箇所以上であるのが望ましい。 The speaker 2HS1 and the speaker 2HS2 output sound. The speaker 2HS1 and the speaker 2HS2 perform stereo output. Note that an external device such as an earphone or an external speaker may be connected to the speaker 2HS1 and the speaker 2HS2, and sound may be output from the connected external device. With these speakers, the smartphone 2 outputs sound from a plurality of locations. Further, it is desirable that the number of speakers is two or more, that is, there are two or more locations where sound is output.

なお、情報処理装置は、スマートフォンに限られない。情報処理装置は、スマートフォン以外のコンピュータでもよい。例えば、情報処理装置は、ＨＭＤ（ＨｅａｄＭｏｕｎｔｅｄＤｉｓｐｌａｙ）、ＰＣ、ＰＤＡ（ＰｅｒｓｏｎａｌＤｉｇｉｔａｌＡｓｓｉｓｔａｎｃｅ）、タブレット、携帯電話器又はこれらの組み合わせ等でもよい。 Note that the information processing apparatus is not limited to a smartphone. The information processing apparatus may be a computer other than a smartphone. For example, the information processing apparatus may be an HMD (Head Mounted Display), a PC, a PDA (Personal Digital Assistance), a tablet, a mobile phone, or a combination thereof.

＜情報処理システムによる全体処理例＞ <Example of overall processing by the information processing system>
図６は、本発明の一実施形態に係る情報処理システムによる全体処理の一例を説明するシーケンス図である。 FIG. 6 is a sequence diagram illustrating an example of overall processing by the information processing system according to an embodiment of the present invention.

＜全天球画像の生成例＞（ステップＳ０１） <Example of generating an omnidirectional image> (step S01)
図６のステップＳ０１では、撮影装置１は、全天球画像を生成する。なお、全天球画像は、例えば、撮影装置１による図７の処理によって、図３（ａ）及び図３（ｂ）の半球画像等から生成される。 In step S01 of FIG. 6, the imaging device 1 generates an omnidirectional image. The omnidirectional image is generated, for example, from the hemispherical images of FIGS. 3A and 3B by the processing of FIG. 7 performed by the imaging device 1.

図７は、本発明の一実施形態に係る全天球画像の一例を説明する図である。なお、図７（ａ）は、図３（ａ）の半球画像を光軸に対して水平方向及び垂直方向の入射角が等位となる箇所を線で結んで示す図である。光軸に対して水平方向の入射角を「θ」、光軸に対して垂直方向の入射角を「φ」という。さらに、図７（ｂ）は、図７（ａ）と同様に、図３（ｂ）の半球画像を光軸に対して水平方向及び垂直方向の入射角が等位となる箇所を線で結んで示す図である。 FIG. 7 is a diagram illustrating an example of an omnidirectional image according to an embodiment of the present invention. FIG. 7A shows the hemispherical image of FIG. 3A with lines connecting the points at which the incident angles in the horizontal and vertical directions with respect to the optical axis are equal. The incident angle in the horizontal direction with respect to the optical axis is denoted "θ", and the incident angle in the vertical direction with respect to the optical axis is denoted "φ". Similarly, FIG. 7B shows the hemispherical image of FIG. 3B with lines connecting the points at which the incident angles in the horizontal and vertical directions with respect to the optical axis are equal.

また、図７（ｃ）は、メルカトル図法によって処理された画像の一例を説明する図である。具体的には、図７（ｃ）の画像は、図７（ａ）及び図７（ｂ）の画像をあらかじめ生成されるＬＵＴ（ＬｏｏｋＵｐＴａｂｌｅ）等で対応させ、正距円筒図法で生成される画像である。そして、図７（ｃ）の状態となった後、図７（ａ）及び図７（ｂ）のそれぞれの画像を図７（ｄ）に示すように合成すると、全天球画像が生成される。合成処理は、図７（ｃ）に示す状態の半球画像を２つ用いて、全天球画像を生成する処理である。なお、図７（ｄ）の合成処理は、図７（ｃ）の状態の半球画像を単に連続して配置する処理に限られない。例えば、全天球画像の水平方向中心がθ＝１８０°でない場合、合成処理において、撮影装置は、まず、図３（ａ）の半球画像を前処理し、全天球画像の中心に配置する。次に、撮影装置は、生成する画像の左右部分に、図３（ｂ）の半球画像を前処理した画像を左右部分に配置できる大きさに分割し、半球画像を合成して図３（ｃ）の全天球画像を生成してもよい。 FIG. 7C is a diagram illustrating an example of an image processed by the Mercator projection. Specifically, the image of FIG. 7C is generated by the equirectangular projection by mapping the images of FIGS. 7A and 7B through a LUT (Look-Up Table) or the like generated in advance. After the state of FIG. 7C is reached, the respective images of FIGS. 7A and 7B are combined as shown in FIG. 7D to generate an omnidirectional image. The combining process generates an omnidirectional image from two hemispherical images in the state shown in FIG. 7C. The combining process of FIG. 7D is not limited to simply arranging the hemispherical images in the state of FIG. 7C side by side. For example, when the horizontal center of the omnidirectional image is not θ = 180°, the imaging device first preprocesses the hemispherical image of FIG. 3A and places it at the center of the omnidirectional image. Next, the imaging device may divide the preprocessed hemispherical image of FIG. 3B into pieces sized to fit the left and right parts of the image to be generated, and combine the hemispherical images to generate the omnidirectional image of FIG. 3C.

なお、全天球画像を生成する処理は、正距円筒図法による処理に限られない。例えば、φ方向において、図７（ｂ）の半球画像が有する画素の並びと、図７（ａ）の半球画像が有する画素並びとが、上下が逆であり、かつ、θ方向においてそれぞれの画素の並びが左右逆である天地逆転となる場合がある。この場合、撮影装置は、前処理において、図７（ｂ）の半球画像を図７（ａ）のφ方向及びθ方向の画素の並びと揃えるために、１８０°Ｒｏｌｌ回転させる処理等を行ってもよい。 The process of generating the omnidirectional image is not limited to processing by the equirectangular projection. For example, the pixel arrangement of the hemispherical image of FIG. 7B may be upside down in the φ direction and left-right reversed in the θ direction relative to that of the hemispherical image of FIG. 7A, that is, the two images may be inverted top to bottom. In this case, in the preprocessing, the imaging device may rotate the hemispherical image of FIG. 7B by 180° about the roll axis so that its pixel arrangement in the φ and θ directions matches that of FIG. 7A.
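
The 180° roll rotation mentioned above amounts to reversing both the row order and the order of pixels within each row. A minimal sketch, assuming the hemispherical image is held as a list of pixel rows (the function name `rotate_180` is illustrative and not taken from the source):

```python
def rotate_180(image):
    """Return a copy of a row-major image rotated by 180 degrees.

    Reversing the rows flips the image top to bottom (phi direction);
    reversing each row flips it left to right (theta direction).
    """
    return [list(reversed(row)) for row in reversed(image)]
```

Applying the function twice returns the original pixel arrangement, which is a quick sanity check for the preprocessing step.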

また、全天球画像を生成する処理は、図７（ａ）及び図７（ｂ）の半球画像が有するそれぞれの歪曲収差を補正する歪補正処理等が行われてもよい。さらに、全天球画像を生成する処理は、シェーディング補正、ガンマ補正、ホワイトバランス、手振れ補正、オプティカル・ブラック補正処理、欠陥画素補正処理、エッジ強調処理又はリニア補正処理等が行われてもよい。なお、合成処理は、半球画像の撮影範囲と、他方の半球画像の撮影範囲とが重複する場合、重複する撮影範囲に撮影される被写体の画素を利用して補正を行うと、精度良く半球画像を合成することができる。 The process of generating the omnidirectional image may also include distortion correction that corrects the distortion aberration of each of the hemispherical images of FIGS. 7A and 7B. Furthermore, it may include shading correction, gamma correction, white balance, camera shake correction, optical black correction, defective pixel correction, edge enhancement, linear correction, or the like. When the shooting range of one hemispherical image overlaps that of the other, the combining process can combine the hemispherical images accurately by performing correction using the pixels of the subject captured in the overlapping range.

以上の処理によって、撮影装置１は、撮影される複数の半球画像から全天球画像を生成する。なお、全天球画像は、別の処理によって生成されてもよい。 Through the above processing, the imaging device 1 generates an omnidirectional image from a plurality of hemispheric images to be captured. Note that the omnidirectional image may be generated by another process.

＜全天球画像の送信例＞（ステップＳ０２） <Example of transmitting the omnidirectional image> (step S02)
図６のステップＳ０２では、スマートフォン２は、ネットワーク等を介して、ステップＳ０１によって生成される全天球画像を取得する。スマートフォン２が、図７（ｄ）の全天球画像を取得する場合を例に説明する。 In step S02 of FIG. 6, the smartphone 2 acquires the omnidirectional image generated in step S01 via a network or the like. The following describes, as an example, the case where the smartphone 2 acquires the omnidirectional image of FIG. 7D.

＜全天球パノラマ画像の生成例＞（ステップＳ０３） <Example of generating an omnidirectional panoramic image> (step S03)
図６のステップＳ０３では、スマートフォン２は、ステップＳ０２で取得される全天球画像から全天球パノラマ画像を生成する。 In step S03 of FIG. 6, the smartphone 2 generates an omnidirectional panoramic image from the omnidirectional image acquired in step S02.

図８は、本発明の一実施形態に係る全天球パノラマ画像の一例を説明する図である。例えば、ステップＳ０３では、スマートフォン２は、図７（ｄ）の全天球画像から図８の全天球パノラマ画像を生成する。なお、全天球パノラマ画像は、全天球画像を球形状（３Ｄモデル）に貼り付けた画像である。 FIG. 8 is a diagram illustrating an example of an omnidirectional panoramic image according to an embodiment of the present invention. For example, in step S03, the smartphone 2 generates the omnidirectional panoramic image of FIG. 8 from the omnidirectional image of FIG. The omnidirectional panoramic image is an image obtained by pasting the omnidirectional image into a spherical shape (3D model).

全天球パノラマ画像を生成する処理は、ＯｐｅｎＧＬＥＳ（ＯｐｅｎＧＬ（登録商標）ｆｏｒＥｍｂｅｄｄｅｄＳｙｓｔｅｍｓ）等のＡＰＩ（ＡｐｐｌｉｃａｔｉｏｎＰｒｏｇｒａｍｍｉｎｇＩｎｔｅｒｆａｃｅ）で実現される。具体的には、全天球パノラマ画像は全天球画像が有する画素が三角形に分割される。そして、各三角形の頂点Ｐ（以下「頂点Ｐ」という。）をつなぎ合わせて、ポリゴンとして貼り付けて生成される。 The process of generating the omnidirectional panoramic image is realized by an API (Application Programming Interface) such as OpenGL ES (OpenGL (registered trademark) for Embedded Systems). Specifically, in the omnidirectional panoramic image, the pixels of the omnidirectional image are divided into triangles. Then, the vertices P of the triangles (hereinafter referred to as “vertices P”) are connected and generated as a polygon.
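
The triangle-based pasting onto a sphere can be illustrated without any graphics API. The sketch below only builds the vertex and triangle lists that a renderer such as OpenGL ES would consume; it assumes a latitude/longitude subdivision of a unit sphere, and the function name `build_sphere_mesh` and the parameters `stacks` and `slices` are hypothetical. The source does not state how the triangles are actually laid out.

```python
import math


def build_sphere_mesh(stacks: int = 8, slices: int = 16):
    """Build vertices (x, y, z, u, v) and triangle index triples for a unit sphere.

    (u, v) are texture coordinates into the equirectangular omnidirectional
    image: u follows the horizontal angle, v follows the vertical angle.
    """
    vertices = []
    for i in range(stacks + 1):
        phi = math.pi * i / stacks            # 0 at the top pole, pi at the bottom pole
        for j in range(slices + 1):
            theta = 2.0 * math.pi * j / slices
            x = math.sin(phi) * math.cos(theta)
            y = math.cos(phi)
            z = math.sin(phi) * math.sin(theta)
            vertices.append((x, y, z, j / slices, i / stacks))

    triangles = []
    for i in range(stacks):
        for j in range(slices):
            a = i * (slices + 1) + j          # upper-left vertex P of the grid cell
            b = a + slices + 1                # vertex directly below it
            triangles.append((a, b, a + 1))   # each grid cell is split into two triangles
            triangles.append((a + 1, b, b + 1))
    return vertices, triangles
```

Each index triple refers to three vertices P, and drawing all triples with the omnidirectional image bound as a texture yields the spherical 3D model.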

＜全天球パノラマ画像の選択例＞（ステップＳ０４） <Example of selecting an omnidirectional panoramic image> (step S04)
図６のステップＳ０４では、スマートフォン２は、ユーザから、全天球パノラマ画像を選択させる操作を入力する。具体的には、ステップＳ０４では、スマートフォン２は、ステップＳ０３で生成される全天球パノラマ画像を縮小した画像として、サムネイル（ｔｈｕｍｂｎａｉｌ）画像形式で複数表示する。 In step S04 of FIG. 6, the smartphone 2 receives from the user an operation for selecting an omnidirectional panoramic image. Specifically, in step S04, the smartphone 2 displays reduced versions of the omnidirectional panoramic images generated in step S03 as a plurality of thumbnail images.

例えば、複数の全天球パノラマ画像がスマートフォン２に記憶されている場合、スマートフォン２は、複数の全天球パノラマ画像から、サムネイル画像を一覧で出力する。そして、スマートフォン２に、サムネイル画像の一覧から、１つのサムネイル画像を選択するユーザの操作が入力される。ステップＳ０４で選択された全天球パノラマ画像が処理対象となって、処理が行われる。 For example, when a plurality of omnidirectional panoramic images are stored in the smartphone 2, the smartphone 2 outputs a list of thumbnail images from the plurality of omnidirectional panoramic images. Then, a user operation for selecting one thumbnail image from the list of thumbnail images is input to the smartphone 2. The omnidirectional panoramic image selected in step S04 is processed, and processing is performed.

なお、全天球画像が１種類しかない場合又は設定等によって、ステップＳ０４は、省略されてもよい。また、先に全天球画像をサムネイル画像として一覧で出力させてもよい。そして、サムネイル画像の一覧から１つを選択して、選択された全天球画像に基づいて、ステップＳ０３によって全天球パノラマ画像を生成してもよい。 Note that step S04 may be omitted when there is only one type of omnidirectional image, or depending on settings or the like. Alternatively, the omnidirectional image may be output as a thumbnail image in a list first. Then, one of the thumbnail images may be selected, and an omnidirectional panoramic image may be generated in step S03 based on the selected omnidirectional image.

＜所定領域の設定及び表示画像の出力例＞（ステップＳ０５） <Example of setting the predetermined region and outputting the display image> (step S05)
図６のステップＳ０５では、スマートフォン２は、ユーザから、全天球画像が示す範囲（この例では、全方位である。）のうち、画像で出力する領域（以下「所定領域」という。）を設定する。スマートフォン２が、ユーザに出力する画像を「出力画像」という。出力画像は、所定領域を示す画像である。また、所定領域が設定され、最初にスマートフォン２が出力する表示画像を「初期画像」という。 In step S05 of FIG. 6, the smartphone 2 sets, based on input from the user, the region to be output as an image (hereinafter the "predetermined region") within the range shown by the omnidirectional image (in this example, all directions). The image that the smartphone 2 outputs to the user is called the "output image". The output image shows the predetermined region. The display image that the smartphone 2 first outputs after the predetermined region is set is called the "initial image".

ステップＳ０５では、スマートフォン２は、初期画像を生成する。 In step S05, the smartphone 2 generates an initial image.

図９は、本発明の一実施形態に係る初期画像の一例を説明するための図である。図９（ａ）は、初期画像の一例としてＸＹＺ軸の３次元座標系を説明する図である。 FIG. 9 is a diagram for explaining an example of an initial image according to an embodiment of the present invention. FIG. 9A is a diagram illustrating an XYZ axis three-dimensional coordinate system as an example of an initial image.

スマートフォン２は、所定領域Ｔを「仮想カメラ３」が撮影する範囲として、仮想カメラ３からの視点で表示画像を生成する。また、仮想カメラの初期位置は、座標系の原点（０，０，０）の位置とする。さらに、全天球パノラマ画像が、立体球ＣＳとして表現される。初期状態では、仮想カメラ３は、立体球ＣＳの全天球パノラマ画像に対して、原点から全天球パノラマ画像を見るユーザの視点に相当する。 The smartphone 2 generates a display image from the viewpoint from the virtual camera 3 with the predetermined area T as a range where the “virtual camera 3” captures. The initial position of the virtual camera is the position of the origin (0, 0, 0) of the coordinate system. Furthermore, the omnidirectional panoramic image is expressed as a solid sphere CS. In the initial state, the virtual camera 3 corresponds to the viewpoint of the user who views the panoramic image from the origin with respect to the panoramic image of the solid sphere CS.

次に、図９（ｂ）は、所定領域Ｔの一例を示す３面図である。初期状態では、原点に仮想カメラ３が位置する。さらに、図９（ｃ）は、所定領域Ｔの一例を投影図である。仮想カメラ３が、所定領域Ｔを立体球ＣＳに投影している。 Next, FIG. 9B is a three-view diagram illustrating an example of the predetermined region T. In the initial state, the virtual camera 3 is located at the origin. Further, FIG. 9C is a projection view of an example of the predetermined region T. The virtual camera 3 projects the predetermined area T onto the solid sphere CS.

また、図９（ｄ）は、所定領域を特定するための位置及び視野角の一例を示す図である。所定領域Ｔは、仮想カメラ３の３次元座標に相当する視点の位置（Ｘ，Ｙ，Ｚ）及び仮想カメラ３の視野角αによって決定される。また、視野角αから所定領域Ｔが定まると、対角線画角２Ｌの中点として所定領域Ｔの中心点ＣＰの２次元座標が定まる。 FIG. 9D is a diagram illustrating an example of a position and a viewing angle for specifying a predetermined region. The predetermined area T is determined by the viewpoint position (X, Y, Z) corresponding to the three-dimensional coordinates of the virtual camera 3 and the viewing angle α of the virtual camera 3. When the predetermined area T is determined from the viewing angle α, the two-dimensional coordinates of the center point CP of the predetermined area T are determined as the midpoint of the diagonal field angle 2L.

次に、仮想カメラ３から中心点ＣＰまでの距離は、下記（１）式で示される。 Next, the distance from the virtual camera 3 to the center point CP is expressed by the following equation (1).

初期設定により、所定領域Ｔが定まる。そして、所定領域Ｔに基づいて、初期画像が生成される。例えば、視点の位置（Ｘ，Ｙ，Ｚ）及び視野角αの初期設定は、（Ｘ，Ｙ，Ｚ，α）＝（０，０，０，３４）等のようにユーザ等によって設定される。 The initial settings determine the predetermined region T, and the initial image is generated based on the predetermined region T. For example, the initial settings of the viewpoint position (X, Y, Z) and the viewing angle α are set by the user or the like, such as (X, Y, Z, α) = (0, 0, 0, 34).

そして、画角を変える操作、いわゆるズーム操作が入力されると、スマートフォン２は、ズーム処理を行う。なお、ズーム処理は、ユーザによる操作に基づいて、所定領域を拡大又は縮小させ、変更された所定領域に基づいて、表示画像を生成する処理である。 When an operation for changing the angle of view, a so-called zoom operation, is input, the smartphone 2 performs zoom processing. The zoom process is a process for enlarging or reducing a predetermined area based on an operation by the user and generating a display image based on the changed predetermined area.

ユーザによるズームの操作によって入力される操作量を「変化量ｄｚ」という。まず、ズームの操作が入力されると、スマートフォン２は、変化量ｄｚを取得する。そして、スマートフォン２は、変化量ｄｚに基づいて、下記（２）式を計算する。 The operation amount input by the user's zoom operation is called the "change amount dz". When a zoom operation is input, the smartphone 2 first acquires the change amount dz. The smartphone 2 then evaluates equation (2) below based on the change amount dz.

なお、上記（２）式における「α」は、図９（ｄ）に示す視野角αである。また、上記（２）式で示す「ｍ」は、ズーム量を調整するための係数であり、あらかじめ設定される値である。さらに、上記（２）式における「α０」は、初期状態における視野角α、いわゆる視野角αの初期値である。 In equation (2) above, "α" is the viewing angle α shown in FIG. 9D. "m" is a coefficient for adjusting the zoom amount and is a value set in advance. "α0" is the viewing angle α in the initial state, that is, the initial value of the viewing angle α.

次に、スマートフォン２は、上記（２）式に基づいて計算される視野角αを投影行列に用いて、所定領域Ｔを決定する。 Next, the smartphone 2 determines the predetermined region T using the viewing angle α calculated based on the above equation (2) as a projection matrix.

なお、変化量ｄｚを入力する操作が行われた後、変化量ｄｚ２となるズームの操作をユーザが更に行うと、スマートフォン２は、下記（３）式を計算する。 Note that after the operation of inputting the change amount dz is performed, when the user further performs a zoom operation that becomes the change amount dz2, the smartphone 2 calculates the following expression (3).

上記（３）式の視野角αは、各操作によって入力されるそれぞれの変化量を合計した値に基づいて計算される。複数の操作が行われても、視野角αの計算から行うことで、スマートフォン２は、一貫した操作性を保つことができる。 The viewing angle α in the above equation (3) is calculated based on the total value of the amounts of change input by each operation. Even if a plurality of operations are performed, the smartphone 2 can maintain a consistent operability by calculating the viewing angle α.
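
The accumulation of change amounts can be sketched as follows. The bodies of equations (2) and (3) are not reproduced in this text, so the sketch assumes the simplest relation consistent with the description: the viewing angle is the initial value α0 plus the coefficient m times the running total of the change amounts. The class name `ZoomState` and this linear form are assumptions, not the patent's stated formula.

```python
from dataclasses import dataclass


@dataclass
class ZoomState:
    """Tracks the summed zoom input and derives the viewing angle from it."""
    alpha0: float          # initial viewing angle, in degrees
    m: float               # preset coefficient that scales the zoom amount
    total_dz: float = 0.0  # sum of all change amounts received so far

    def apply(self, dz: float) -> float:
        """Add one zoom operation and return the resulting viewing angle.

        Assumed linear relation: alpha = alpha0 + m * (dz + dz2 + ...).
        """
        self.total_dz += dz
        return self.alpha0 + self.m * self.total_dz
```

Because the angle is always recomputed from α0 and the total, the result depends only on the summed input and not on how many separate operations produced it, which matches the consistency property described above.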

なお、ズーム処理は、上記（２）式又は上記（３）式に基づく処理に限られない。例えば、ズーム処理は、仮想カメラ３の視野角α及び視点位置の変更を組み合わせて実現してもよい。具体的には、以下のようなズーム処理が行われてもよい。 The zoom process is not limited to the process based on the above formula (2) or the above formula (3). For example, the zoom process may be realized by combining the change of the viewing angle α and the viewpoint position of the virtual camera 3. Specifically, the following zoom processing may be performed.

図１０は、本発明の一実施形態に係る別のズーム処理の一例を説明するための図である。図１０に示す立体球ＣＳは、図９に示す立体球ＣＳと同様で、立体球ＣＳの半径を「１」として説明する。 FIG. 10 is a diagram for explaining an example of another zoom process according to an embodiment of the present invention. The solid sphere CS shown in FIG. 10 is the same as the solid sphere CS shown in FIG. 9, and the radius of the solid sphere CS will be described as “1”.

まず、図１０に示す原点は、仮想カメラ３の初期位置である。そして、仮想カメラ３は、光軸を移動して位置を変更する。光軸は、図９（ａ）に示すＺ軸と同様である。仮想カメラ３の移動量ｄは、原点から移動した距離で示す。例えば、仮想カメラ３が原点に位置する初期状態の場合、移動量ｄは「０」となる。 First, the origin shown in FIG. 10 is the initial position of the virtual camera 3. The virtual camera 3 changes the position by moving the optical axis. The optical axis is the same as the Z axis shown in FIG. The moving amount d of the virtual camera 3 is indicated by the distance moved from the origin. For example, in the initial state where the virtual camera 3 is located at the origin, the movement amount d is “0”.

仮想カメラ３の移動量ｄ及び視野角αに基づいて、図９に示す所定領域Ｔとなる範囲を図１０では画角ωで示す。画角ωは、仮想カメラ３が原点に位置する場合、ｄ＝０の画角である。また、ｄ＝０の場合、画角ω及び視野角αは、一致する。 In FIG. 10, the range that becomes the predetermined region T shown in FIG. 9, determined by the movement amount d and the viewing angle α of the virtual camera 3, is indicated by the angle of view ω. When the virtual camera 3 is located at the origin, the angle of view ω is the angle of view for d = 0. When d = 0, the angle of view ω and the viewing angle α coincide.

一方で、仮想カメラ３が原点から離れ、ｄの値が「０」より大きい場合、画角ω及び視野角αは、異なる範囲となる。そして、別のズーム処理は、画角ωとなる範囲を変更する処理である。 On the other hand, when the virtual camera 3 is away from the origin and the value of d is larger than “0”, the angle of view ω and the viewing angle α are in different ranges. Another zoom process is a process for changing the range of the angle of view ω.

図１１は、本発明の一実施形態に係る別のズーム処理の一例を説明するための表である。なお、説明表４は、画角ωの範囲が６０°乃至３００°の例を示す。スマートフォン２は、ズーム指定値ＺＰに基づいて、視野角α及び仮想カメラ３の移動量ｄのうち、どちらを優先的に変更するかを決定する。 FIG. 11 is a table for explaining an example of another zoom process according to the embodiment of the present invention. The explanatory table 4 shows an example in which the range of the angle of view ω is 60 ° to 300 °. The smartphone 2 determines which of the viewing angle α and the movement amount d of the virtual camera 3 is preferentially changed based on the zoom designation value ZP.

なお、「範囲」は、ズーム指定値ＺＰに基づいて決定する範囲である。また、「出力倍率」は、別のズーム処理によって決定される画像パラメータに基づいて計算された画像の出力倍率である。さらに、「ズーム指定値ＺＰ」は、出力させる画角に対応する値である。 The “range” is a range determined based on the zoom designation value ZP. The “output magnification” is an output magnification of an image calculated based on an image parameter determined by another zoom process. Further, the “zoom designation value ZP” is a value corresponding to the angle of view to be output.

別のズーム処理は、ズーム指定値ＺＰに基づいて移動量ｄ及び視野角αの決定する処理を変更する。具体的には、別のズーム処理の行う処理は、ズーム指定値ＺＰに基づいて、説明表４の４つの方法のいずれかに決定される。ズーム指定値ＺＰの範囲は、「Ａ〜Ｂ」、「Ｂ〜Ｃ」、「Ｃ〜Ｄ」及び「Ｄ〜Ｅ」の４つの範囲に区分される。 In another zoom process, the process for determining the movement amount d and the viewing angle α is changed based on the zoom designation value ZP. Specifically, the process of performing another zoom process is determined as one of the four methods in the explanatory table 4 based on the zoom designation value ZP. The range of the zoom designation value ZP is divided into four ranges “A to B”, “B to C”, “C to D”, and “D to E”.

また、「画角ω」は、別のズーム処理によって決定した画像パラメータに対応する画角ωである。さらに、「変更するパラメータ」は、ズーム指定値ＺＰに基づいて４つの方法でそれぞれ変更するパラメータを説明する記載である。「備考」は、「変更するパラメータ」についての備考である。 The “view angle ω” is the view angle ω corresponding to the image parameter determined by another zoom process. Furthermore, “parameter to be changed” is a description for explaining parameters to be changed by four methods based on the zoom designation value ZP. “Remarks” is a note on “parameters to be changed”.

「ｖｉｅｗＷＨ」は、出力領域の幅又は高さを示す値である。例えば、出力領域が横長の場合、「ｖｉｅｗＷＨ」は、幅の値を示す。一方で、出力領域が縦長の場合、「ｖｉｅｗＷＨ」は、高さの値を示す。すなわち、「ｖｉｅｗＷＨ」は、出力領域の長手方向のサイズを示す値である。 “ViewWH” is a value indicating the width or height of the output area. For example, when the output area is horizontally long, “viewWH” indicates a width value. On the other hand, when the output area is vertically long, “viewWH” indicates a height value. That is, “viewWH” is a value indicating the size of the output area in the longitudinal direction.

「ｉｍｇＷＨ」は、出力画像の幅又は高さを示す値である。例えば、出力領域が横長の場合、「ｉｍｇＷＨ」は、出力画像の幅の値を示す。一方で、出力領域が縦長の場合、「ｉｍｇＷＨ」は、出力画像の高さの値を示す。すなわち、「ｉｍｇＷＨ」は、出力画像の長手方向のサイズを示す値である。 "imgWH" is a value indicating the width or height of the output image. For example, when the output area is horizontally long, "imgWH" indicates the width of the output image. On the other hand, when the output area is vertically long, "imgWH" indicates the height of the output image. That is, "imgWH" is a value indicating the size of the output image in the longitudinal direction.

「ｉｍｇＤｅｇ」は、出力画像の表示範囲の角度を示す値である。具体的には、出力画像の幅を示す場合、「ｉｍｇＤｅｇ」は、３６０°である。一方で、出力画像の高さを示す場合、「ｉｍｇＤｅｇ」は、１８０°である。 “ImgDeg” is a value indicating the angle of the display range of the output image. Specifically, when indicating the width of the output image, “imgDeg” is 360 °. On the other hand, when indicating the height of the output image, “imgDeg” is 180 °.
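
The way these three values follow the longer side of the output area can be sketched as below. The function name `long_side_params` and its arguments are illustrative. Treating a square output area as horizontally long is an assumption, since the source does not cover that case.

```python
def long_side_params(view_w: float, view_h: float, img_w: float, img_h: float) -> dict:
    """Pick viewWH, imgWH and imgDeg according to the longer side of the output area."""
    if view_w >= view_h:
        # Horizontally long output area: use widths; the image width spans 360 degrees.
        return {"viewWH": view_w, "imgWH": img_w, "imgDeg": 360.0}
    # Vertically long output area: use heights; the image height spans 180 degrees.
    return {"viewWH": view_h, "imgWH": img_h, "imgDeg": 180.0}
```

For a 1920 x 1080 output area and a 5376 x 2688 omnidirectional image, the function selects the two widths and 360°. For the same screen held upright it selects the two heights and 180°.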

図１２は、本発明の一実施形態に係る別のズーム処理の「範囲」の一例を説明するための図である。別のズーム処理が行われた場合、画像に表示される「範囲」及び画像の例を示す。図１２に示す例を用いて、ズームアウトについて説明する。なお、図１２の各図における左図は、出力される画像の一例を示す。図１２の各図における右図は、出力される際における仮想カメラ３の状態の一例を図１０と同様のモデル図で示す図である。 FIG. 12 is a diagram for explaining an example of a “range” of another zoom process according to an embodiment of the present invention. An example of a “range” displayed on an image and an image when another zoom process is performed is shown. The zoom-out will be described using the example shown in FIG. In addition, the left figure in each figure of FIG. 12 shows an example of the image output. The right diagram in each figure of FIG. 12 is a diagram showing an example of the state of the virtual camera 3 when it is output, in a model diagram similar to FIG.

図１２（ａ）は、説明表４の「範囲」が「Ａ〜Ｂ」となるズーム指定値ＺＰが入力された場合の出力される画像及び「範囲」の例を示す。仮想カメラ３の視野角αは、α＝６０°と固定される。さらに、ズーム指定値ＺＰが「Ａ〜Ｂ」であり、仮想カメラ３の移動量ｄが、視野角αが固定された状態で変更されるとする。視野角αが固定された状態で、仮想カメラ３の移動量ｄが大きくなるように変更する例を説明する。移動量ｄが大きくなる場合、画角ωは広がる。つまり、ズーム指定値ＺＰを「Ａ〜Ｂ」とし、かつ、視野角αを固定し、仮想カメラ３の移動量ｄを大きくすると、ズームアウト処理が実現できる。なお、ズーム指定値ＺＰが「Ａ〜Ｂ」である場合、仮想カメラ３の移動量ｄは、「０」から立体球ＣＳの半径までである。具体的には、立体球ＣＳの半径が「１」であるため、仮想カメラ３の移動量ｄは、「０〜１」の値となる。また、仮想カメラ３の移動量ｄは、ズーム指定値ＺＰに対応する値となる。 FIG. 12A shows an example of an output image and “range” when a zoom designation value ZP in which “range” in the explanatory table 4 is “A to B” is input. The viewing angle α of the virtual camera 3 is fixed as α = 60 °. Furthermore, it is assumed that the zoom designation value ZP is “A to B” and the movement amount d of the virtual camera 3 is changed in a state where the viewing angle α is fixed. An example in which the movement amount d of the virtual camera 3 is changed so as to increase in a state where the viewing angle α is fixed will be described. When the movement amount d increases, the angle of view ω increases. That is, zoom-out processing can be realized by setting the zoom designation value ZP to “A to B”, fixing the viewing angle α, and increasing the movement amount d of the virtual camera 3. When the zoom designation value ZP is “A to B”, the moving amount d of the virtual camera 3 is from “0” to the radius of the solid sphere CS. Specifically, since the radius of the solid sphere CS is “1”, the movement amount d of the virtual camera 3 has a value of “0 to 1”. Further, the moving amount d of the virtual camera 3 is a value corresponding to the zoom designation value ZP.

次に、図１２（ｂ）は、説明表４の「範囲」が「Ｂ〜Ｃ」となるズーム指定値ＺＰが入力した場合の出力される画像及び「範囲」の例を示す。なお、「Ｂ〜Ｃ」は、「Ａ〜Ｂ」よりズーム指定値ＺＰが大きい値である。そして、ズーム指定値ＺＰを「Ｂ〜Ｃ」とし、仮想カメラ３の移動量ｄは、仮想カメラ３が立体球ＣＳの外縁に位置する値に固定されるとする。図１２（ｂ）の仮想カメラ３の移動量ｄは、立体球ＣＳの半径である「１」に固定される。また、ズーム指定値ＺＰが「Ｂ〜Ｃ」であり、仮想カメラ３の移動量ｄが固定された状態で、視野角αが変更されるとする。図１２（ａ）から図１２（ｂ）に示すように、画角ωは、広がる。つまり、ズーム指定値ＺＰを「Ｂ〜Ｃ」とし、かつ、仮想カメラ３の移動量ｄを固定し、視野角αを大きくすると、ズームアウト処理が実現できる。なお、ズーム指定値ＺＰが「Ｂ〜Ｃ」である場合、視野角αは、「ω／２」で計算される。また、ズーム指定値ＺＰが「Ｂ〜Ｃ」である場合、視野角αの範囲は、「Ａ〜Ｂ」である場合に固定される値である「６０°」から、「１２０°」までとなる。 Next, FIG. 12B shows an example of the output image and the "range" when a zoom designation value ZP for which the "range" in the explanatory table 4 is "B to C" is input. "B to C" covers larger zoom designation values ZP than "A to B". With ZP in "B to C", the movement amount d of the virtual camera 3 is fixed at the value that places the virtual camera 3 on the outer edge of the solid sphere CS. In FIG. 12B, the movement amount d is fixed at "1", the radius of the solid sphere CS. With ZP in "B to C" and d fixed, the viewing angle α is changed. As shown in FIGS. 12A and 12B, the angle of view ω widens. That is, with ZP in "B to C", fixing the movement amount d of the virtual camera 3 and increasing the viewing angle α realizes zoom-out. When ZP is in "B to C", the viewing angle α is calculated as "ω / 2", and it ranges from "60°", the value at which it is fixed in "A to B", to "120°".

ズーム指定値ＺＰが「Ａ〜Ｂ」又は「Ｂ〜Ｃ」の場合、画角ωは、ズーム指定値ＺＰと一致する。また、ズーム指定値ＺＰが「Ａ〜Ｂ」及び「Ｂ〜Ｃ」の場合、画角ωは、値が増加する。 When the zoom designation value ZP is “A to B” or “B to C”, the angle of view ω matches the zoom designation value ZP. When the zoom designation value ZP is “A to B” and “B to C”, the angle of view ω increases.
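
The relation between the movement amount d, the viewing angle α, and the angle of view ω in these two ranges can be checked with a small geometric sketch. It assumes a model that the text implies but does not spell out: the solid sphere CS is a unit sphere centered at the origin, the virtual camera 3 retreats to (0, 0, -d) along the optical axis while looking along +Z, and ω is the angle subtended at the origin by the part of the sphere inside the viewing angle. The function name `angle_of_view_deg` is hypothetical.

```python
import math


def angle_of_view_deg(d: float, alpha_deg: float) -> float:
    """Angle of view (degrees) seen from inside a unit sphere, for 0 <= d <= 1."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("this sketch only covers a camera inside or on the unit sphere")
    if not 0.0 < alpha_deg < 180.0:
        raise ValueError("viewing angle must be between 0 and 180 degrees")
    h = math.radians(alpha_deg) / 2.0
    # Distance along the edge ray from the camera to where it leaves the sphere.
    t = d * math.cos(h) + math.sqrt(1.0 - (d * math.sin(h)) ** 2)
    # Coordinates of that exit point in the plane containing the optical axis.
    x = t * math.sin(h)
    z = t * math.cos(h) - d
    # Half of the angle of view is the direction of the exit point seen from the origin.
    return 2.0 * math.degrees(math.atan2(x, z))
```

Under these assumptions the sketch reproduces the values stated above: α = 60° gives ω = 60° at d = 0 and ω = 120° at d = 1, and at d = 1 the result is always twice the viewing angle, in line with α = ω / 2 for the range "B to C".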

図１２（ｃ）は、説明表４の「範囲」が「Ｃ〜Ｄ」となるズーム指定値ＺＰが入力した場合の出力される画像及び「範囲」の例を示す。なお、「Ｃ〜Ｄ」は、「Ｂ〜Ｃ」よりズーム指定値ＺＰが大きい値である。そして、ズーム指定値ＺＰを「Ｃ〜Ｄ」とし、視野角αは、α＝１２０°と固定されるとする。ズーム指定値ＺＰが「Ｃ〜Ｄ」、仮想カメラ３の移動量ｄが、視野角αが固定された状態で変更される場合、画角ωは広がる。また、仮想カメラ３の移動量ｄは、説明表４のズーム指定値ＺＰに基づく式によって計算される。なお、ズーム指定値ＺＰが「Ｃ〜Ｄ」の場合、仮想カメラ３の移動量ｄは、最大表示距離ｄｍａｘ１まで変更される。最大表示距離ｄｍａｘ１は、スマートフォン２における出力領域で、立体球ＣＳを最大に表示できる距離である。出力領域は、スマートフォン２が画像等を出力する画面のサイズ等である。さらに、最大表示距離ｄｍａｘ１は、図１２（ｄ）に示す状態で、下記（４）式で計算される。 FIG. 12C shows an example of the output image and the "range" when a zoom designation value ZP for which the "range" in the explanatory table 4 is "C to D" is input. "C to D" covers larger zoom designation values ZP than "B to C". With ZP in "C to D", the viewing angle α is fixed at α = 120°. When the movement amount d of the virtual camera 3 is changed with the viewing angle α fixed, the angle of view ω widens. The movement amount d is calculated by the expression based on the zoom designation value ZP in the explanatory table 4. When ZP is in "C to D", the movement amount d is changed up to the maximum display distance dmax1. The maximum display distance dmax1 is the distance at which the solid sphere CS is displayed at its maximum size in the output area of the smartphone 2. The output area is, for example, the size of the screen on which the smartphone 2 outputs images. The maximum display distance dmax1 corresponds to the state shown in FIG. 12D and is calculated by equation (4) below.

なお、上記（４）式の「ｖｉｅｗＷ」、「ｖｉｅｗＨ」は、それぞれスマートフォン２における出力領域の幅、高さを示す値である。最大表示距離ｄｍａｘ１は、スマートフォン２における出力領域、すなわち、「ｖｉｅｗＷ」及び「ｖｉｅｗＨ」の値等に基づいて計算される。 Note that “viewW” and “viewH” in the above formula (4) are values indicating the width and height of the output area in the smartphone 2, respectively. The maximum display distance dmax1 is calculated based on the output area in the smartphone 2, that is, the values of “viewW” and “viewH”.

図１２（ｄ）は、説明表４の「範囲」が「Ｄ〜Ｅ」となるズーム指定値ＺＰが入力した場合の出力される画像及び「範囲」の例を示す。なお、「Ｄ〜Ｅ」は、「Ｃ〜Ｄ」よりズーム指定値ＺＰが大きい値である。そして、ズーム指定値ＺＰを「Ｄ〜Ｅ」とし、視野角αは、α＝１２０°と固定されるとする。図１２（ｄ）に示すように、ズーム指定値ＺＰが「Ｄ〜Ｅ」であり、仮想カメラ３の移動量ｄが、視野角αが固定された状態で変更されるとする。また、仮想カメラ３の移動量ｄは、限界表示距離ｄｍａｘ２まで変更される。なお、限界表示距離ｄｍａｘ２は、スマートフォン２における出力領域で、立体球ＣＳが内接して表示される距離である。具体的には、限界表示距離ｄｍａｘ２は、下記（５）式で計算される。なお、限界表示距離ｄｍａｘ２は、図１２（ｅ）に示す状態である。 FIG. 12D shows an example of the output image and the "range" when a zoom designation value ZP for which the "range" in the explanatory table 4 is "D to E" is input. "D to E" covers larger zoom designation values ZP than "C to D". With ZP in "D to E", the viewing angle α is fixed at α = 120°. As shown in FIG. 12D, with ZP in "D to E", the movement amount d of the virtual camera 3 is changed with the viewing angle α fixed. The movement amount d is changed up to the limit display distance dmax2. The limit display distance dmax2 is the distance at which the solid sphere CS is displayed inscribed in the output area of the smartphone 2. Specifically, dmax2 is calculated by equation (5) below and corresponds to the state shown in FIG. 12E.

上記（５）式の限界表示距離ｄｍａｘ２は、スマートフォン２における出力領域である「ｖｉｅｗＷ」及び「ｖｉｅｗＨ」の値に基づいて計算される。また、限界表示距離ｄｍａｘ２は、スマートフォン２が出力できる最大の範囲で、仮想カメラ３の移動量ｄを大きくできる限界の値を示す。そして、スマートフォン２は、ズーム指定値ＺＰが説明表４の範囲に収まる値、すなわち、仮想カメラ３の移動量ｄの値が限界表示距離ｄｍａｘ２以下となるように、入力される値を制限してもよい。この制限によって、スマートフォン２は、出力領域である画面に出力画像をフィットさせた状態又は所定の出力倍率で画像をユーザに出力できる状態となり、ズームアウトを実現できる。そして、「Ｄ〜Ｅ」の処理によって、スマートフォン２は、ユーザに出力されている画像が全天球パノラマであることを認識させることができる。 The limit display distance dmax2 in equation (5) above is calculated from the values of "viewW" and "viewH", which describe the output area of the smartphone 2. The limit display distance dmax2 is the largest value to which the movement amount d of the virtual camera 3 can be increased within the maximum range that the smartphone 2 can output. The smartphone 2 may restrict the input value so that the zoom designation value ZP stays within the ranges of the explanatory table 4, that is, so that the movement amount d of the virtual camera 3 does not exceed the limit display distance dmax2. With this restriction, the smartphone 2 can output the image fitted to the screen that serves as the output area, or at a predetermined output magnification, and thereby realize zoom-out. Through the "D to E" processing, the smartphone 2 can also make the user recognize that the output image is an omnidirectional panorama.

なお、ズーム指定値ＺＰが「Ｃ〜Ｄ」又は「Ｄ〜Ｅ」の場合、画角ωは、ズーム指定値ＺＰと異なる値となる。また、説明表４及び図１２で示す各範囲間では、画角ωは、連続しているが、広角側へのズームアウトによって、画角ωは、一様に増加しなくともよい。例えば、ズーム指定値ＺＰが「Ｃ〜Ｄ」の場合、画角ωは、仮想カメラ３の移動量ｄに伴い、増加する。一方で、ズーム指定値ＺＰが「Ｄ〜Ｅ」の場合、画角ωは、仮想カメラ３の移動量ｄに伴い、減少する。なお、この減少は、立体球ＣＳが有する外側の領域が写り込むためである。ズーム指定値ＺＰが２４０°以上の広視野域を指定する場合、スマートフォン２は、仮想カメラ３の移動量ｄを変更することによって、ユーザに違和感の少ない画像を出力し、かつ、画角ωを変化させることができる。 When the zoom designation value ZP is in "C to D" or "D to E", the angle of view ω differs from the zoom designation value ZP. The angle of view ω is continuous across the ranges shown in the explanatory table 4 and FIG. 12, but it need not increase uniformly as the view is zoomed out toward the wide-angle side. For example, when ZP is in "C to D", ω increases with the movement amount d of the virtual camera 3, whereas when ZP is in "D to E", ω decreases with d. This decrease occurs because the region outside the solid sphere CS comes into view. When ZP designates a wide field of view of 240° or more, the smartphone 2 changes the movement amount d of the virtual camera 3, which lets it output an image with little sense of discomfort for the user while still changing the angle of view ω.

また、ズーム指定値ＺＰが広角方向に変更されると、画角ωは、広くなる場合が多い。画角ωが広くなる場合、スマートフォン２は、仮想カメラ３の視野角αを固定し、仮想カメラ３の移動量ｄを大きくする。スマートフォン２は、仮想カメラ３の視野角αを固定することによって、仮想カメラ３の視野角αの増加を少なくし、歪みの少ない画像を出力できる。 In addition, when the zoom designation value ZP is changed in the wide-angle direction, the angle of view ω is often widened. When the angle of view ω is wide, the smartphone 2 fixes the viewing angle α of the virtual camera 3 and increases the movement amount d of the virtual camera 3. By fixing the viewing angle α of the virtual camera 3, the smartphone 2 can reduce an increase in the viewing angle α of the virtual camera 3 and output an image with less distortion.

仮想カメラ３の視野角αを固定し、スマートフォン２が、仮想カメラ３の移動量ｄを大きくする、すなわち、仮想カメラ３を遠ざける方向に動かす場合、スマートフォン２は、広角表示の開放感をユーザに与えることができる。また、仮想カメラ３を遠ざける方向に動かす場合、人間が広範囲を確認する際の動きと類似であるため、スマートフォン２は、違和感の少ないズームアウトを実現できる。 When the viewing angle α of the virtual camera 3 is fixed and the smartphone 2 increases the movement amount d of the virtual camera 3, that is, moves the virtual camera 3 farther away, the smartphone 2 can give the user the sense of openness of a wide-angle display. In addition, moving the virtual camera 3 farther away resembles the movement a person makes when checking a wide area, so the smartphone 2 can realize zoom-out with little discomfort.

ズーム指定値ＺＰが「Ｄ〜Ｅ」の場合、画角ωは、ズーム指定値ＺＰが広角方向に変更するに伴い、減少する。画角ωを減少させることで、スマートフォン２は、ユーザに立体球ＣＳから遠ざかっていく感覚を与えることができ、違和感の少ない画像を出力できる。 When the zoom designation value ZP is “D to E”, the angle of view ω decreases as the zoom designation value ZP changes in the wide-angle direction. By reducing the angle of view ω, the smartphone 2 can give the user a sense of moving away from the solid sphere CS, and can output an image with less discomfort.

図１１に示す説明表４の別のズーム処理によって、スマートフォン２は、ユーザに違和感の少ない画像を出力できる。 By another zoom process of the explanatory table 4 illustrated in FIG. 11, the smartphone 2 can output an image with less discomfort to the user.

なお、スマートフォン２は、説明表４で説明する仮想カメラ３の移動量ｄ又は視野角αのみに、変更する場合に限られない。すなわち、スマートフォン２は、説明表４において、優先的に仮想カメラ３の移動量ｄ又は視野角αを変更する形態であればよく、調整のため、固定となる値を十分小さい値変更してもよい。また、スマートフォン２は、ズームアウトを行うに限られない。スマートフォン２は、ズームインを行ってもよい。 Note that the smartphone 2 is not limited to changing only the movement amount d or only the viewing angle α of the virtual camera 3 as described in the explanatory table 4. That is, it suffices that the smartphone 2 preferentially changes the movement amount d or the viewing angle α of the virtual camera 3 as in the explanatory table 4; for adjustment, the value that is otherwise fixed may be changed by a sufficiently small amount. The smartphone 2 is also not limited to zooming out. The smartphone 2 may zoom in.

＜複数箇所での音声入力例＞（ステップＳ０６）
図６のステップＳ０６では、撮影装置１は、複数箇所のマイクロフォンで音声を入力する。例えば、撮影装置１のマイクロフォン１ＨＭ１、１ＨＭ２、１ＨＭ３及び１ＨＭ４の４箇所で音声が入力される。望ましくは、撮影装置１は、複数箇所で入力されたる音声を処理して、アンビソニックス（Ａｍｂｉｓｏｎｉｃｓ）のＢフォーマット等のように、各入力音声と、入力音声が発生した方向とが関連付けされるデータ（以下「音声入力データ」という。）を生成する。すなわち、アンビソニックスのＢフォーマット等で音声入力データが生成されると、スマートフォン２等は、音声入力データを参照すると、各入力音声が発生した方向がわかる。 <Example of voice input at multiple locations> (step S06)
In step S06 in FIG. 6, the photographing apparatus 1 inputs sound with microphones at a plurality of locations. For example, sound is input at four locations, namely the microphones 1HM1, 1HM2, 1HM3, and 1HM4 of the photographing apparatus 1. Desirably, the photographing apparatus 1 processes the sound input at the plurality of locations and generates data in which each input sound is associated with the direction from which it arrived, as in the Ambisonics B format (hereinafter referred to as “voice input data”). That is, when the voice input data is generated in the Ambisonics B format or the like, the smartphone 2 or the like can determine the direction from which each input sound arrived by referring to the voice input data.

＜音声入力データの送信例＞（ステップＳ０７）
図６のステップＳ０７では、撮影装置１は、音声入力データをスマートフォン２に送信する。以降、スマートフォン２が各処理を行う例で説明する。 <Transmission example of voice input data> (step S07)
In step S07 of FIG. 6, the imaging device 1 transmits the voice input data to the smartphone 2. Hereinafter, an example in which the smartphone 2 performs each process will be described.

＜音声入力データを変換して仮想スピーカデータを生成する例＞（ステップＳ０８）
図６のステップＳ０８では、スマートフォン２は、音声入力データを変換して、仮想的に配置される複数の仮想スピーカに、音声を出力させるためのデータ（以下「仮想スピーカデータ」という。）を生成する。 <Example of Converting Audio Input Data to Generate Virtual Speaker Data> (Step S08)
In step S08 in FIG. 6, the smartphone 2 converts the voice input data and generates data (hereinafter referred to as “virtual speaker data”) for causing a plurality of virtually arranged virtual speakers to output sound.

図１３は、本発明の一実施形態に係る仮想スピーカの配置例を示す模式図である。視点の位置（Ｘ，Ｙ，Ｚ）に、聞き手となるユーザＵＲがいる。また、ユーザＵＲは、図９等に示す仮想カメラ３と同じ概念である。 FIG. 13 is a schematic diagram illustrating an arrangement example of virtual speakers according to an embodiment of the present invention. There is a user UR who becomes a listener at the viewpoint position (X, Y, Z). The user UR has the same concept as the virtual camera 3 shown in FIG.

そして、視点であるユーザＵＲを中心として、複数の箇所に、１個ずつ仮想的にスピーカが配置されているものとして処理が行われる。なお、図９に配置されるスピーカが仮想スピーカであるが、実際に設置される装置ではなく、仮想的にユーザＵＲの周辺に配置されるものとして処理が行われる。 Then, the processing is performed assuming that speakers are virtually arranged one by one at a plurality of locations around the user UR as the viewpoint. Although the speaker arranged in FIG. 9 is a virtual speaker, the processing is performed on the assumption that the speaker is virtually arranged around the user UR, not an actually installed apparatus.

撮影装置１が水平状態でされる場合、仮想スピーカはユーザＵＲの前後左右４箇所に配置される。なお、撮影装置が水平状態以外で使用される場合、上下方向（Ｙ軸方向）を考慮して、更に上下に１個ずつ仮想スピーカが追加されてもよい。つまり、仮想スピーカは、６箇所に配置される。仮想スピーカが、ユーザＵＲの前後左右４箇所に配置される４ｃｈ（チャンネル）の例を説明する。 When the photographing apparatus 1 is in a horizontal state, the virtual speakers are arranged at four positions on the front, back, left, and right of the user UR. When the photographing apparatus is used in a state other than the horizontal state, one virtual speaker may be added at the top and bottom in consideration of the vertical direction (Y-axis direction). That is, the virtual speakers are arranged at six locations. An example of 4 channels (channels) in which virtual speakers are arranged at four positions on the front, back, left, and right of the user UR will be described.

具体的には、ユーザＵＲの前方、右手方向、左手方向、及び後方には、それぞれ仮想スピーカＶＳＦ、ＶＳＲ、ＶＳＬ、及びＶＳＢが配置される。この配置によって、ユーザＵＲの前後左右から音声が出力され、情報処理システムは、臨場感のある出力音声を出力することができる。 Specifically, virtual speakers VSF, VSR, VSL, and VSB are respectively arranged in front, right hand direction, left hand direction, and rear of the user UR. With this arrangement, sound is output from the front, back, left and right of the user UR, and the information processing system can output realistic output sound.

４ｃｈの仮想スピーカから音声を出力する方法は、"西村竜一（２０１４），５，アンビソニックス（＜特集＞立体音響技術）ＴｈｅＪｏｕｒｎａｌｏｆｔｈｅＩｎｓｔｉｔｕｔｅｏｆＩｍａｇｅＩｎｆｏｒｍａｔｉｏｎａｎｄＴｅｌｅｖｉｓｉｏｎＥｎｇｉｎｅｅｒｓ，６８（８），６１６―６２０，ｈｔｔｐ:／／ｃｉ.ｎｉｉ.ａｃ.ｊｐ／ｎａｉｄ／１１０００９８４４０５１"に記載される方法等である。 The method of outputting audio from the 4ch virtual speakers is, for example, the method described in "Ryuichi Nishimura (2014), 5, Ambisonics (<Special Issue> Stereophonic Technology), The Journal of the Institute of Image Information and Television Engineers, 68 (8), 616-620, http://ci.nii.ac.jp/naid/110009844051".

具体的には、ＦｉｒｓｔｏｒｄｅｒＡｍｂｉｓｏｎｉｃｓでは、仮想スピーカＶＳＬ、ＶＳＲ、ＶＳＦ及びＶＳＢから出力される音声として、仮想スピーカデータの例である（Ｌ，Ｒ，Ｆ，Ｂ）は、下記（６）式のように計算される。

Ｌ＝Ｗ＋ｋ１×Ｙ
Ｒ＝Ｗ−ｋ１×Ｙ
Ｆ＝Ｗ＋ｋ１×Ｘ
Ｂ＝Ｗ−ｋ１×Ｘ・・・（６）

なお、上記（６）式の「ｋ１」は、あらかじめ設定される係数で、「Ｘ」、「Ｙ」及び「Ｗ」は、アンビソニックスのＢフォーマットが示す４つのデータ（Ｘ，Ｙ，Ｚ，Ｗ）のうちの３つである。なお、上記（６）式のＸ，Ｙ，Ｚは、視点の位置（Ｘ，Ｙ，Ｚ）とは異なるデータである。 Specifically, in first-order Ambisonics, the virtual speaker data (L, R, F, B), an example of the sound output from the virtual speakers VSL, VSR, VSF, and VSB, is calculated as in the following equation (6).

L = W + k1 × Y
R = W−k1 × Y
F = W + k1 × X
B = W−k1 × X (6)

In the above equation (6), “k1” is a coefficient set in advance, and “X”, “Y”, and “W” are three of the four data (X, Y, Z, W) represented by the Ambisonics B format. Note that X, Y, and Z in the above equation (6) are data different from the viewpoint position (X, Y, Z).

すなわち、アンビソニックスのＢフォーマット形式等の音声入力データは、上記（６）式等によって、仮想スピーカデータに変換される。 That is, audio input data such as the Ambisonics B format is converted into virtual speaker data according to the above equation (6).
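As an illustration of equation (6), the following Python sketch converts first-order B-format channels into the four virtual speaker feeds. The function name `decode_to_virtual_speakers`, the list-of-samples representation, and the default value `k1 = 0.5` are assumptions made for this example; the description only states that k1 is a coefficient set in advance.

```python
def decode_to_virtual_speakers(w, x, y, k1=0.5):
    """Apply equation (6) sample by sample.

    w, x, y: equal-length lists holding the B-format W, X and Y channels.
    k1: preset coefficient (the default 0.5 is an assumed value).
    Returns the virtual speaker data as four lists (L, R, F, B).
    """
    left = [wi + k1 * yi for wi, yi in zip(w, y)]    # L = W + k1 * Y
    right = [wi - k1 * yi for wi, yi in zip(w, y)]   # R = W - k1 * Y
    front = [wi + k1 * xi for wi, xi in zip(w, x)]   # F = W + k1 * X
    back = [wi - k1 * xi for wi, xi in zip(w, x)]    # B = W - k1 * X
    return left, right, front, back
```

The Z channel is not used here because equation (6) covers only the four horizontal virtual speakers.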

図１４は、本発明の一実施形態に係る仮想スピーカの配置を変更した第１例を示す模式図である。図１３と比較すると、図１４は、ユーザＵＲが、原点を中心として左方向ＲＬに、９０°回転している。仮想スピーカＶＳＦも、ユーザＵＲの回転に合わせて、左方向ＲＬに、９０°回転した位置に配置され、回転した後のユーザＵＲの前方に、配置される。同様に、仮想スピーカＶＳＲ、ＶＳＬ、及びＶＳＢは、回転した後のユーザＵＲの右手方向、左手方向及び後方に配置される。例えば、図１３で仮想スピーカＶＳＦから出力された音声が、図１４では仮想スピーカＶＳＲから出力される。 FIG. 14 is a schematic diagram illustrating a first example in which the arrangement of virtual speakers according to an embodiment of the present invention is changed. Compared to FIG. 13, in FIG. 14, the user UR is rotated 90 ° in the left direction RL with the origin at the center. The virtual speaker VSF is also arranged at a position rotated 90 ° in the left direction RL in accordance with the rotation of the user UR, and is arranged in front of the user UR after the rotation. Similarly, the virtual speakers VSR, VSL, and VSB are arranged in the right hand direction, the left hand direction, and the rear side of the user UR after the rotation. For example, the sound output from the virtual speaker VSF in FIG. 13 is output from the virtual speaker VSR in FIG.
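One possible way to realize the rotation of FIG. 14 is to rotate the horizontal B-format components before applying equation (6). The sketch below is an assumption, not a method stated in this description: it uses an ordinary two-dimensional rotation and assumes that X points to the front and Y to the left, as implied by the signs in equation (6). The name `rotate_listener_left` is hypothetical.

```python
import math


def rotate_listener_left(x, y, angle_deg):
    """Return the X and Y channels as seen by a listener turned left.

    x, y: equal-length lists holding the B-format X (front) and Y (left)
    channels. angle_deg: rotation of the listener to the left, in degrees.
    """
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    # Turning the listener left by the angle is equivalent to turning
    # the sound field right by the same angle.
    x_rot = [c * xi + s * yi for xi, yi in zip(x, y)]
    y_rot = [-s * xi + c * yi for xi, yi in zip(x, y)]
    return x_rot, y_rot
```

With a rotation of 90°, the rotated Y channel equals the negated original X channel, so R = W − k1 × Y after the rotation equals F = W + k1 × X before it. This matches the statement that the sound output from VSF in FIG. 13 is output from VSR in FIG. 14.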

ほかにも、ズームイン又はズームアウトが行われた場合、ズーム処理に合わせて、音声が出力されてもよい。 In addition, when zooming in or zooming out is performed, audio may be output in accordance with zoom processing.

図１５は、本発明の一実施形態に係る仮想スピーカの配置の第２例を示す模式図である。図１３と比較すると、図１４は、ユーザＵＲが、原点を中心として光軸方向（Ｚ軸）に後方ＳＺに移動している。したがって、図１４の視野の位置（Ｘ，Ｙ，Ｚ）は、Ｚの座標値のみが、図１３から変更される。 FIG. 15 is a schematic diagram illustrating a second example of the arrangement of virtual speakers according to an embodiment of the present invention. Compared to FIG. 13, in FIG. 14, the user UR moves to the rear SZ in the optical axis direction (Z axis) with the origin at the center. Accordingly, the position (X, Y, Z) of the visual field in FIG. 14 is changed from FIG. 13 only in the coordinate value of Z.

また、図１３に示す場合の初期状態を図１２（ａ）に対応する状態とすると、図１５に示す変更後の状態は、例えば、図１２（ｂ）又は図１２（ｃ）等に対応する状態である。 If the initial state in the case shown in FIG. 13 is a state corresponding to FIG. 12A, the changed state shown in FIG. 15 corresponds to, for example, FIG. 12B or FIG. State.

所定領域Ｔが変更される場合、視野角αが一定であっても、所定領域Ｔとなる範囲が広くなり、画角はω＜ω２である。したがって、表示画像は、画角ωの設定より広い範囲を示す画像となる。 When the predetermined area T is changed, even if the viewing angle α is constant, the range of the predetermined area T is widened, and the angle of view is ω <ω2. Therefore, the display image is an image showing a wider range than the setting of the angle of view ω.

そして、仮想スピーカＶＳＦ、ＶＳＲ、ＶＳＬ及びＶＳＢの配置は、変更の前後で同じ位置とする。そして、情報処理システム１０は、仮想スピーカＶＳＦから出力される音声の音量を小さくし、仮想スピーカＶＳＲから出力される音声の音量を大きくする。なお、情報処理システム１０は、仮想スピーカＶＳＢ及びＶＳＬから出力される音声の音量も変更してよい。 The virtual speakers VSF, VSR, VSL, and VSB are arranged at the same position before and after the change. Then, the information processing system 10 decreases the volume of the sound output from the virtual speaker VSF and increases the volume of the sound output from the virtual speaker VSR. Note that the information processing system 10 may also change the volume of sound output from the virtual speakers VSB and VSL.

この場合、ユーザＵＲは、後方で発生する音声が大きく聞こえ、前方で発生する音声が小さく聞こえる。したがって、情報処理システム１０は、実際に後方ＳＺでユーザＵＲが移動した場合と同じような音声を出力させることができ、臨場感を高めることができる。 In this case, the user UR can hear a large amount of sound generated in the rear and a small amount of sound generated in the front. Therefore, the information processing system 10 can output the same sound as when the user UR actually moves in the backward SZ, and can enhance the sense of reality.

また、情報処理システム１０は、所定領域Ｔに基づいて、出力させる音声を限定してもよい。すなわち、情報処理システム１０は、所定領域Ｔの範囲に該当する方向から入力された音声を出力するようにしてもよい。この場合、情報処理システム１０は、表示画像に写る被写体に合わせた音声を出力させることができ、臨場感を高めることができる。 Further, the information processing system 10 may limit the sound to be output based on the predetermined area T. That is, the information processing system 10 may output a sound input from a direction corresponding to the range of the predetermined region T. In this case, the information processing system 10 can output a sound according to the subject shown in the display image, and can enhance the sense of reality.
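As a hedged sketch of how sound from the direction of the predetermined area T might be emphasized, equation (6) can be generalized to a virtual microphone steered to an arbitrary azimuth. This generalization, the name `steer_virtual_microphone`, and the default `k1 = 0.5` are assumptions for illustration. Note also that a first-order pattern is broad, so this emphasizes a direction rather than strictly limiting the output to it.

```python
import math


def steer_virtual_microphone(w, x, y, azimuth_deg, k1=0.5):
    """Form one signal that favors sound arriving from azimuth_deg.

    The azimuth is measured from the front (X axis) toward the left
    (Y axis). An azimuth of 0 reproduces F, 90 reproduces L,
    180 reproduces B and -90 reproduces R of equation (6).
    """
    c = math.cos(math.radians(azimuth_deg))
    s = math.sin(math.radians(azimuth_deg))
    return [wi + k1 * (c * xi + s * yi) for wi, xi, yi in zip(w, x, y)]
```

In use, `azimuth_deg` would be taken from the viewing direction that defines the predetermined area T.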

図１６は、本発明の一実施形態に係る仮想スピーカの配置の第３例を示す模式図である。図１６に示す所定領域Ｔの変更は、図１５に示す変更後、図１４に示す変更がされた場合の例である。所定領域Ｔの変更は、回転及び位置の変更を組み合わせた変更でもよい。図１６は、図１４と同様に、仮想スピーカの配置が変更され、かつ、図１５と同様に、仮想スピーカから出力される音声の音量が変更される。 FIG. 16 is a schematic diagram illustrating a third example of the arrangement of virtual speakers according to an embodiment of the present invention. The change of the predetermined area T shown in FIG. 16 is an example when the change shown in FIG. 14 is made after the change shown in FIG. The change of the predetermined region T may be a change combining rotation and position change. In FIG. 16, the arrangement of the virtual speakers is changed as in FIG. 14, and the volume of the sound output from the virtual speakers is changed as in FIG.

図１７は、本発明の一実施形態に係る仮想スピーカの配置の第４例を示す模式図である。図１３と比較すると、図１７は、ユーザＵＲが、原点を中心として光軸方向（Ｚ軸）の後方ＳＺに第１距離ＳＺ１移動した点が異なる。したがって、図１６の視野の位置（Ｘ，Ｙ，Ｚ）は、Ｚの座標値のみが、図１３から変更される。 FIG. 17 is a schematic diagram illustrating a fourth example of the arrangement of virtual speakers according to an embodiment of the present invention. FIG. 17 is different from FIG. 13 in that the user UR moves the first distance SZ1 to the rear SZ in the optical axis direction (Z axis) with the origin at the center. Accordingly, the position (X, Y, Z) of the visual field in FIG. 16 is changed from FIG. 13 only in the coordinate value of Z.

また、図１３に示す場合の初期状態を図１２（ａ）に対応する状態とすると、図１５に示す変更後の状態は、例えば、図１２（ｄ）等の状態である。 If the initial state shown in FIG. 13 is a state corresponding to FIG. 12A, the changed state shown in FIG. 15 is, for example, the state shown in FIG.

そして、仮想スピーカＶＳＦ、ＶＳＲ、ＶＳＬ及びＶＳＢの配置は、変更の前後で同じ位置とする。ただし、ユーザＵＲの前方に、仮想スピーカＶＳＦ、ＶＳＲ、ＶＳＬ及びＶＳＢは、配置される。 The virtual speakers VSF, VSR, VSL, and VSB are arranged at the same position before and after the change. However, the virtual speakers VSF, VSR, VSL, and VSB are arranged in front of the user UR.

この場合、仮想スピーカがユーザＵＲの前方に配置されるため、ユーザＵＲは、前方から音声が聞こえる。したがって、情報処理システム１０は、実際に後方へ第１距離ＳＺ１分、ユーザＵＲが移動した場合と同じような音声を出力させることができ、臨場感を出すことができる。 In this case, since the virtual speaker is arranged in front of the user UR, the user UR can hear sound from the front. Therefore, the information processing system 10 can output the same sound as when the user UR has actually moved backward by the first distance SZ1 and can give a sense of realism.

図１８は、本発明の一実施形態に係る仮想スピーカの配置の第５例を示す模式図である。図１７に示す例と比較すると、図１８は、ユーザＵＲが、原点を中心として、光軸方向（Ｚ軸）に、第１距離ＳＺ１の位置から、後方ＳＺとなる第２距離ＳＺ２まで移動している。したがって、図１８の視野の位置（Ｘ，Ｙ，Ｚ）は、Ｚの座標値のみが、図１３から変更される。 FIG. 18 is a schematic diagram illustrating a fifth example of the arrangement of virtual speakers according to an embodiment of the present invention. Compared with the example shown in FIG. 17, in FIG. 18 the user UR has moved, with the origin at the center, along the optical axis direction (Z axis) from the position at the first distance SZ1 to a second distance SZ2 farther toward the rear SZ. Accordingly, the position (X, Y, Z) of the visual field in FIG. 18 differs from that in FIG. 13 only in the coordinate value of Z.

また、図１３に示す場合の初期状態を図１２（ａ）に対応する状態とすると、図１５に示す変更後の状態は、例えば、図１２（ｅ）等の状態である。
そして、仮想スピーカＶＳＦ、ＶＳＲ、ＶＳＬ及びＶＳＢの配置は、例えば、変更の前後で同じ位置とする。ただし、ユーザＵＲの前方に、仮想スピーカＶＳＦ、ＶＳＲ、ＶＳＬ及びＶＳＢは、配置される。 If the initial state shown in FIG. 13 is a state corresponding to FIG. 12A, the changed state shown in FIG. 15 is, for example, the state shown in FIG.
The virtual speakers VSF, VSR, VSL, and VSB are kept, for example, at the same positions before and after the change. However, the virtual speakers VSF, VSR, VSL, and VSB are arranged in front of the user UR.

情報処理システム１０は、所定領域Ｔ、すなわち、視点の位置（Ｘ，Ｙ，Ｚ）及び視野角αに基づいて、音声入力データを仮想スピーカに音声を出力させる仮想スピーカデータを上記（６）式等によって生成する。 Based on the predetermined region T, that is, the viewpoint position (X, Y, Z) and the viewing angle α, the information processing system 10 generates, from the voice input data, virtual speaker data for causing the virtual speakers to output sound, using the above equation (6) or the like.

＜仮想スピーカデータを変換して音声出力データを生成する例＞（ステップＳ０９）
図６のステップＳ０９では、スマートフォン２は、仮想スピーカデータを変換して音声出力データを生成する。例えば、上記（６）式によって生成される仮想スピーカデータ（Ｌ，Ｒ，Ｆ，Ｂ）は、下記（７）式のように変換され、音声出力データ（Ｌ２，Ｒ２）となる。

Ｌ２＝ｋ４×（Ｌ＋ｋ２×Ｆ＋ｋ３×Ｂ）
Ｒ２＝ｋ４×（Ｒ＋ｋ２×Ｆ＋ｋ３×Ｂ）・・・（７）

上記（７）式では、「ｋ２」、「ｋ３」及び「ｋ４」は、視点の位置（Ｘ，Ｙ，Ｚ）に基づいて定まる係数である。 <Example of Converting Virtual Speaker Data to Generate Audio Output Data> (Step S09)
In step S09 in FIG. 6, the smartphone 2 converts the virtual speaker data to generate audio output data. For example, the virtual speaker data (L, R, F, B) generated by the above equation (6) is converted as shown in the following equation (7) and becomes audio output data (L2, R2).

L2 = k4 × (L + k2 × F + k3 × B)
R2 = k4 × (R + k2 × F + k3 × B) (7)

In the above equation (7), “k2,” “k3,” and “k4” are coefficients determined based on the position (X, Y, Z) of the viewpoint.
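Equation (7) can be sketched in Python as follows. The function name `mix_to_stereo` and the list-of-samples representation are assumptions for this example. The coefficients k2, k3 and k4 are passed in as arguments because the description states only that they are determined from the viewpoint position and gives no formula for them.

```python
def mix_to_stereo(left, right, front, back, k2, k3, k4):
    """Apply equation (7) sample by sample.

    left, right, front, back: virtual speaker data (L, R, F, B) as
    equal-length lists. k2, k3, k4: coefficients derived from the
    viewpoint position (X, Y, Z); how they are derived is not shown here.
    Returns the audio output data (L2, R2).
    """
    l2 = [k4 * (l + k2 * f + k3 * b) for l, f, b in zip(left, front, back)]
    r2 = [k4 * (r + k2 * f + k3 * b) for r, f, b in zip(right, front, back)]
    return l2, r2
```

Under this form, lowering k2 and raising k3 would make the front sound quieter and the rear sound louder, which is the kind of change described for a viewpoint that moves backward.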

なお、仮想スピーカデータ（Ｌ，Ｒ，Ｆ，Ｂ）は、音響伝達関数等によって変換されてもよい。 Note that the virtual speaker data (L, R, F, B) may be converted by an acoustic transfer function or the like.

例えば、スピーカ２ＨＳ１及びスピーカ２ＨＳ２がイヤホン又はヘッドホン等に音声を出力する場合、仮想スピーカデータ（Ｌ，Ｒ，Ｆ，Ｂ）は、頭部伝達関数（ＨＲＴＦ（Ｈｅａｄ−ＲｅｌａｔｅｄＴｒａｎｓｆｅｒＦｕｎｃｔｉｏｎ））等によって変換されるのが望ましい。 For example, when the speaker 2HS1 and the speaker 2HS2 output sound to earphones or headphones, the virtual speaker data (L, R, F, B) is converted by a head-related transfer function (HRTF (Head-Related Transfer Function)) or the like. It is desirable to be done.

一方で、スピーカ２ＨＳ１及びスピーカ２ＨＳ２が備え付けのスピーカ等に音声を出力する場合、仮想スピーカデータ（Ｌ，Ｒ，Ｆ，Ｂ）は、室伝達関数（ＲＴＦ（ＲｏｏｍＴｒａｎｓｆｅｒＦｕｎｃｔｉｏｎ））等によって変換されるのが望ましい。 On the other hand, when the speaker 2HS1 and the speaker 2HS2 output sound to built-in loudspeakers or the like, the virtual speaker data (L, R, F, B) is desirably converted by a room transfer function (RTF (Room Transfer Function)) or the like.

音声は、壁による反射又は障害物による回折等の影響を受けて変形する場合がある。そこで、室伝達関数を用いると、情報処理システム１０は、出力音声に、反射等による変形の特性を反映できる。 The sound may be deformed under the influence of reflection by a wall or diffraction by an obstacle. Therefore, when the room transfer function is used, the information processing system 10 can reflect the characteristics of deformation due to reflection or the like in the output sound.

また、上記（７）式等において、「Ｆ」と「Ｂ」との間、「Ｌ」と「Ｒ」との間又はこれらの間両方に、位相差をつけるため、遅延が設定されてもよい。遅延が設定されると、距離感が強調できる。 Further, in the above equation (7) and the like, a delay may be set to give a phase difference between “F” and “B”, between “L” and “R”, or both. Setting a delay can emphasize the sense of distance.
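A delay of this kind can be sketched as a shift by a whole number of samples applied to one channel before the mix of equation (7). The helper name `delay_samples` and the choice of an integer-sample, zero-padded delay are assumptions for illustration; the description does not specify how the delay is implemented.

```python
def delay_samples(signal, n):
    """Delay a signal by n samples, keeping its original length.

    The first n output samples are zero and the last n input samples
    are dropped. A non-positive n returns an unchanged copy.
    """
    if n <= 0:
        return list(signal)
    return ([0.0] * n + list(signal))[:len(signal)]
```

For example, passing `delay_samples(back, n)` instead of `back` into the mix introduces a phase difference between “F” and “B”.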

＜音声出力データに基づく出力音声の出力例＞（ステップＳ１０）
図６のステップＳ１０では、スマートフォン２は、音声出力データに基づいて、出力音声を出力する。具体的には、上記（７）等によって、音声出力データ（Ｌ２，Ｒ２）が生成されると、スマートフォン２は、音声出力データ（Ｌ２，Ｒ２）が示す各音声をスピーカ２ＨＳ１及びスピーカ２ＨＳ２（図５）に振り分けて出力する。 <Example of Output Audio Based on Audio Output Data> (Step S10)
In step S10 of FIG. 6, the smartphone 2 outputs an output sound based on the audio output data. Specifically, when the audio output data (L2, R2) is generated by the above equation (7) or the like, the smartphone 2 distributes the sounds indicated by the audio output data (L2, R2) to the speaker 2HS1 and the speaker 2HS2 (FIG. 5) and outputs them.

なお、図では、表示画像を生成及び出力するための処理と、音声に係る処理とを分けて説明したが、情報処理システム１０は、これらの処理を並列して行ってもよい。 In the figure, the process for generating and outputting the display image and the process related to the sound are described separately, but the information processing system 10 may perform these processes in parallel.

＜機能構成例＞
図１９は、本発明の一実施形態に係る情報処理システムの機能構成例を示す機能ブロック図である。情報処理システム１０は、音声入力部１０Ｆ１、第１変換部１０Ｆ２、設定部１０Ｆ３、撮影部１０Ｆ４、第２変換部１０Ｆ５、音声出力部１０Ｆ６及び画像出力部１０Ｆ７を備える。 <Functional configuration example>
FIG. 19 is a functional block diagram illustrating a functional configuration example of an information processing system according to an embodiment of the present invention. The information processing system 10 includes an audio input unit 10F1, a first conversion unit 10F2, a setting unit 10F3, a photographing unit 10F4, a second conversion unit 10F5, an audio output unit 10F6, and an image output unit 10F7.

音声入力部１０Ｆ１は、撮影部１０Ｆ４が撮影する範囲から、入力音声ＶＩＮを複数箇所で入力して、音声入力データＤＩＮを生成する音声入力手順を行う。例えば、音声入力部１０Ｆ１は、マイクロフォン１ＨＭ１、１ＨＭ２、１ＨＭ３及び１ＨＭ４（図４）等によって実現される。 The voice input unit 10F1 performs a voice input procedure in which the input voice VIN is input at a plurality of locations from the range captured by the shooting unit 10F4 to generate voice input data DIN. For example, the voice input unit 10F1 is realized by the microphones 1HM1, 1HM2, 1HM3, and 1HM4 (FIG. 4).

第１変換部１０Ｆ２は、第１の変換部の例であって、所定領域Ｔ（図９）に基づいて、音声入力データＤＩＮを変換して、仮想スピーカデータＤＶＳを生成する第１の変換手順を行う。例えば、第１変換部１０Ｆ２は、ＣＰＵ２Ｈ５（図５）等の演算装置によって実現される。 The first conversion unit 10F2 is an example of a first conversion unit, and performs a first conversion procedure of converting the audio input data DIN based on the predetermined area T (FIG. 9) to generate virtual speaker data DVS. For example, the first conversion unit 10F2 is realized by an arithmetic device such as the CPU 2H5 (FIG. 5).

設定部１０Ｆ３は、所定領域Ｔを特定する視点の位置（Ｘ，Ｙ，Ｚ）及び視点の視野角αを設定する設定手順を行う。例えば、設定部１０Ｆ３は、入出力装置２Ｈ３（図５）等によって実現される。 The setting unit 10F3 performs a setting procedure for setting the viewpoint position (X, Y, Z) that identifies the predetermined region T and the viewing angle α of the viewpoint. For example, the setting unit 10F3 is realized by the input / output device 2H3 (FIG. 5) or the like.

撮影部１０Ｆ４は、複数の画像を撮影する撮影手順を行う。例えば、撮影部１０Ｆ４は、図４に示す撮影装置１等によって実現される。 The photographing unit 10F4 performs a photographing procedure for photographing a plurality of images. For example, the photographing unit 10F4 is realized by the photographing apparatus 1 shown in FIG.

第２変換部１０Ｆ５は、仮想スピーカデータＤＶＳを変換して音声出力データＤＯＵＴを生成する第２変換手順を行う。例えば、第２変換部１０Ｆ５は、ＣＰＵ２Ｈ５（図５）等の演算装置によって実現される。 The second conversion unit 10F5 performs a second conversion procedure for converting the virtual speaker data DVS to generate the audio output data DOUT. For example, the second conversion unit 10F5 is realized by an arithmetic device such as the CPU 2H5 (FIG. 5).

音声出力部１０Ｆ６は、音声出力データＤＯＵＴに基づいて、複数箇所から出力音声ＶＯＵＴを出力する音声出力手順を行う。例えば、音声出力部１０Ｆ６は、スピーカ２ＨＳ１及びスピーカ２ＨＳ２（図５）等によって実現される。 The audio output unit 10F6 performs an audio output procedure for outputting the output audio VOUT from a plurality of locations based on the audio output data DOUT. For example, the audio output unit 10F6 is realized by the speaker 2HS1, the speaker 2HS2 (FIG. 5), and the like.

画像出力部１０Ｆ７は、撮影部１０Ｆ４が撮影する複数の画像に基づいて、表示画像ＩＭＧＯＵＴを出力する画像出力手順を行う。例えば、画像出力部１０Ｆ７は、入出力装置２Ｈ３（図５）等によって実現される。 The image output unit 10F7 performs an image output procedure for outputting the display image IMGOUT based on a plurality of images captured by the imaging unit 10F4. For example, the image output unit 10F7 is realized by the input / output device 2H3 (FIG. 5) or the like.

図１９の機能構成により、情報処理システム１０は、まず、撮影部１０Ｆ４によって撮影される複数の半球画像等から全天球画像を生成する（図３）。そして、情報処理システム１０は、音声入力部１０Ｆ１によって、入力音声ＶＩＮを複数箇所で入力する。入力音声ＶＩＮが複数箇所で入力されると、情報処理システム１０は、アンビソニックスのＢフォーマット等の音声入力データＤＩＮを生成する。 With the functional configuration of FIG. 19, the information processing system 10 first generates an omnidirectional image from a plurality of hemispherical images and the like photographed by the photographing unit 10F4 (FIG. 3). Then, the information processing system 10 inputs the input voice VIN at a plurality of locations by the voice input unit 10F1. When the input voice VIN is input at a plurality of locations, the information processing system 10 generates voice input data DIN such as an Ambisonics B format.

設定部１０Ｆ３は、所定領域Ｔを特定するパラメータである視点の位置（Ｘ，Ｙ，Ｚ）及び視点の視野角αを設定する。この設定に基づいて、図９に示すように、全天球画像が示す全範囲のうち、表示画像ＩＭＧＯＵＴに出力される範囲が定まる。 The setting unit 10F3 sets the viewpoint position (X, Y, Z) and the viewing angle α of the viewpoint, which are parameters for specifying the predetermined region T. Based on this setting, as shown in FIG. 9, the range output to the display image IMGOUT is determined from the entire range indicated by the omnidirectional image.

第１変換部１０Ｆ２は、所定領域Ｔ、すなわち、視点の位置（Ｘ，Ｙ，Ｚ）及び視点の視野角αによって定まる表示画像ＩＭＧＯＵＴに合わせた音声入力データＤＩＮを生成する。具体的には、ユーザＵＲは、情報処理システム１０に、所定領域Ｔをズーム、並行移動又は回転させる操作を入力する。例えば、ズームの操作が入力されると、所定領域Ｔ及び表示画像ＩＭＧＯＵＴは、図１２に示すように、変更される。 The first conversion unit 10F2 generates audio input data DIN that matches the display image IMGOUT determined by the predetermined region T, that is, the position (X, Y, Z) of the viewpoint and the viewing angle α of the viewpoint. Specifically, the user UR inputs to the information processing system 10 an operation for zooming, parallel moving, or rotating the predetermined area T. For example, when a zoom operation is input, the predetermined area T and the display image IMGOUT are changed as shown in FIG.

そこで、情報処理システム１０は、視点の位置（Ｘ，Ｙ，Ｚ）及び視点の視野角αに合わせて、図１３乃至図１８のように、仮想スピーカＶＳＦ、ＶＳＲ、ＶＳＬ及びＶＳＢ等の仮想スピーカをユーザＵＲの周辺に配置する。
第１変換部１０Ｆ２は、仮想スピーカＶＳＦ、ＶＳＲ、ＶＳＬ及びＶＳＢからの出力を上記（６）式等で計算する。この計算に基づいて、仮想スピーカデータ（Ｌ，Ｒ，Ｆ，Ｂ）が生成される。 Therefore, the information processing system 10 adjusts the viewpoint position (X, Y, Z) and the viewing angle α of the viewpoint, and virtual speakers such as virtual speakers VSF, VSR, VSL, and VSB as shown in FIGS. Is arranged around the user UR.
The first conversion unit 10F2 calculates the outputs from the virtual speakers VSF, VSR, VSL, and VSB using the above equation (6) and the like. Based on this calculation, virtual speaker data (L, R, F, B) is generated.

仮想スピーカデータ（Ｌ，Ｒ，Ｆ，Ｂ）の場合、情報処理システム１０は、表示画像ＩＭＧＯＵＴに合わせた出力音声ＶＯＵＴを出力できるため、臨場感のある音声を出力できる。例えば、表示画像ＩＭＧＯＵＴに合う方向から、情報処理システム１０は、立体音声となる出力音声ＶＯＵＴを出力できる。 In the case of virtual speaker data (L, R, F, B), the information processing system 10 can output the output sound VOUT in accordance with the display image IMGOUT, and therefore can output sound with a sense of presence. For example, the information processing system 10 can output the output sound VOUT that is a three-dimensional sound from the direction matching the display image IMGOUT.

第２変換部１０Ｆ５は、仮想スピーカデータ（Ｌ，Ｒ，Ｆ，Ｂ）を変換して音声出力データＤＯＵＴを生成する。この変換に基づいて、音声出力部１０Ｆ６を実現するハードウェア等に適した出力音声ＶＯＵＴを示す音声出力データＤＯＵＴを生成される。 The second conversion unit 10F5 converts the virtual speaker data (L, R, F, B) to generate audio output data DOUT. Based on this conversion, audio output data DOUT indicating output audio VOUT suitable for hardware or the like that implements audio output unit 10F6 is generated.

また、状態センサ２Ｈ４（図５）等がある場合、情報処理システム１０は、スマートフォン２の姿勢を検出できる。例えば、スマートフォン２が縦置き方向であるか、横置き方向であるかによって、音声出力部１０Ｆ６を実現するハードウェアの配置が変わる場合がある。縦置き方向の場合、スピーカ２ＨＳ１及びスピーカ２ＨＳ２も、縦置き方向の配置になり、一方で、横置き方向の場合、スピーカ２ＨＳ１及びスピーカ２ＨＳ２も、横置き方向の配置となる。そこで、情報処理システム１０は、状態センサ２Ｈ４等によって、ハードウェアの配置を検出する。そして、情報処理システム１０は、ハードウェアの配置に合わせて、音声出力データＤＯＵＴを生成してもよい。 Moreover, when the state sensor 2H4 (FIG. 5) or the like is provided, the information processing system 10 can detect the attitude of the smartphone 2. For example, the arrangement of the hardware that implements the audio output unit 10F6 may change depending on whether the smartphone 2 is in the portrait orientation or the landscape orientation. In the portrait orientation, the speakers 2HS1 and 2HS2 are also arranged in the portrait orientation; in the landscape orientation, the speakers 2HS1 and 2HS2 are also arranged in the landscape orientation. Therefore, the information processing system 10 detects the arrangement of the hardware using the state sensor 2H4 or the like. The information processing system 10 may then generate the audio output data DOUT in accordance with the hardware arrangement.

したがって、第２変換部１０Ｆ５の変換に基づいて、ＨＲＴＦ又はＲＴＦ等の音響伝達関数からハードウェアに適した音声に変換できるため、臨場感のある音声が出力できる。 Therefore, through the conversion by the second conversion unit 10F5, sound can be converted, using an acoustic transfer function such as an HRTF or RTF, into sound suited to the hardware, so that sound with a sense of presence can be output.

他にも、第２変換部１０Ｆ５の変換に基づいて、音声出力部１０Ｆ６を実現するハードウェアが、５．１ｃｈ等であっても、臨場感のある音声が出力できる。 In addition, based on the conversion of the second conversion unit 10F5, even if the hardware that realizes the audio output unit 10F6 is 5.1 ch or the like, it is possible to output realistic sound.

なお、音声入力部１０Ｆ１は、音声出力部１０Ｆ６より多い箇所で入力音声ＶＩＮを入力するのが望ましい。例えば、マイクロフォンの数は、スピーカの数より多い方が望ましい。この場合、情報処理システム１０は、出力音声ＶＯＵＴの音質を向上できる。 Note that the voice input unit 10F1 preferably inputs the input voice VIN at more places than the voice output unit 10F6. For example, the number of microphones is preferably larger than the number of speakers. In this case, the information processing system 10 can improve the sound quality of the output sound VOUT.

アンビソニックス方式を用いると、位相差を表現できる。さらに、図１９の機能構成により、情報処理システム１０は、位相差のある音声の音量レベルに差を表現できる。具体的には、左右の位相差ＩＬＤ（両耳間レベル差、ＩｎｔｅｒａｕｒａｌＬｅｖｅｌＤｉｆｆｅｒｅｎｃｅ）がある音声は、両耳間時間差ＩＴＤ（ＩｎｔｅｒａｕｒａｌＴｉｍｅＤｉｆｆｅｒｅｎｃｅ）又は位相差ＩＰＤ（ＩｎｔｅｒａｕｒａｌＰｈａｓｅＤｉｆｆｅｒｅｎｃｅ）をつけると、臨場感を高めることができる。 When the Ambisonics method is used, a phase difference can be expressed. Further, with the functional configuration of FIG. 19, the information processing system 10 can express a difference in the volume level of sound having a phase difference. Specifically, for sound having a left-right ILD (interaural level difference), adding an interaural time difference (ITD) or an interaural phase difference (IPD) can enhance the sense of presence.

また、複数箇所で、入力音声ＶＩＮを入力し、複数の入力音声ＶＩＮの間で音声のレベル差が所定値以上に違う場合、情報処理システム１０は、最もダイナミックレンジの大きいマイクロフォンで入力した入力音声ＶＩＮを選択する。したがって、情報処理システム１０は、モノラル信号を取得でき、入力音声ＶＩＮの異常を検出できる。 In addition, when the input voice VIN is input at a plurality of locations, and the voice level difference between the plurality of input voices VIN is different from a predetermined value or more, the information processing system 10 inputs the input voice input with the microphone having the largest dynamic range. Select VIN. Therefore, the information processing system 10 can acquire a monaural signal and can detect an abnormality in the input voice VIN.
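The selection described above might look like the following sketch. Several points are assumptions made for illustration: the level of each microphone is measured as the RMS level of one frame in decibels, the dynamic range of a microphone is approximated by the peak-to-peak amplitude of its frame, and the threshold of 12 dB is an arbitrary stand-in for the predetermined value. The names `rms_db` and `select_mono_fallback` are hypothetical.

```python
import math


def rms_db(frame):
    """Return the RMS level of one frame of samples in decibels."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20.0 * math.log10(max(rms, 1e-12))  # floor avoids log10(0)


def select_mono_fallback(frames, threshold_db=12.0):
    """Pick one microphone when the input levels disagree too much.

    frames: one list of samples per microphone.
    Returns None when the level difference stays below the threshold,
    otherwise the index of the frame with the largest peak-to-peak
    amplitude (an assumed proxy for dynamic range).
    """
    levels = [rms_db(f) for f in frames]
    if max(levels) - min(levels) < threshold_db:
        return None
    spans = [max(f) - min(f) for f in frames]
    return spans.index(max(spans))
```

A non-None result signals the abnormal case and identifies the single input that would be used as the monaural signal.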

他にも、異常は、相関係数等から得られてもよい。特に、マイクロフォンが無指向性である場合、入力音声ＶＩＮの間に差がつきにくいため、相関係数は、差がつきにくい。したがって、相関係数に基づいて、情報処理システム１０は、異常を検出できる。 In addition, the abnormality may be obtained from a correlation coefficient or the like. In particular, when the microphone is omnidirectional, there is little difference between the input voices VIN, so the correlation coefficient is difficult to make a difference. Therefore, the information processing system 10 can detect an abnormality based on the correlation coefficient.
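A correlation-based check can be sketched as below. The use of the Pearson correlation coefficient between two microphone frames, the threshold of 0.5, and the names `correlation` and `is_abnormal` are assumptions for illustration; the description states only that an abnormality may be obtained from a correlation coefficient or the like.

```python
import math


def correlation(a, b):
    """Return the Pearson correlation coefficient of two frames.

    Returns 0.0 when either frame is constant, because the coefficient
    is undefined in that case.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def is_abnormal(a, b, min_corr=0.5):
    """Flag a microphone pair whose signals are poorly correlated."""
    return correlation(a, b) < min_corr
```

The idea is that omnidirectional microphones placed close together normally pick up similar signals, so a low coefficient suggests that one input is faulty.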

＜他の実施形態＞
なお、本発明に係る実施形態は、プログラミング言語等で記述されるプログラムによって実現されてもよい。すなわち、本発明に係る実施形態は、情報処理装置等のコンピュータに情報処理方法を実行させるためのプログラムによって実現されてもよい。なお、プログラムは、フラッシュメモリ、ＳＤ（登録商標）カード又は光学ディスク等の記録媒体に記憶して頒布することができる。また、プログラムは、インターネット等の電気通信回線を通じて頒布することができる。 <Other embodiments>
The embodiment according to the present invention may be realized by a program written in a programming language or the like. That is, the embodiment according to the present invention may be realized by a program for causing a computer such as an information processing apparatus to execute an information processing method. The program can be stored and distributed in a recording medium such as a flash memory, an SD (registered trademark) card, or an optical disk. The program can be distributed through a telecommunication line such as the Internet.

また、本発明に係る実施形態において、処理の一部又は全部は、例えば、フィールド・プログラマブル・ゲート・アレイ（ＦＰＧＡ）等のプログラマブル・デバイス（ＰＤ）で処理され、実現されてもよい。さらに、本発明に係る実施形態において、処理の一部又は全部は、ＡＳＩＣ（ＡｐｐｌｉｃａｔｉｏｎＳｐｅｃｉｆｉｃＩｎｔｅｇｒａｔｅｄＣｉｒｃｕｉｔ）で処理され、実現されてもよい。 In the embodiment according to the present invention, part or all of the processing may be processed and realized by a programmable device (PD) such as a field programmable gate array (FPGA). Further, in the embodiment according to the present invention, a part or all of the processing may be realized by being processed by ASIC (Application Specific Integrated Circuit).

また、情報処理装置は、１つの情報処理装置に限られず、複数の情報処理装置で構成されてもよい。すなわち、本発明に係る実施形態は、１以上の情報処理装置を有する情報処理システムによって実現されてもよい。 Further, the information processing apparatus is not limited to one information processing apparatus, and may be configured by a plurality of information processing apparatuses. That is, the embodiment according to the present invention may be realized by an information processing system having one or more information processing apparatuses.

以上、本発明の好ましい実施例について詳述したが、本発明は係る特定の実施形態に限定されない。すなわち、特許請求の範囲に記載された本発明の要旨の範囲内において、種々の変形、変更が可能である。 As mentioned above, although the preferable Example of this invention was explained in full detail, this invention is not limited to the specific embodiment which concerns. That is, various modifications and changes can be made within the scope of the gist of the present invention described in the claims.

10 Information processing system
1 Imaging apparatus
2 Smartphone
VIN Input sound
VOUT Output sound
VSF, VSR, VSL, VSB Virtual speakers
DIN Sound input data
DOUT Sound output data

Japanese Patent No. 6044328

Claims

An information processing system having an information processing apparatus connected to an imaging apparatus that captures a plurality of images, the information processing system comprising:
a sound input unit to which a plurality of sounds are input;
an image output unit that outputs a display image based on the plurality of images;
a setting unit that sets a predetermined area to be output in the display image;
a first conversion unit that converts the plurality of sounds input to the sound input unit based on the predetermined area; and
a sound output unit that outputs the plurality of sounds converted by the first conversion unit.

The information processing system according to claim 1, wherein the setting unit sets, as the predetermined area, a position determined by three-dimensional coordinates and a viewing angle that is the angle of view of a virtual camera.

The information processing system according to claim 1, wherein each sound input to the sound input unit is associated with the direction in which that sound was generated.

The information processing system according to claim 1, wherein the sound input unit inputs the sounds at four or more locations, and
the sound output unit outputs the converted sounds at two or more locations.

The information processing system according to claim 4, wherein the sound input unit inputs the plurality of sounds at more locations than the locations at which the sound output unit outputs the converted sounds.

The information processing system according to claim 2, wherein the first conversion unit changes the arrangement of virtual speakers or the output from the virtual speakers when the position, the viewing angle, or a combination thereof is changed.

An information processing method performed by an information processing system having an information processing apparatus connected to an imaging apparatus that captures a plurality of images, the method comprising:
a sound input procedure in which the information processing system inputs a plurality of sounds;
an image output procedure in which the information processing system outputs a display image based on the plurality of images;
a setting procedure in which the information processing system sets a predetermined area to be output in the display image;
a first conversion procedure in which the information processing system converts the plurality of sounds input in the sound input procedure based on the predetermined area; and
a sound output procedure in which the information processing system outputs the plurality of sounds converted in the first conversion procedure.

A program for causing a computer having an information processing apparatus connected to an imaging apparatus that captures a plurality of images to execute an information processing method, the program causing the computer to execute:
a sound input procedure in which the computer inputs a plurality of sounds;
an image output procedure in which the computer outputs a display image based on the plurality of images;
a setting procedure in which the computer sets a predetermined area to be output in the display image;
a first conversion procedure in which the computer converts the plurality of sounds input in the sound input procedure based on the predetermined area; and
a sound output procedure in which the computer outputs the plurality of sounds converted in the first conversion procedure.
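
For illustration only, and not as part of the claims, the conversion recited above (sounds input at four locations, each associated with a direction, converted according to the displayed area and output at two locations) might be sketched as follows. This is a minimal sketch under stated assumptions: the speaker azimuths, the use of a single yaw angle to stand in for the predetermined area, the constant-power pan law, and the names `VIRTUAL_SPEAKERS`, `stereo_gains`, and `convert` are all hypothetical and are not taken from the specification.

```python
import math

# Hypothetical azimuths in degrees, clockwise from the front, for the four
# virtual speakers named in the reference signs list (an assumption).
VIRTUAL_SPEAKERS = {"VSF": 0.0, "VSR": 90.0, "VSB": 180.0, "VSL": 270.0}


def stereo_gains(source_deg, view_yaw_deg):
    """Return (left, right) constant-power gains for a source direction
    as heard when the displayed area faces view_yaw_deg."""
    rel = math.radians(source_deg - view_yaw_deg)
    pan = math.sin(rel)                  # -1 = fully left, +1 = fully right
    theta = (pan + 1.0) * math.pi / 4.0  # map [-1, 1] onto [0, pi/2]
    return math.cos(theta), math.sin(theta)


def convert(frames, view_yaw_deg):
    """Mix one sample per virtual speaker down to a (left, right) pair."""
    left = right = 0.0
    for name, sample in frames.items():
        gain_l, gain_r = stereo_gains(VIRTUAL_SPEAKERS[name], view_yaw_deg)
        left += gain_l * sample
        right += gain_r * sample
    return left, right
```

In this sketch, a sound arriving from the right-hand virtual speaker is heard almost entirely in the right output while the view faces front, is centered once the view turns 90 degrees toward it, and moves to the left output after a 180 degree turn. Because only the sine of the relative angle is used, front and rear sources are not distinguished; a fuller treatment would also need the viewing angle and position.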