JP7434846B2 - Image processing device, image processing method, program

Info

Publication number: JP7434846B2
Application number: JP2019216547A
Authority: JP
Inventors: 均並木
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2019-11-29
Filing date: 2019-11-29
Publication date: 2024-02-21
Anticipated expiration: 2039-11-29
Also published as: JP2021086494A

Description

The present invention relates to an image processing device, an image processing method, and a program.

VR (virtual reality) goggles that present a 360-degree image of the surroundings according to the direction in which the user is looking are known. Such goggles require an image that captures a wide-angle range covering the full 360 degrees. In addition, by displaying separate images for the right eye and the left eye, VR goggles can present a virtual world with a sense of depth.

An imaging device that obtains a 360° image of the surroundings (hereinafter referred to as a spherical image) in a single imaging operation is known (see, for example, Patent Document 1). Patent Document 1 discloses an imaging device that generates one spherical image by combining two hemispherical images captured by imaging optical systems arranged back to back.

However, it has conventionally been difficult to prepare spherical images for stereoscopic viewing.

First, an imaging device for spherical images has only one lens per direction (two in total), so although it can capture the surroundings as a flat 360-degree image, the result is not suited to stereoscopic viewing. When viewed with VR goggles or the like, the flatness stands out and the image lacks impact.

There are broadly two methods for preparing spherical images for stereoscopic viewing.
1. A spherical imaging device with one lens per direction captures images two or more times from different positions, and a computer or the user edits and combines them.
2. A special camera that can capture spherical images for stereoscopic viewing is used.

With method 1, a time lag arises because imaging is performed two or more times. During that lag, people move, trees sway in the wind, clouds change, and so on, which makes it difficult to align the two spherical images. At tourist spots and similar places there are many people, and it is practically impossible to match the movements of all of them between two photographs taken with a time lag.

With method 2, the special camera is expensive and its housing has a bulky shape, such as a sphere, which makes it difficult to use for general purposes.

Furthermore, neither method 1 nor method 2 lets the user stereoscopically view a spherical image that already exists, for example one taken at a tourist spot. To view such a scene stereoscopically, the image has to be captured again.

In view of the above problems, an object of the present invention is to provide an image processing device that can provide a stereoscopically viewable spherical image.

In view of the above problems, the present invention is an image processing device that applies an estimation algorithm to one spherical image to generate spherical images for stereoscopic viewing. The estimation algorithm is constructed by learning using a neural network. The device includes a learning data creation unit that uses, as the input spherical image, a spherical image of modeling data created with 3D modeling software and captured by the center camera of three virtual cameras arranged horizontally. For the spherical image captured by the right camera and the spherical image captured by the left camera of the three virtual cameras, the learning data creation unit swaps the back image of the spherical image showing the portion corresponding to the back of the right camera with the back image of the spherical image showing the portion corresponding to the back of the left camera, thereby generating a right-eye spherical image and a left-eye spherical image that serve as teacher data.
The present invention is also an image processing device that applies an estimation algorithm to one spherical image to generate a distance image, and generates spherical images for stereoscopic viewing from the distance image and the spherical image. The estimation algorithm is constructed by learning using a neural network. Given a spherical image of modeling data created with 3D modeling software and captured by a virtual camera, together with a distance image of that spherical image, the estimation algorithm is constructed with the spherical image as input and the distance image as teacher data, so that it outputs a distance image for a spherical image. The device includes a parallax calculation unit that converts the spherical image into a three-dimensional point cloud in an orthogonal coordinate system using the per-pixel distances of the distance image estimated by the estimation algorithm. For a straight line that rotates around a predetermined point in the horizontal plane of the orthogonal coordinate system, at each position reached by rotating the line by a predetermined angle, the parallax calculation unit converts the three-dimensional points near the line, as the elevation angle of the line is varied, into a cylindrical image.
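The first step of the parallax calculation, turning a spherical image plus its per-pixel distances into a three-dimensional point cloud, can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `distance_image_to_points`, the list-of-rows image representation, and the angle convention (θ running from -π to π left to right, φ from +π/2 at the top row to -π/2 at the bottom row, z axis up) are all assumptions made for the example.

```python
import math


def distance_image_to_points(distance_image):
    """Convert an equirectangular distance image into 3D points.

    distance_image is a list of rows; each entry is the estimated distance
    for that pixel (the unit is whatever the estimator outputs).
    Assumed convention (not from the source): theta spans [-pi, pi) from
    left to right, phi spans +pi/2 (top) to -pi/2 (bottom), z points up.
    """
    height = len(distance_image)
    width = len(distance_image[0])
    points = []
    for row_index, row in enumerate(distance_image):
        # Vertical angle of the pixel centre in this row.
        phi = math.pi / 2.0 - (row_index + 0.5) / height * math.pi
        for col_index, distance in enumerate(row):
            # Horizontal angle of the pixel centre in this column.
            theta = (col_index + 0.5) / width * 2.0 * math.pi - math.pi
            points.append((
                distance * math.cos(phi) * math.cos(theta),
                distance * math.cos(phi) * math.sin(theta),
                distance * math.sin(phi),
            ))
    return points
```

Each returned point lies at exactly the estimated distance from the origin, so the origin plays the role of the camera position. A fuller version would presumably carry the pixel color along with each point.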

An image processing device that can provide a stereoscopically viewable spherical image can be provided.

A diagram outlining the method for creating a stereoscopically viewable spherical image in this embodiment.
A diagram showing an example of the hardware configuration of the image processing device.
An example of a hardware configuration diagram of the imaging device.
A diagram illustrating the format of a spherical image.
An example of the format of an image for stereoscopic viewing.
An example of a functional block diagram showing the functions of the image processing device as blocks.
A flowchart illustrating the flow of the learning data creation method performed by the image processing device.
A diagram illustrating a method of creating learning data using 3D modeling software.
An example of a diagram showing the result of one capture by the three virtual cameras (left, center, and right).
A diagram illustrating the relationship between the left and right images and the eyes.
A diagram illustrating the processing performed in the left-right image swap.
A diagram illustrating the swap of the left and right images when the spherical image is treated as a three-dimensional sphere.
A diagram showing an example configuration of a CNN (Convolutional Neural Network).
A diagram schematically illustrating convolution and deconvolution.
An example of a flowchart illustrating the process by which the image processing device outputs spherical images for stereoscopic viewing.
A diagram outlining the process by which the image processing device creates spherical images for stereoscopic viewing.
An example of a functional block diagram showing the functions of the image processing device as blocks.
A flowchart illustrating the flow of the learning data creation method performed by the image processing device.
A diagram showing an example of a rendering result.
A diagram showing an example configuration of a neural network.
An example of a flowchart illustrating the processing performed by the parallax calculation unit.
An example of a diagram illustrating the direction in which the eyes face and the positions of the eyes.
A diagram schematically showing the relationship between the equirectangular image and the three-dimensional point cloud for a given eye direction.
A diagram illustrating the flow of converting a video into a three-dimensional stereoscopic video.

An image processing device and an image processing method performed by the image processing device are described below, with reference to the drawings, as an example of a mode for carrying out the present invention.

<Overview>
FIG. 1 is a diagram outlining the method for creating a stereoscopically viewable spherical image in this embodiment.
(1) The spherical imaging device 9 performs imaging and generates one spherical image. The image may also have been captured beforehand.
(2) The image processing device 10 executes the estimation algorithm (a program) and outputs two spherical images from the one spherical image. Together, these two spherical images can be viewed stereoscopically. The estimation algorithm uses a neural network to estimate a left-eye spherical image and a right-eye spherical image from one spherical image. However, the algorithm need not be limited to one based on a neural network.
(3) The user can then view the 360-degree space stereoscopically, for example with VR goggles.

Note that the imaging device 9 only needs to be able to generate a spherical image in the equirectangular projection format (described later). Alternatively, a spherical image may be rendered using 3D modeling software. The format is not limited to the equirectangular projection; a Mercator projection, a Miller projection, a central cylindrical projection, or the like may also be used.

In this way, the image processing device 10 of this embodiment can generate two stereoscopically viewable spherical images from one spherical image. Neither imaging at separate times nor a special imaging device is needed. In addition, a stereoscopically viewable spherical image can be generated from a spherical image that has already been captured.

<Terminology>
A spherical image is image data that captures the full 360 degrees of the surroundings. Not all 360 degrees need to be captured; part may be omitted, for example to improve image quality. A spherical image may take the form of a planar image (an equirectangular image) or of a three-dimensional sphere.

The estimation algorithm is a program that generates two spherical images for stereoscopic viewing from one spherical image. Alternatively, it is a program that generates, from one spherical image, a distance image in which one or more pixels carry distance information.

<Hardware configuration example>
<<Image processing device>>
FIG. 2 shows an example of the hardware configuration of the image processing device 10. As shown in FIG. 2, the image processing device 10 is built on a computer and includes a CPU 501, a ROM 502, a RAM 503, an HD 504, an HDD (Hard Disk Drive) controller 505, a display 506, an external device connection I/F (Interface) 508, a network I/F 509, a bus line 510, a keyboard 511, a pointing device 512, a DVD-RW (Digital Versatile Disk Rewritable) drive 514, and a media I/F 516.

Of these, the CPU 501 controls the overall operation of the image processing device 10. The ROM 502 stores programs used to drive the CPU 501, such as an IPL. The RAM 503 is used as a work area for the CPU 501. The HD 504 stores various data such as programs. The HDD controller 505 controls reading and writing of various data from and to the HD 504 under the control of the CPU 501. The display 506 displays various information such as a cursor, menus, windows, characters, and images. The external device connection I/F 508 is an interface for connecting various external devices, for example a USB (Universal Serial Bus) memory or a printer. The network I/F 509 is an interface for data communication over a communication network. The bus line 510 is an address bus, a data bus, or the like that electrically connects the components shown in FIG. 2, such as the CPU 501.

The keyboard 511 is a type of input means with a plurality of keys for entering characters, numerical values, various instructions, and the like. The pointing device 512 is a type of input means for selecting and executing various instructions, selecting a processing target, moving the cursor, and the like. The DVD-RW drive 514 controls reading and writing of various data from and to a DVD-RW 513, which is an example of a removable recording medium. The medium is not limited to a DVD-RW and may be a DVD-R or the like. The media I/F 516 controls reading and writing (storage) of data from and to a recording medium 515 such as a flash memory.

Although omitted from FIG. 2, the device preferably also includes a GPU (Graphics Processing Unit). A GPU excels at parallel numerical computation and can perform the calculations that arise in a neural network at high speed.

<<Imaging device>>
The hardware configuration of the imaging device 9 is described with reference to FIG. 3. FIG. 3 is a hardware configuration diagram of the imaging device 9. In the following, the imaging device 9 is a spherical (omnidirectional) imaging device using two imaging elements, but any number of imaging elements of two or more may be used. The device also need not be dedicated to omnidirectional imaging; attaching an add-on omnidirectional imaging unit to an ordinary digital camera, smartphone, or the like may provide substantially the same functions as the imaging device 9.

As shown in FIG. 3, the imaging device 9 includes an imaging unit 601, an image processing unit 604, an imaging control unit 605, a microphone 608, a sound processing unit 609, a CPU (Central Processing Unit) 611, a ROM (Read Only Memory) 612, an SRAM (Static Random Access Memory) 613, a DRAM (Dynamic Random Access Memory) 614, an operation unit 615, an external device connection I/F 616, a communication unit 617, an antenna 617a, and an acceleration/azimuth sensor 618.

Of these, the imaging unit 601 includes wide-angle lenses (so-called fisheye lenses) 602a and 602b, each with an angle of view of 180° or more for forming a hemispherical image, and two imaging elements 603a and 603b provided for the respective wide-angle lenses. Each of the imaging elements 603a and 603b includes an image sensor, such as a CMOS (Complementary Metal Oxide Semiconductor) sensor or a CCD (Charge Coupled Device) sensor, that converts the optical image formed by the fisheye lens 602a or 602b into image data as an electrical signal and outputs it; a timing generation circuit that generates the horizontal and vertical synchronization signals, pixel clock, and the like for the image sensor; and a group of registers in which the various commands and parameters needed to operate the imaging element are set.

The imaging elements 603a and 603b of the imaging unit 601 are each connected to the image processing unit 604 via a parallel I/F bus. They are connected to the imaging control unit 605 via a serial I/F bus (such as an I2C bus). The image processing unit 604, the imaging control unit 605, and the sound processing unit 609 are connected to the CPU 611 via a bus 610. The ROM 612, the SRAM 613, the DRAM 614, the operation unit 615, the external device connection I/F (Interface) 616, the communication unit 617, the acceleration/azimuth sensor 618, and the like are also connected to the bus 610.

The image processing unit 604 takes in the image data output from the imaging elements 603a and 603b through the parallel I/F bus, applies predetermined processing to each set of image data, and then combines them to create equirectangular image data.

The imaging control unit 605 generally acts as the master device, with the imaging elements 603a and 603b as slave devices, and uses the I2C bus to set commands and the like in the register groups of the imaging elements 603a and 603b. It receives the necessary commands and the like from the CPU 611. The imaging control unit 605 also uses the I2C bus to read status data and the like from the register groups of the imaging elements 603a and 603b and sends them to the CPU 611.

The imaging control unit 605 also instructs the imaging elements 603a and 603b to output image data when the shutter button of the operation unit 615 is pressed. Some imaging devices 9 have a preview display function using a display (for example, a smartphone display) or a function supporting video display. In that case, the imaging elements 603a and 603b output image data continuously at a predetermined frame rate (frames/minute).

As described later, the imaging control unit 605 also functions as a synchronization control means that cooperates with the CPU 611 to synchronize the output timing of the image data from the imaging elements 603a and 603b. In this embodiment the imaging device 9 has no display, but a display unit may be provided.

The microphone 608 converts sound into sound (signal) data. The sound processing unit 609 takes in the sound data output from the microphone 608 through an I/F bus and applies predetermined processing to it.

The CPU 611 controls the overall operation of the imaging device 9 and executes the necessary processing. The ROM 612 stores various programs for the CPU 611. The SRAM 613 and the DRAM 614 are work memories that store programs executed by the CPU 611, data being processed, and the like. In particular, the DRAM 614 stores image data being processed by the image processing unit 604 and the data of processed equirectangular images.

The operation unit 615 is a collective term for operation buttons such as the shutter button 615a. By operating the operation unit 615, the user enters various imaging modes, imaging conditions, and the like.

The external device connection I/F 616 is an interface for connecting various external devices, for example a USB (Universal Serial Bus) memory or a PC (Personal Computer). The equirectangular image data stored in the DRAM 614 is recorded on external media via the external device connection I/F 616, or sent as needed to an external terminal (device) such as a smartphone via the external device connection I/F 616.

The communication unit 617 communicates with an external terminal (device) such as a smartphone through the antenna 617a provided on the imaging device 9, using a short-range wireless communication technology such as Wi-Fi, NFC (Near Field Communication), or Bluetooth (registered trademark). The communication unit 617 can also send equirectangular image data to an external terminal (device) such as a smartphone.

The acceleration/azimuth sensor 618 calculates the azimuth of the imaging device 9 from the earth's magnetism and outputs azimuth information. This azimuth information is an example of related information (metadata) conforming to Exif and is used for image processing such as correction of captured images. The related information also includes the date and time of capture and the data size of the image data. The acceleration/azimuth sensor 618 also detects changes in angle (roll, pitch, and yaw angles) that accompany movement of the imaging device 9. These changes in angle are likewise an example of related information (metadata) conforming to Exif and are used for image processing such as correction of captured images. In addition, the acceleration/azimuth sensor 618 detects acceleration along three axes. The imaging device 9 calculates its own attitude (its angle relative to the direction of gravity) from the acceleration detected by the acceleration/azimuth sensor 618. Providing the acceleration/azimuth sensor 618 in the imaging device 9 improves the accuracy of image correction.
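As a rough illustration of how an attitude can be derived from three-axis acceleration, the sketch below computes the angle between a device axis and the measured acceleration vector. The source does not give the formula used by the imaging device 9; the function name `tilt_from_gravity`, the choice of the z axis as the device's upright axis, and the assumption that the device is at rest (so the sensor measures gravity only) are all hypothetical.

```python
import math


def tilt_from_gravity(ax, ay, az):
    """Angle in radians between the device z axis and the measured acceleration.

    Assumes the device is at rest, so (ax, ay, az) reflects gravity only,
    and that z is the device's upright axis (an assumption for this sketch).
    """
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm == 0.0:
        raise ValueError("acceleration vector is zero; attitude is undefined")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, az / norm))
    return math.acos(cosine)
```

A result of 0 means the device is upright under this convention, and π/2 means it is lying on its side. An image correction step could rotate the captured image by this angle to level the horizon.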

<Format of the spherical image>
FIG. 4 is a diagram illustrating the format of a spherical image. FIG. 4(a) is an equirectangular image, and FIG. 4(b) is a three-dimensional sphere. An equirectangular image is a projection of the real world's three-dimensional coordinate system onto two dimensions. FIG. 4 shows how the coordinate system is converted in this projection.

A vector A passing through the center of the three-dimensional sphere can be expressed by θ and φ.
θ: the horizontal angle in three-dimensional space
φ: the vertical angle in three-dimensional space
Projecting the pixels specified by θ and φ onto two dimensions yields the equirectangular image that is commonly used for spherical images. In this embodiment, the spherical image in its planar state is described as an equirectangular image. As shown in FIG. 4(a), an equirectangular image has an angle of view of 360 degrees horizontally and 180 degrees vertically.
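The relationship between a pixel of the equirectangular image, the angles θ and φ, and the vector A can be sketched as below. The source fixes only the 360-by-180-degree extent; the origin of θ at the image center, the sign conventions, the axis layout (x forward, z up), and the function names are assumptions made for this example.

```python
import math


def pixel_to_angles(x, y, width, height):
    """Map the centre of pixel (x, y) to (theta, phi) in radians.

    Assumed convention: theta in [-pi, pi) grows to the right, phi in
    [-pi/2, pi/2] grows upward, and row 0 is the top of the image.
    """
    theta = (x + 0.5) / width * 2.0 * math.pi - math.pi
    phi = math.pi / 2.0 - (y + 0.5) / height * math.pi
    return theta, phi


def angles_to_vector(theta, phi):
    """Unit vector A for the given angles (x forward, z up; assumed axes)."""
    return (
        math.cos(phi) * math.cos(theta),
        math.cos(phi) * math.sin(theta),
        math.sin(phi),
    )


def angles_to_pixel(theta, phi, width, height):
    """Inverse of pixel_to_angles: find the pixel that contains (theta, phi)."""
    x = int((theta + math.pi) / (2.0 * math.pi) * width) % width
    y = min(int((math.pi / 2.0 - phi) / math.pi * height), height - 1)
    return x, y
```

Because θ covers 360 degrees across the width and φ covers 180 degrees across the height, a pixel spans the same angle in both directions only when the image is twice as wide as it is tall.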

<Format of the image for stereoscopic viewing>
FIG. 5 is an example of the format of an image for stereoscopic viewing. Several formats exist for stereoscopically viewable images, not only for spherical images. The one shown here is one of them, called the top-and-bottom format. The upper image is displayed to the left eye and the lower image to the right eye. As its final output format, the image processing device 10 of this embodiment outputs a top-and-bottom image in which two equirectangular images are stacked vertically.

Note that there are several other formats for stereoscopic images. The output image may use any of the following.
・Side-by-side format: the left-eye image and the right-eye image are arranged side by side.
・Frame sequential format: a format for video in which left-eye images and right-eye images are arranged alternately as video frames.
Because the original equirectangular image is landscape, with an aspect ratio of 1 (vertical) : 2 (horizontal), the top-and-bottom format is often adopted for spherical images for stereoscopic viewing. With the top-and-bottom format, that aspect ratio yields an exactly square image.
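The three packing formats can be illustrated with a small sketch. Images are represented here as lists of pixel rows purely for illustration; the function names are hypothetical and not taken from the source.

```python
def top_and_bottom(left, right):
    """Stack the left-eye image above the right-eye image."""
    if len(left) != len(right) or len(left[0]) != len(right[0]):
        raise ValueError("left and right images must have the same size")
    return [list(row) for row in left] + [list(row) for row in right]


def side_by_side(left, right):
    """Place the left-eye image to the left of the right-eye image."""
    if len(left) != len(right):
        raise ValueError("left and right images must have the same height")
    return [list(l_row) + list(r_row) for l_row, r_row in zip(left, right)]


def frame_sequential(left_frames, right_frames):
    """Interleave video frames as left, right, left, right, ..."""
    frames = []
    for left_frame, right_frame in zip(left_frames, right_frames):
        frames.extend([left_frame, right_frame])
    return frames
```

For two equirectangular images of height H and width 2H, `top_and_bottom` returns 2H rows of 2H pixels, which is the square result described above, whereas `side_by_side` returns H rows of 4H pixels.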

<Functions of the image processing device>
FIG. 6 is an example of a functional block diagram showing the functions of the image processing device 10 as blocks. FIG. 6(a) shows the image processing device 10 in the learning phase. The image processing device 10 includes a storage unit 41, a learning unit 42, an image output unit 43, and a learning data creation unit 44. Of these, the image output unit 43 is drawn with a dotted line because it is constructed through learning. Each of these functions is a function or means realized when the CPU 501 of the image processing device 10 executes a program loaded from the HD 504 into the RAM 503. The storage unit 41 is formed in at least one of the HD 504 and the RAM 503 of the image processing device 10.

The learning data creation unit 44 creates the learning data. In this embodiment, it uses, for example, 3D modeling software to create a spherical image for input and two spherical images, one for the right eye and one for the left eye, that serve as teacher data.
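The back-image swap that the summary of the invention attributes to the learning data creation unit can be sketched as follows. This is one plausible reading, not the patent's stated procedure: it assumes equirectangular images stored as lists of rows with the front direction at the center column, so that the back hemisphere occupies the first and last quarter of each row. The function name `swap_back_images` is hypothetical. The rationale assumed here is that a viewer who turns around has the left camera on their right side, so the back portions must be exchanged.

```python
def swap_back_images(left_cam, right_cam):
    """Build (left_eye, right_eye) teacher images from two camera renders.

    Assumption: each image is a list of rows, the front direction is at the
    centre column, and the back hemisphere is the outer quarter on each side.
    """
    width = len(left_cam[0])
    quarter = width // 4

    def merge(front_source, back_source):
        # Keep the front (centre) columns and take the back columns
        # from the other camera's image.
        return [
            list(back_row[:quarter])
            + list(front_row[quarter:width - quarter])
            + list(back_row[width - quarter:])
            for front_row, back_row in zip(front_source, back_source)
        ]

    left_eye = merge(left_cam, right_cam)
    right_eye = merge(right_cam, left_cam)
    return left_eye, right_eye
```

With an 8-pixel-wide row of `"L"` values from the left camera and `"R"` values from the right camera, the left-eye row becomes `R R L L L L R R`.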

A learning data storage unit 49 is built in the storage unit 41 and stores the learning data. The learning data of this embodiment consists of an input spherical image and a prepared stereoscopic pair of spherical images (one for the left eye and one for the right eye). The left-eye and right-eye pair is the teacher data. The method for creating the learning data is described later.

なお、学習データはサーバからネットワークを介してダウンロードされてもよい。サーバはクラウドにあってもオンプレミスにあってもよい。また、記憶部４１は画像処理装置１０の外部に設けられていてもよい。 Note that the learning data may be downloaded from a server via a network. The server can be in the cloud or on-premises. Further, the storage unit 41 may be provided outside the image processing device 10.

The learning unit 42 learns the correspondence between the input spherical image and the teacher-data spherical images using any of various machine learning approaches, neural networks among them. Machine learning is a technology for giving a computer a human-like ability to learn: the computer autonomously generates, from learning data supplied in advance, the algorithm needed for tasks such as data identification, and applies it to new data to make predictions. The learning method may be supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, or deep learning, or a combination of these; the method is not limited.

教師データが立体視用の全天球画像なので、入力用の全天球画像に対し立体視用の全天球画像を推定する（出力する）画像出力部４３が学習の結果、得られる。 Since the teacher data is a spherical image for stereoscopic viewing, an image output unit 43 that estimates (outputs) a spherical image for stereoscopic viewing from an input spherical image is obtained as a result of learning.

Note that the image processing device 10 used for learning may reside on a network. In that case, a client terminal such as a PC (Personal Computer) transmits the learning data to the image processing device 10, and the image processing device 10 returns the constructed image output unit 43 to the client terminal.

FIG. 6(b) is an example of a functional block diagram of the image processing device 10 in the image estimation phase. The image processing device 10 has the image output unit 43 described above. The image output unit 43 acquires a spherical image from the spherical image capturing device 9 and outputs spherical images for stereoscopic viewing (a spherical image for the right eye and a spherical image for the left eye). The input spherical image need not be one captured by the imaging device 9; it may be one stored in the storage unit 41, in a USB memory, or on a network.

The image processing device 10 in the image estimation phase may also reside on a network. In that case, the client terminal transmits the input spherical image to the image processing device 10, and the image processing device 10 returns the stereoscopic spherical images output by the image output unit 43 to the client terminal.

<How to create learning data>
Next, the method for creating learning data is described with reference to FIGS. 7 to 12. FIG. 7 is a flowchart illustrating the flow of the learning data creation method performed by the image processing device 10. As shown in FIG. 7, the learning data creation unit 44 performs camera installation, rendering, and swapping of the left and right images.

(S1 Camera installation)
FIG. 8 is a diagram illustrating a method for creating learning data using 3D modeling software. In this embodiment, the teacher data is created with 3D modeling software. 3D modeling software is an application that visualizes 3D CAD or 3DCG data on a computer. 3D CAD represents three-dimensional shapes mainly with mathematical formulas, whereas 3DCG represents them as combinations of polygons.

開発者などが適当な都市のモデリングデータを３Ｄモデリングソフトで再現しておく。学習データ作成部４４は都市内に仮想的なカメラ４１０を設置する。設置する位置に決まりはなく周囲を撮像できればよい。このカメラは全天球画像を撮像できるカメラであるが、このカメラは２つの光学系を有していなくてよい。仮想空間なのでカメラパラメータとして画角を360度に設定すれば良いためである。ただし、例えば背中合わせに２つで１組の光学撮像系を有し、それぞれの光学撮像系のレンズが魚眼レンズの内部パラメータを有するカメラでもよい。 A developer or the like reproduces appropriate modeling data of a city using 3D modeling software. The learning data creation unit 44 installs a virtual camera 410 within the city. There is no limit to where it can be installed, as long as it can capture images of the surrounding area. This camera is a camera capable of capturing a spherical image, but this camera does not need to have two optical systems. This is because since it is a virtual space, it is only necessary to set the angle of view to 360 degrees as a camera parameter. However, for example, it may be a camera that has a set of two optical imaging systems placed back to back, and the lenses of each optical imaging system have internal parameters of a fisheye lens.

Three cameras are installed in total. For example, they are placed side by side in parallel at left, center, and right positions. The left and right cameras correspond to the left and right human eyes, so they are spaced so that the resulting parallax is about that of human eyes. To emphasize parallax, the spacing between the left and right cameras may be made wider than the actual human eye spacing. The center camera captures the input spherical image, and the left and right cameras capture the teacher-data spherical images.

The virtual cameras must capture enough images to serve as learning data; each of the three cameras captures, for example, about 300 images. Capture of a fixed number of images is repeated periodically or at equal intervals while the camera position is moved. The number required is whatever is needed for the weights (parameters) of the neural network to converge, so 300 is only an example.

(S2 Rendering)
FIG. 9 shows the result of one capture by the three virtual cameras (left, center, and right). FIG. 9(a) is the spherical image of the left camera, FIG. 9(b) is that of the center camera, and FIG. 9(c) is that of the right camera. The images are nearly identical but, strictly speaking, differ by parallax.

Note that the cameras do not actually capture anything; a spherical image is obtained by perspective projection of the 3D CAD or 3DCG data within the angle of view onto two dimensions. This processing is sometimes called rendering. In general, a developer or the like sets external parameters for the camera position and internal parameters (in matrix form) for the projection method (fisheye lens), and multiplying the 3D CAD or 3DCG data by them determines the pixel values of the spherical image.

(S3 Swapping left and right images)
Next, swapping of the left and right images is described with reference to FIG. 10. FIG. 10 is a diagram illustrating the relationship between the left and right images and the eyes. The left-camera and right-camera spherical images shown in FIG. 9 cannot be used as teacher data as they are, because the correspondence between the positions of the left and right cameras and the positions of the human eyes changes with the orientation of the body.

図１０に示すように、人間が前を向いている場合は左側に設置したカメラが左目に相当し、右側に設置したカメラが右目に相当する。しかし、逆方向を人間が向いた場合、左側に設置したカメラは右目に相当し、右側に設置したカメラが左目に相当してしまう。 As shown in FIG. 10, when a person is facing forward, the camera installed on the left side corresponds to the left eye, and the camera installed on the right side corresponds to the right eye. However, when a person faces in the opposite direction, the camera installed on the left side corresponds to the right eye, and the camera installed on the right side corresponds to the left eye.

そこで、図１１に示すように、学習データ作成部４４は左右画像のスワップを行う。図１１は、左右画像スワップで行われる処理を説明する図である。左右画像のスワップとは、左側のカメラの全天球画像と右側のカメラの全天球画像を、左目用画像と右目用画像にそれぞれ変換する処理である。 Therefore, as shown in FIG. 11, the learning data creation unit 44 swaps the left and right images. FIG. 11 is a diagram illustrating the processing performed in left and right image swapping. Swapping the left and right images is a process of converting the spherical image of the left camera and the spherical image of the right camera into a left-eye image and a right-eye image, respectively.

In the case of an imaging device 9 that has two optical imaging systems (lens, image sensor, and so on) arranged back to back, the front lens captures the portion of the image from 1/4 to 3/4 of the width measured from the left edge, and the rear lens captures the remainder. It therefore suffices to exchange the rear-lens portion of the right camera's image with the rear-lens portion of the left camera's image.

FIG. 12 is a diagram illustrating the swapping of the left and right images when the spherical image is regarded as a solid sphere. As shown in FIG. 12, combining the front-lens portion of the left camera's spherical image with the rear-lens portion of the right camera's spherical image yields the left-eye image. Similarly, combining the front-lens portion of the right camera's spherical image with the rear-lens portion of the left camera's spherical image yields the right-eye image. In other words, swapping the rear-lens sides of the left and right cameras' spherical images produces the spherical images for the left and right eyes.
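The swap described above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the function name `swap_rear_halves` is hypothetical, images are represented as plain lists of pixel rows, and the column range from w/4 to 3w/4 is taken as the front-lens region as stated in the text (integer division is an assumption for widths not divisible by 4).

```python
def swap_rear_halves(left_cam, right_cam):
    """Build left-eye and right-eye equirectangular images (hypothetical helper).

    Each image is a list of rows, each row a list of pixels.
    Assumption: columns [w/4, 3w/4) are the front-lens region and all
    other columns are the rear-lens region, as described in the text.
    """
    h = len(left_cam)
    w = len(left_cam[0])
    if len(right_cam) != h or any(len(r) != w for r in left_cam + right_cam):
        raise ValueError("images must have identical dimensions")
    lo, hi = w // 4, (3 * w) // 4
    left_eye, right_eye = [], []
    for row_l, row_r in zip(left_cam, right_cam):
        # Front region comes from the camera on the same side,
        # rear region comes from the camera on the opposite side.
        left_eye.append(row_r[:lo] + row_l[lo:hi] + row_r[hi:])
        right_eye.append(row_l[:lo] + row_r[lo:hi] + row_l[hi:])
    return left_eye, right_eye


# One-row example: "L" marks left-camera pixels, "R" right-camera pixels.
left_eye, right_eye = swap_rear_halves([["L"] * 8], [["R"] * 8])
```

In the one-row example, the left-eye row keeps the left camera's pixels in the middle half and takes the right camera's pixels in the outer quarters.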

(S4 Completion determination)
The learning data creation unit 44 repeats S1 to S3 until a sufficient number of learning data items (at least a threshold) has been generated.

以上で、学習データ作成部４４は、入力用の全天球画像と、教師データ（左目用の全天球画像、右目用の全天球画像）とを含む学習データを生成できた。画像処理装置１０は、同様の処理を、３つの仮想的なカメラが撮像した全ての全天球画像について行う。 As described above, the learning data creation unit 44 was able to generate learning data including the input spherical image and the teacher data (the spherical image for the left eye and the spherical image for the right eye). The image processing device 10 performs similar processing on all spherical images captured by the three virtual cameras.

<Structure of neural network>
FIG. 13 shows a configuration example of a CNN (Convolutional Neural Network). Among neural networks and deep learning (deep neural network) models, a network that uses convolution operations is called a CNN. The estimation algorithm (image output unit) described above is built with the CNN. This embodiment mainly uses a CNN, but any algorithm may be used as long as it can generate images.

FIG. 14 is a diagram schematically illustrating convolution and deconvolution. FIG. 14(a) depicts convolution, and FIG. 14(b) depicts deconvolution. The convolution units 51 and 53 multiply the input image 301 element by element with the filter (kernel) 302 and sum the products to obtain the value of one pixel of the feature map 303. They repeat the same operation while shifting the filter by an amount called the stride. In FIG. 14(a), the input image is 5×5, the filter is 3×3, and the stride is 1, so a 3×3 feature map 303 is obtained. Filters are prepared for each channel (for example, RGB), so feature maps corresponding to the number of channels are obtained. It is common to apply a nonlinear transformation such as the ReLU function after each convolution.

The deconvolution units 52 and 54 take a feature map obtained by convolution (3×3 as an example) as the input image 305, enlarge it by adding blanks around the pixels, and then apply the filter 306 (3×3). They repeat the same operation while shifting the filter by the stride. As a result, a 5×5 feature map 307 is obtained in FIG. 14(b). The values embedded around the feature map 303 need not be blank (zero); for example, one method uses a feature map of the same size obtained in the convolution operation.
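The size relationships in FIG. 14 can be checked with a small sketch. This is an illustration under stated assumptions, not the network of the embodiment: the helper names `conv2d_valid` and `zero_pad` are hypothetical, a single channel is used, the filter is applied without flipping (as is usual in CNNs), and a padding width of 2 is assumed because that is what a 3×3 filter with stride 1 needs to turn a 3×3 input into a 5×5 output.

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide the kernel over the image without padding (hypothetical helper)."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for top in range(0, ih - kh + 1, stride):
        row = []
        for left in range(0, iw - kw + 1, stride):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    # Multiply overlapping elements and accumulate the sum.
                    acc += image[top + i][left + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def zero_pad(image, pad):
    """Surround the image with `pad` rows and columns of zeros."""
    width = len(image[0]) + 2 * pad
    rows = [[0] * pad + list(r) + [0] * pad for r in image]
    border = [[0] * width for _ in range(pad)]
    return border + rows + [[0] * width for _ in range(pad)]


image = [[1] * 5 for _ in range(5)]    # 5x5 input, as in FIG. 14(a)
kernel = [[1] * 3 for _ in range(3)]   # 3x3 filter
feature = conv2d_valid(image, kernel)  # 3x3 feature map
# Assumed padding of 2: 3x3 -> 7x7, then a 3x3 filter gives 5x5.
restored = conv2d_valid(zero_pad(feature, 2), kernel)
```

With stride 1 and no padding, the output side length is (input − filter) + 1, which gives 3 for the convolution and 5 for the padded deconvolution step.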

Returning to FIG. 13, the learning unit 42 copies the input spherical image to create two spherical images. The upper path is for the left eye, and the lower path is for the right eye. The convolution units 51 and 53 perform several convolutions to extract features. Filters are prepared for feature extraction, and their coefficients are learned automatically so as to extract features. The image becomes progressively smaller according to the filter size and stride. In FIG. 13, features are extracted down to a size of "2×1", but this is only an example. The values 8, 16, ..., 2048 are the numbers of channels in the convolutions.

逆畳み込み部５２，５４では、学習部４２が特徴マップに対し逆畳み込みを行う。図１３では入力された全天球画像に、逆畳み込みで得られる特徴マップを近づけるために、畳み込み層の同一のサイズの画像を取り入れている。ただし、この処理はなくてもよい。 In the deconvolution units 52 and 54, the learning unit 42 performs deconvolution on the feature map. In FIG. 13, images of the same size in the convolution layer are incorporated in order to bring the feature map obtained by deconvolution closer to the input spherical image. However, this process may not be necessary.

学習時には、用意した右目用と左目用の教師データと、逆畳み込み部５２，５４が出力する画像との差を学習部４２が算出する。この差は逆畳み込み部５２，５４、畳み込み部５１，５３の順に逆伝播される。逆伝播によりフィルタの係数が更新される。 During learning, the learning unit 42 calculates the difference between the prepared teacher data for the right eye and the left eye and the images output by the deconvolution units 52 and 54. This difference is back-propagated in the order of deconvolution units 52 and 54 and convolution units 51 and 53. The coefficients of the filter are updated by backpropagation.

The learning unit 42 performs the processing of FIG. 13 until the changes in the filter coefficients converge or for all of the learning data. The filter coefficients are thereby learned, and an image output unit 43 that outputs a left-eye spherical image and a right-eye spherical image for an input spherical image is constructed.

なお、図１３の構成は一例であり、畳み込み層の後にプーリング層があってもよいし、逆畳み込み層の代わりに全結合層があってもよい。 Note that the configuration in FIG. 13 is an example, and a pooling layer may be provided after the convolutional layer, and a fully connected layer may be provided instead of the deconvolutional layer.

<Output processing of spherical images for stereoscopic viewing>
FIG. 15 is an example of a flowchart illustrating the processing in which the image processing device 10 outputs spherical images for stereoscopic viewing.

まず、画像出力部４３は入力用の全天球画像を取得する（Ｓ１１）。全天球画像は撮像装置９がリアルタイムに撮像したものでもよいし、記憶部４１に記憶されていてもよい。 First, the image output unit 43 acquires a spherical image for input (S11). The spherical image may be captured by the imaging device 9 in real time, or may be stored in the storage unit 41.

次に、画像出力部４３は入力された全天球画像をコピーして右目用と左目用の全天球画像を生成する（Ｓ１２）。 Next, the image output unit 43 copies the input spherical image to generate spherical images for the right eye and for the left eye (S12).

画像出力部４３は、右目用と左目用の全天球画像それぞれに推定アルゴリズムを適用する（Ｓ１３）。すなわち、図１３に示した畳み込みと逆畳み込みを行って、右目用と左目用の全天球画像を出力する。 The image output unit 43 applies the estimation algorithm to each of the right-eye and left-eye spherical images (S13). That is, the convolution and deconvolution shown in FIG. 13 are performed to output spherical images for the right eye and for the left eye.

画像出力部４３は、右目用と左目用の全天球画像から立体視用の全天球画像を生成する（Ｓ１４）。すなわち、トップ&ボトム形式と呼ばれるフォーマットに変換する。 The image output unit 43 generates a spherical image for stereoscopic vision from the spherical images for the right eye and the left eye (S14). In other words, it is converted into a format called top and bottom format.
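Step S14 can be sketched as a simple vertical stack. This is a minimal illustration: the function name `pack_top_and_bottom` is hypothetical, and placing the left-eye image on top is an assumption, since the text does not say which eye occupies which half.

```python
def pack_top_and_bottom(left_eye, right_eye):
    """Stack two equirectangular images vertically (hypothetical helper).

    Assumption: the left-eye image goes on top. Each image is a list of
    pixel rows with a 1 (vertical) : 2 (horizontal) aspect ratio.
    """
    if len(left_eye) != len(right_eye):
        raise ValueError("images must have the same height")
    if any(len(a) != len(b) for a, b in zip(left_eye, right_eye)):
        raise ValueError("images must have the same width")
    # Copy rows so the packed image does not alias the inputs.
    return [list(r) for r in left_eye] + [list(r) for r in right_eye]


left = [["l"] * 4 for _ in range(2)]   # 2 rows x 4 columns
right = [["r"] * 4 for _ in range(2)]
packed = pack_top_and_bottom(left, right)  # 4 rows x 4 columns
```

Because each input is twice as wide as it is tall, the packed result is square, which matches the reason given earlier for preferring the top-and-bottom format.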

<Main effects>
As described above, the image processing device 10 of this embodiment can generate a stereoscopically viewable spherical image from a single spherical image. Neither capturing at separate times nor a special imaging device is required. In addition, a stereoscopically viewable spherical image can be generated from a spherical image that has already been captured.

本実施例では、距離画像を生成しそれを左右の目の画像に変換する画像処理装置１０について説明する。 In this embodiment, an image processing device 10 that generates a distance image and converts it into left and right eye images will be described.

FIG. 16 is a diagram outlining the processing in which the image processing device 10 of this embodiment creates spherical images for stereoscopic viewing. In this embodiment as well, a single spherical image 400 is input, and the final output is a pair of spherical images for the left eye and the right eye.

(1) In this embodiment, the neural network 401 does not directly output the left-eye and right-eye spherical images; instead it outputs a distance image 402. The distance image 402 is an image that holds, for each pixel, the distance to the object appearing at that pixel.

(2) 画像処理装置１０は、入力された全天球画像４００と距離画像４０２を用いてそれぞれが画素値を有する三次元点群４０４を作成する。 (2) The image processing device 10 uses the input spherical image 400 and the distance image 402 to create a three-dimensional point group 404, each of which has a pixel value.

(3) The image processing device 10 uses the pixel values of the three-dimensional point group 404 to generate equirectangular images 405 for the left eye and the right eye.

Compared with the method of the first embodiment, this method has the advantage that the resolution of the final left-eye and right-eye spherical images is easier to raise. In the method of the first embodiment, the left-eye and right-eye spherical images are generated directly by the neural network, so their quality depends on the amount of memory the neural network consumes.

画像生成系のニューラルネットワークはメモリ消費が激しいため、大規模なGPUを用意しない限りはサイズに物理的な制約がかかる。撮像装置９は高解像度なので、実施例１では入力用の全天球画像を縮小してニューラルネットワークに適用せねばならず、解像度が落ちる。 Neural networks for image generation consume a lot of memory, so unless you have a large-scale GPU, there are physical constraints on the size. Since the imaging device 9 has a high resolution, in the first embodiment, the input spherical image must be reduced in size and applied to the neural network, resulting in a decrease in resolution.

In contrast, this embodiment generates a distance image. Because the distance image is also produced by a neural network, its resolution is the same as in the first embodiment, about 2048 x 1024. However, a distance image has a simple structure, so its quality degrades little when it is enlarged. Indeed, the fact that it is meant to be enlarged is the advantage: the left-eye and right-eye spherical images can be output at a higher resolution that makes full use of the resolution of the original image.
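Enlarging the distance image to the resolution of the input image could look like the sketch below. The text does not specify an interpolation method, so nearest-neighbor sampling is used here purely as an assumption, and the function name `upscale_nearest` is hypothetical.

```python
def upscale_nearest(depth, out_w, out_h):
    """Enlarge a distance image by nearest-neighbor sampling.

    Assumption: nearest-neighbor is chosen only for simplicity; the
    source text does not state which interpolation is used.
    `depth` is a list of rows of distances.
    """
    in_h, in_w = len(depth), len(depth[0])
    return [
        [depth[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


small = [[1.0, 2.0]]                 # 1 row x 2 columns
large = upscale_nearest(small, 4, 2)  # 2 rows x 4 columns
```

The integer index mapping also handles non-integer scale factors, such as enlarging a 2048-pixel-wide distance image to a 5376-pixel-wide input image.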

<About functions>
FIG. 17 is an example of a functional block diagram showing the functions of the image processing device 10. FIG. 17(a) is an example of a functional block diagram of the image processing device 10 in the learning phase. In this embodiment, the function of the learning data creation unit 44 differs from that in the first embodiment. The learning data creation unit 44 of this embodiment only needs to create one distance image for each input spherical image and does not need to create right-eye and left-eye spherical images (teacher data).

As in the first embodiment, the learning unit 42 constructs the image output unit 43, but here the image output unit 43 creates a single distance image.

FIG. 17(b) is an example of a functional block diagram of the image processing device 10 in the image estimation phase. This embodiment additionally has a parallax calculation unit 45. The parallax calculation unit 45 creates a left-eye spherical image and a right-eye spherical image from the distance image created by the image output unit 43 and the input spherical image.

<How to create learning data>
FIG. 18 is a flowchart illustrating the flow of the learning data creation method performed by the image processing device 10. The description of FIG. 18 focuses on the differences from FIG. 7. As shown in FIG. 18, the learning data creation unit 44 performs camera installation and rendering but does not need to swap left and right images.

(S1 Camera installation)
In this embodiment, an equirectangular image and its corresponding distance image constitute the learning data. The teacher data is the distance image. 3D modeling software is used in this embodiment as well. The learning data creation unit 44 needs to place only one virtual camera.

(S2 Rendering)
As in the first embodiment, the learning data creation unit 44 renders a spherical image with the virtual camera. In addition, because the learning data creation unit 44 creates a distance image in this embodiment, RGB luminance values are not needed for it; instead it renders the depth information known as the z-buffer. 3D modeling software provides this depth information relative to the camera. Depth information represents the distance between the camera and a specific point on an object. In a 3D model with a fixed camera, objects in the space can overlap as seen from the camera, and drawing the objects farther from the camera is then wasted work. The z-buffer is the data used in that situation: it speeds up drawing by letting only the objects nearest the camera be rendered.

学習データ作成部４４はカメラの外部パラメータと内部パラメータを使ってＲＧＢ値でなく距離情報（ｚバッファ）を正距円筒画像に投影する。よって、３Ｄモデルに設置された仮想的なカメラでは、カメラを基準にして３Ｄモデル空間を構成する全てのオブジェクトに対して距離情報の計測が可能になる。 The learning data creation unit 44 uses external and internal parameters of the camera to project not RGB values but distance information (z buffer) onto an equirectangular image. Therefore, with the virtual camera installed in the 3D model, distance information can be measured for all objects that constitute the 3D model space using the camera as a reference.

図１９はレンダリング結果の一例を示す図である。図１９（ａ）は入力用の全天球画像であり、図１９（ｂ）は距離画像である。図面の制約上、読み取れないが、図１９（ｂ）では距離に応じた色が付されている。 FIG. 19 is a diagram showing an example of a rendering result. FIG. 19(a) is a spherical image for input, and FIG. 19(b) is a distance image. Although it cannot be read due to drawing limitations, in FIG. 19(b), colors are assigned according to the distance.

一般的に距離画像はグレースケールの画像で表現される。しかしグレースケールでは色の幅が256階調しかないため距離の分解能が少ない。このため、図１９（ｂ）では256段階でなく、より高分解能で距離を表している（作図の都合で濃淡の種類は実際よりも少なくなっている）。これにより学習データ作成部４４は、密度の高い距離画像をレンダリングできる。 Generally, distance images are expressed as grayscale images. However, in grayscale, the color width is only 256 gradations, so the distance resolution is low. Therefore, in FIG. 19(b), the distance is expressed in higher resolution rather than in 256 steps (for convenience of drawing, there are fewer types of shading than in reality). Thereby, the learning data creation unit 44 can render a high-density distance image.
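The effect of the number of gradations on distance resolution can be illustrated numerically. The sketch below is an assumption-laden example: the text only says that more than 256 steps are used, so the 16-bit depth, the 100 m maximum range, and the helper names `quantize` and `dequantize` are all hypothetical choices for illustration.

```python
def quantize(distance, max_distance, bits):
    """Map a distance in [0, max_distance] to an integer code (hypothetical)."""
    levels = (1 << bits) - 1
    clipped = min(max(distance, 0.0), max_distance)
    return round(clipped / max_distance * levels)


def dequantize(code, max_distance, bits):
    """Recover the approximate distance from an integer code."""
    levels = (1 << bits) - 1
    return code / levels * max_distance


MAX_RANGE_M = 100.0   # assumed maximum scene depth, not from the source
d = 12.34             # example distance in meters

err8 = abs(d - dequantize(quantize(d, MAX_RANGE_M, 8), MAX_RANGE_M, 8))
err16 = abs(d - dequantize(quantize(d, MAX_RANGE_M, 16), MAX_RANGE_M, 16))
step8 = MAX_RANGE_M / 255      # about 0.39 m per gradation
step16 = MAX_RANGE_M / 65535   # about 1.5 mm per gradation
```

Under these assumptions one 8-bit gradation spans roughly 0.39 m, while one 16-bit gradation spans about 1.5 mm, which is why a finer encoding preserves small depth differences.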

<Structure of neural network>
FIG. 20 shows a configuration example of the neural network of this embodiment. Because the network of FIG. 20 outputs a single distance image, there is only one processing path, but the convolution unit 56 and the deconvolution unit 57 are configured in the same way as in FIG. 13. The neural network used may be the same as or different from that of the first embodiment. Because the teacher data is different, training changes characteristics such as the filter coefficients even when the same network is used, so an image output unit 43 suited to distance data can be constructed. Any image-generating neural network may be used.

学習時には、用意した距離画像（教師データ）と、逆畳み込み部５７が出力した出力画像との差を学習部４２が算出する。この差は逆畳み込み部５７、畳み込み部５６の順に逆伝播される。逆伝播によりフィルタの係数が更新される。 During learning, the learning unit 42 calculates the difference between the prepared distance image (teacher data) and the output image output by the deconvolution unit 57. This difference is back-propagated to the deconvolution unit 57 and then to the convolution unit 56 in this order. The coefficients of the filter are updated by backpropagation.

図２０の処理を、フィルタの係数の変化が収束するまで又は全ての学習データについて学習部４２が行うことで、フィルタの係数が学習され、入力された全天球画像に対し距離画像を出力する画像出力部４３が構築される。 The learning unit 42 performs the process shown in FIG. 20 until the change in the filter coefficients converges or for all learning data, thereby learning the filter coefficients and outputting a distance image for the input spherical image. An image output section 43 is constructed.

<About parallax calculation>
Next, parallax calculation is described with reference to FIG. 21. FIG. 21 is a flowchart illustrating the processing performed by the parallax calculation unit 45.

・S101
First, the parallax calculation unit 45 generates a three-dimensional point group from the input spherical image and its corresponding distance image. As explained with reference to FIG. 4, each point of the spherical image corresponds to a point on the surface of a solid sphere. For a point (u,v) of the spherical image, the horizontal angle θ and the vertical angle φ (both in radians) on the sphere are calculated as follows. Here w and h are the width and height of the spherical image, introduced to normalize u and v to the range 0 to 1. As an example, w=5376 and h=2688.
θ = −π + 2π(u/w)
φ = −π/2 + π(v/h)
The coordinates (θ, φ) on the sphere are polar coordinates; conversion between polar coordinates and the orthogonal coordinate system can be realized by the following equation (1).

Note that the distance d of the distance image at the point (u,v) corresponds to the radius of the sphere, so the right-hand side is multiplied by the distance d.
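Step S101 can be sketched as follows. Equation (1) is not reproduced in this text, so the axis convention used in `angles_to_xyz` (y vertical, z pointing forward at θ = 0) is an assumption chosen to be consistent with the y axis being the rotation axis in FIG. 22; the function names are likewise hypothetical. The angle formulas are the ones given above.

```python
import math


def pixel_to_angles(u, v, w, h):
    """Angles on the sphere for pixel (u, v), per the formulas in the text."""
    theta = -math.pi + 2.0 * math.pi * (u / w)
    phi = -math.pi / 2.0 + math.pi * (v / h)
    return theta, phi


def angles_to_xyz(theta, phi, d):
    """Polar to orthogonal coordinates, scaled by the distance d.

    ASSUMED convention (not given in the source): y is vertical and
    z is the forward direction at theta = 0.
    """
    x = d * math.cos(phi) * math.sin(theta)
    y = d * math.sin(phi)
    z = d * math.cos(phi) * math.cos(theta)
    return x, y, z


def build_point_cloud(image, depth):
    """Return a list of (x, y, z, pixel) for every pixel of the image."""
    h, w = len(image), len(image[0])
    points = []
    for v in range(h):
        for u in range(w):
            theta, phi = pixel_to_angles(u, v, w, h)
            x, y, z = angles_to_xyz(theta, phi, depth[v][u])
            points.append((x, y, z, image[v][u]))
    return points


# The image center maps to the forward direction under the assumed axes.
center = angles_to_xyz(*pixel_to_angles(2688, 1344, 5376, 2688), 3.0)
```

Whatever axis convention is used, each generated point lies at distance d from the origin, which is the property the text relies on.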

By applying this calculation to every point (u,v) of the spherical image, the parallax calculation unit 45 can generate a three-dimensional point group in the orthogonal coordinate system that holds every pixel of the spherical image together with the distance corresponding to each pixel.

・S102
The parallax calculation unit 45 performs the following processing for each three-dimensional point.

・S103
What the parallax calculation unit 45 creates are equirectangular images corresponding to the left eye and the right eye, respectively. Since the three-dimensional space has been reconstructed in step S101, a camera (viewpoint) can be placed at arbitrary coordinates and the image seen from there can be obtained by calculation. To create the left-eye and right-eye images, the parallax calculation unit 45 creates equirectangular images from the three-dimensional point cloud with the camera shifted by the distance between the eyes (for example, 5 cm to each side, 10 cm in total).

Note that when rendering an image of the three-dimensional point cloud, the position of the right eye (or left eye) is fixed at one place at any given moment, but the position of the eye changes depending on the direction in which the eye is facing.

FIG. 22 is a diagram illustrating the direction in which the eyes face and the positions of the eyes. First, FIG. 22(a) shows the eye direction rotating 360 degrees in the xz plane about the y axis. For example, assuming a person stands at the origin, the direction of the person's eyes can rotate 360 degrees in the xz plane. FIG. 22(b) is a top view of the person, looking at the xz plane from the negative direction of the y axis. In FIG. 22(b), the person faces four directions, designated A to D. In this case, taking the x and z axes as shown in FIG. 22(c), the right eye is located 5 cm from the origin in the positive x direction for direction A, 5 cm in the positive z direction for direction B, 5 cm in the negative x direction for direction C, and 5 cm in the negative z direction for direction D.

Therefore, when rendering the 360-degree three-dimensional point cloud, the parallax calculation unit 45 turns the eye direction through a full 360 degrees so that the eye traces a circle of the eye width on the xz plane (horizontal plane) around the center between the left and right eyes (a predetermined point), and renders at each position (each angle in the radial direction) reached by rotating a predetermined angle at a time until the camera position returns to its starting point. These positions can be calculated from the number of pixels in the horizontal direction of the equirectangular image. If the width of the equirectangular image is w, the eye direction changes by 2π/w per step so that w steps correspond to a rotation of 2π radians.
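The camera positions described above can be enumerated as in the following sketch. The function name is hypothetical, and the sketch assumes that direction A of FIG. 22 corresponds to rotation angle 0, that directions A, B, C, D follow in order of increasing angle, and that lengths are in meters (so 5 cm is 0.05).

```python
import math


def right_eye_positions(w, radius=0.05):
    """Return w right-eye positions (x, y, z) on a circle in the xz plane.

    One position per image column; consecutive positions differ by a
    rotation of 2*pi/w, so w steps make one full turn.
    ASSUMPTION: angle 0 places the eye on the positive x axis (direction A).
    """
    step = 2.0 * math.pi / w
    positions = []
    for i in range(w):
        angle = i * step
        positions.append((radius * math.cos(angle), 0.0, radius * math.sin(angle)))
    return positions
```

With w=4 this yields the four positions given for directions A to D: +x, +z, −x, and −z, each 5 cm from the origin. For w=5376 the step is 2π/5376 radians per column.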

・S104
Next, because the distance image can be created with the least distortion in the direction straight ahead of the eye, the parallax calculation unit 45 renders using the three-dimensional points lying in the direction of the straight line 62 perpendicular to the eye, as shown in FIG. 22(d).

・S105
FIG. 23 is a diagram schematically showing the relationship between the equirectangular image and the three-dimensional point cloud for a given eye direction. The position of the right eye is the intersection point, and the straight line 62 representing the eye direction is shown. In this case, all pixels in the vertical direction (on the dotted line 61) are to be rendered. The dotted line 61 is the trajectory traced when the elevation angle of the straight line 62 is changed while the eye direction in the xz plane stays the same (that is, the direction of the straight line 62 is kept). The elevation angle is the angle by which the straight line 62 is tilted in the y-axis direction (vertical direction). The parallax calculation unit 45 therefore renders the pixels on the dotted line 61 into the equirectangular image. However, a pixel does not necessarily lie exactly on the dotted line, so for the straight line 62 swept through 180 degrees in the y-axis direction with the eye direction in the xz plane unchanged, the nearest point is selected and rendered.
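The nearest-point selection in S105 might look like the sketch below. It is a brute-force illustration under stated assumptions: the text does not define how "nearest" is measured, so perpendicular distance to the ray is assumed, points behind the eye are ignored, and all names are hypothetical.

```python
import math


def ray_direction(azimuth, elevation):
    """Unit vector for a gaze direction (ASSUMPTION: y is the vertical axis)."""
    return (math.cos(elevation) * math.cos(azimuth),
            math.sin(elevation),
            math.cos(elevation) * math.sin(azimuth))


def distance_to_ray(point, origin, direction):
    """Perpendicular distance from point to the ray; inf if the point is behind."""
    px, py, pz = (point[i] - origin[i] for i in range(3))
    dx, dy, dz = direction
    t = px * dx + py * dy + pz * dz  # length of the projection onto the ray
    if t <= 0.0:
        return math.inf
    cx, cy, cz = px - t * dx, py - t * dy, pz - t * dz
    return math.sqrt(cx * cx + cy * cy + cz * cz)


def nearest_point_to_ray(points, origin, azimuth, elevation):
    """Pick the 3D point closest to the line of sight from the eye position.

    points must be non-empty; a linear scan is used for clarity only.
    """
    direction = ray_direction(azimuth, elevation)
    return min(points, key=lambda p: distance_to_ray(p, origin, direction))
```

Sweeping `elevation` from −π/2 to π/2 for a fixed `azimuth` corresponds to walking along the dotted line 61. A spatial index would be needed for a cloud of realistic size; the linear scan here only shows the selection rule.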

・S106
The three-dimensional point is thereby determined, so the parallax calculation unit 45 converts the selected three-dimensional point (x, y, z) into a horizontal angle θ and a vertical angle φ using equation (2), and sets the pixel value of the three-dimensional point (x, y, z) as the pixel value of the corresponding point in the equirectangular image.
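As a rough illustration of S106, the sketch below converts a point back to angles and then to a pixel position by inverting the θ and φ formulas of S101. Equation (2) is not reproduced in this text, so the angle formulas here are an assumption (y as the vertical axis, θ measured in the xz plane), and the function names and the nested-list image layout are hypothetical.

```python
import math


def xyz_to_angles(x, y, z):
    """Orthogonal to polar conversion (ASSUMED form; equation (2) may differ)."""
    d = math.sqrt(x * x + y * y + z * z)
    if d == 0.0:
        raise ValueError("the origin has no direction")
    theta = math.atan2(z, x)
    phi = math.asin(max(-1.0, min(1.0, y / d)))
    return theta, phi


def angles_to_pixel(theta, phi, w, h):
    """Invert theta = -pi + 2*pi*(u/w) and phi = -pi/2 + pi*(v/h)."""
    u = int((theta + math.pi) / (2.0 * math.pi) * w) % w  # wrap horizontally
    v = min(int((phi + math.pi / 2.0) / math.pi * h), h - 1)  # clamp at the pole
    return u, v


def write_point(image, point, color):
    """Store color at the pixel that point projects to; returns (u, v)."""
    h = len(image)
    w = len(image[0])
    theta, phi = xyz_to_angles(*point)
    u, v = angles_to_pixel(theta, phi, w, h)
    image[v][u] = color
    return u, v
```

The horizontal index wraps around because θ = π and θ = −π denote the same direction, while the vertical index is clamped so that φ = π/2 still falls inside the image.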

The description above mainly covers generation of the equirectangular image for the right eye; the equirectangular image for the left eye is created in the same way, except that the rotation is in the opposite direction. Through the above procedure, two spherical images that take the parallax between the left and right eyes into account can be obtained from the input equirectangular image.

<Main effects>
According to this embodiment, in addition to the effects of the first embodiment, a high-density, high-definition equirectangular image can be obtained.

This embodiment adds a supplementary description regarding moving images. In this embodiment, the hardware configuration diagrams of FIGS. 2 and 3 and the functional block diagram of FIG. 6 or FIG. 17 described in the above embodiments are assumed to apply.

FIG. 24 is a diagram illustrating the flow of converting a moving image into a three-dimensional stereoscopic moving image. As shown in FIG. 24,
(1) The image processing device 10 regards the moving image as a sequence of images and converts it into a sequence of still images.
(2) For each still image, the image processing device 10 generates an image for 360-degree stereoscopic viewing using the estimation algorithm of the first or second embodiment.
(3) The generated images are recombined into a moving image.

If the moving image contains audio, the audio is separated when the moving image is split into images, the estimation algorithm is applied to the images, and the audio of the original moving image is added back when the moving image is recombined. Since the number of frames and the like are unchanged, no audio drift occurs.
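The three steps and the audio handling can be summarized in the following sketch. It deliberately leaves out real video decoding and encoding: `frames` stands for the already extracted still images, `audio` for the separated audio track, and `convert_frame` for the per-image estimation of the first or second embodiment. All three names are hypothetical placeholders.

```python
def convert_video(frames, audio, convert_frame):
    """Convert each still image and return the results with the original audio.

    frames: sequence of still images in playback order (step 1 output).
    audio: the separated audio track, passed through unchanged.
    convert_frame: callable returning a (left, right) stereoscopic pair (step 2).
    """
    stereo_frames = [convert_frame(frame) for frame in frames]
    # One output per input frame, in the same order, so the original audio
    # still lines up when the frames are recombined (step 3).
    if len(stereo_frames) != len(frames):
        raise RuntimeError("frame count changed during conversion")
    return stereo_frames, audio
```

Because the conversion is one-to-one and order-preserving, the frame count and frame rate stay the same, which is why the original audio can be reattached without any offset.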

<Main effects>
According to this embodiment, in addition to the effects of the first and second embodiments, moving images can also be handled.

<Other application examples>
Although the best mode for carrying out the present invention has been described above using embodiments, the present invention is in no way limited to these embodiments, and various modifications and substitutions can be made without departing from the gist of the present invention.

The configuration examples shown in FIG. 6, FIG. 17, and the like are divided according to main functions in order to facilitate understanding of the processing performed by the image processing device 10. The present invention is not limited by how the processing units are divided or by their names. The processing of the image processing device 10 can be divided into a larger number of processing units depending on the processing content. It can also be divided so that one processing unit includes more processing.

Each function of the embodiments described above can be realized by one or more processing circuits. Here, the term "processing circuit" as used in this specification includes a processor programmed to execute each function by software, such as a processor implemented by an electronic circuit, and devices designed to execute each function described above, such as an ASIC (application specific integrated circuit), a DSP (digital signal processor), an FPGA (field programmable gate array), and conventional circuit modules.

9 Imaging device
10 Image processing device

Publication of JP-A-2015-019344

Claims

1. An image processing device that applies an estimation algorithm to one spherical image to generate a spherical image for stereoscopic viewing, wherein
the estimation algorithm is constructed by learning using a neural network, and
the image processing device comprises a learning data creation unit that, from spherical images obtained by capturing modeling data created with 3D modeling software using three virtual cameras arranged horizontally, uses the spherical image captured by the center camera as the input spherical image, and
generates a spherical image for the right eye and a spherical image for the left eye as training data by swapping the back image of the spherical image captured by the right camera of the three virtual cameras, which shows the portion corresponding to the back of the right camera, with the back image of the spherical image captured by the left camera, which shows the portion corresponding to the back of the left camera.

2. An image processing device that generates a distance image by applying an estimation algorithm to one spherical image, and generates a spherical image for stereoscopic viewing from the distance image and the spherical image, wherein
the estimation algorithm is constructed by learning using a neural network,
from a spherical image obtained by capturing modeling data created with 3D modeling software with a virtual camera and the distance image of that spherical image,
the estimation algorithm is constructed to take the spherical image as input, use the distance image as training data, and output a distance image for the spherical image, and
the image processing device comprises a parallax calculation unit that converts the spherical image into a three-dimensional point cloud in an orthogonal coordinate system using the per-pixel distances of the distance image estimated by the estimation algorithm, and
that, for a straight line rotating about a predetermined point on the horizontal plane of the orthogonal coordinate system, converts into an equirectangular image the three-dimensional points near the straight line when the elevation angle of the straight line is changed at each position rotated by a predetermined angle.

3. The image processing device according to claim 1, wherein a moving image of spherical images is converted into still images, spherical images for stereoscopic viewing are generated from the still images, and the generated images are then recombined into a moving image.

4. An image processing method that applies an estimation algorithm to one spherical image to generate a spherical image for stereoscopic viewing, wherein
the estimation algorithm is constructed by learning using a neural network, and
the method comprises using, from spherical images obtained by capturing modeling data created with 3D modeling software using three virtual cameras arranged horizontally, the spherical image captured by the center camera as the input spherical image, and
generating a spherical image for the right eye and a spherical image for the left eye as training data by swapping the back image of the spherical image captured by the right camera of the three virtual cameras, which shows the portion corresponding to the back of the right camera, with the back image of the spherical image captured by the left camera, which shows the portion corresponding to the back of the left camera.

5. An image processing method that applies an estimation algorithm to one spherical image to generate a distance image, and generates a spherical image for stereoscopic viewing from the distance image and the spherical image, wherein
the estimation algorithm is constructed by learning using a neural network,
from a spherical image obtained by capturing modeling data created with 3D modeling software with a virtual camera and the distance image of that spherical image,
the estimation algorithm is constructed to take the spherical image as input, use the distance image as training data, and output a distance image for the spherical image, and
the method comprises converting the spherical image into a three-dimensional point cloud in an orthogonal coordinate system using the per-pixel distances of the distance image estimated by the estimation algorithm, and
for a straight line rotating about a predetermined point on the horizontal plane of the orthogonal coordinate system, converting into an equirectangular image the three-dimensional points near the straight line when the elevation angle of the straight line is changed at each position rotated by a predetermined angle.

6. A program for an image processing device that applies an estimation algorithm to one spherical image to generate a spherical image for stereoscopic viewing, wherein
the estimation algorithm is constructed by learning using a neural network, and
the program causes the image processing device to function as a learning data creation unit that,
from spherical images obtained by capturing modeling data created with 3D modeling software using three virtual cameras arranged horizontally, uses the spherical image captured by the center camera as the input spherical image, and
generates a spherical image for the right eye and a spherical image for the left eye as training data by swapping the back image of the spherical image captured by the right camera of the three virtual cameras, which shows the portion corresponding to the back of the right camera, with the back image of the spherical image captured by the left camera, which shows the portion corresponding to the back of the left camera.

7. A program for an image processing device that applies an estimation algorithm to one spherical image to generate a distance image, and generates a spherical image for stereoscopic viewing from the distance image and the spherical image, wherein
the estimation algorithm is constructed by learning using a neural network,
from a spherical image obtained by capturing modeling data created with 3D modeling software with a virtual camera and the distance image of that spherical image,
the estimation algorithm is constructed to take the spherical image as input, use the distance image as training data, and output a distance image for the spherical image, and
the program causes the image processing device to function as a parallax calculation unit that
converts the spherical image into a three-dimensional point cloud in an orthogonal coordinate system using the per-pixel distances of the distance image estimated by the estimation algorithm, and
that, for a straight line rotating about a predetermined point on the horizontal plane of the orthogonal coordinate system, converts into an equirectangular image the three-dimensional points near the straight line when the elevation angle of the straight line is changed at each position rotated by a predetermined angle.