JP2017102784A

JP2017102784A - Image processing system, image processing method and image processing program

Info

Publication number: JP2017102784A
Application number: JP2015236706A
Authority: JP
Inventors: 康輔高橋; Kosuke Takahashi; 弾三上; Dan Mikami; 麻理子五十川; Mariko Isogawa; 明小島; Akira Kojima
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2015-12-03
Filing date: 2015-12-03
Publication date: 2017-06-08

Abstract

PROBLEM TO BE SOLVED: To provide an image processing system capable of generating a virtual omnidirectional image while suppressing degradation of viewing quality. SOLUTION: When coupling two images, namely a first image acquired from a first video and a second image acquired from a second video and sharing part of its image area with the first image, the image processing system determines a coupling line along which the two images are coupled within the common area. The image processing system includes coupling line determination means that places the coupling line so as to avoid areas that a person viewing the images is likely to be looking at. SELECTED DRAWING: Figure 1

Description

本発明は、複数のカメラからの画像データを処理する画像処理装置、画像処理方法及び画像処理プログラムに関する。 The present invention relates to an image processing apparatus, an image processing method, and an image processing program for processing image data from a plurality of cameras.

近年、周囲３６０度を含む全天の画像である全天球画像を撮影できるカメラ（以下、全天球カメラという）及びその全天球画像の視聴において利用者が向いた方向を視聴することができるヘッドマウントディスプレイ（ＨＭＤ）が普及し始めている。そして、ネットワークを介して全天球画像を配信するサービスが注目を集めている。上記のような全天球画像は、ＨＭＤで視聴することで高い臨場感を得ることができ、スポーツやアーティストのライブ等のコンテンツの視聴における利用が期待されている。 In recent years, cameras capable of capturing omnidirectional images, that is, images of the whole sky including the full 360 degrees of surroundings (hereinafter referred to as omnidirectional cameras), and head mounted displays (HMDs), which let a user view the direction he or she is facing within an omnidirectional image, have begun to spread. Services that distribute omnidirectional images via a network are also attracting attention. When viewed on an HMD, such omnidirectional images provide a strong sense of presence, and they are expected to be used for viewing content such as sports and live performances by artists.

一般に、これらの全天球画像は、所望の視点に全天球カメラを設置することで撮影することができる。しかしながら、競技中のサッカーコートの中やバスケットコートの中は、全天球カメラを設置しようとすると競技者の邪魔となるため、全天球カメラを設置することができない。しかし、競技中のサッカーコートの中やバスケットコートの中に立っているかのような映像を視聴してみたいという要望がある。そこで、通常では全天球カメラを設置することのできない場所に仮想的な視点である仮想視点を設定して、仮想視点を含む領域を撮影する複数のカメラを設置し、それらのカメラからの画像を合成することにより、この仮想視点において全天球カメラで撮影したかのような全天球画像を得る技術が考案されている（例えば、非特許文献１参照）。以下の説明において、仮想視点における全天球画像を、仮想全天球画像という。 In general, such an omnidirectional image can be captured by installing an omnidirectional camera at the desired viewpoint. However, an omnidirectional camera cannot be installed inside a soccer court or a basketball court during play, because it would obstruct the players. Nevertheless, there is demand for video that makes viewers feel as if they were standing inside the court during play. A technique has therefore been devised in which a virtual viewpoint is set at a location where an omnidirectional camera cannot normally be installed, multiple cameras are installed to capture an area including the virtual viewpoint, and the images from those cameras are combined to obtain an omnidirectional image as if it had been captured by an omnidirectional camera at the virtual viewpoint (see, for example, Non-Patent Document 1). In the following description, the omnidirectional image at the virtual viewpoint is referred to as a virtual omnidirectional image.

仮想全天球画像を複数のカメラからの画像の合成によって得る画像処理システムの具体例について説明する。図２０は、従来の仮想全天球画像を得るための画像処理システムを示す図である。図２０に示すように、画像処理システム１は、全天球カメラ２と、Ｎ台（Ｎ≧２）の複数のカメラ３−１、３−２、３−３、…、３−Ｎ（以下、カメラ群３とする。）と、画像処理装置４と、表示装置５とを備える。画像処理システム１は、フットサルのコート１０内に仮想視点１１を設定した場合に、コート１０外に設置したカメラ群３からの画像の合成によって仮想視点１１における仮想全天球画像を得る。 A specific example of an image processing system that obtains a virtual omnidirectional image by combining images from a plurality of cameras will be described. FIG. 20 is a diagram illustrating a conventional image processing system for obtaining a virtual omnidirectional image. As shown in FIG. 20, the image processing system 1 includes an omnidirectional camera 2, N (N ≧ 2) cameras 3-1, 3-2, 3-3, ..., 3-N (hereinafter referred to as camera group 3), an image processing device 4, and a display device 5. When the virtual viewpoint 11 is set in the futsal court 10, the image processing system 1 obtains a virtual omnidirectional image at the virtual viewpoint 11 by combining images from the camera group 3 installed outside the court 10.

全天球カメラ２は、全天球画像を撮影するカメラである。全天球カメラ２は、試合が行われる前のタイミングでコート１０内の仮想視点１１の位置に設置される。全天球カメラ２は、予め、仮想視点１１の位置から仮想全天球画像の背景となる背景画像２０を撮影する。全天球カメラ２で撮影された全天球画像である背景画像２０は、画像処理装置４に入力されて蓄積される。 The omnidirectional camera 2 is a camera that captures an omnidirectional image. The omnidirectional camera 2 is installed at the position of the virtual viewpoint 11 in the court 10 at a timing before the game is played. The omnidirectional camera 2 captures in advance a background image 20 that is the background of the virtual omnidirectional image from the position of the virtual viewpoint 11. A background image 20 that is an omnidirectional image captured by the omnidirectional camera 2 is input to the image processing device 4 and accumulated.

コート１０の周囲には、カメラ群３が設置されている。図２０においてＮは４以上の自然数である。カメラ群３を構成するカメラの数は、多ければ多いほどよいが、最低数は２である。カメラ群３は、それぞれ仮想視点１１を含む画角となるようにコート１０の周囲に設置されている。画像処理装置４は、背景画像２０に対して合成するためカメラ群３の各カメラが出力する前景画像を含む切り出し画像に対して画像処理を行う。画像処理装置４は、全天球カメラ２より取得した背景画像２０に画像処理後の部分画像を合成して仮想全天球画像を生成する。表示装置５は、画像処理装置４で生成した仮想全天球画像を表示する装置であり、液晶ディスプレイ等である。 A camera group 3 is installed around the court 10. In FIG. 20, N is a natural number of 4 or more. The larger the number of cameras constituting the camera group 3, the better, but the minimum number is two. The camera group 3 is installed around the court 10 so as to have an angle of view including the virtual viewpoint 11. The image processing device 4 performs image processing on the cut-out image including the foreground image output from each camera of the camera group 3 to be combined with the background image 20. The image processing device 4 combines the partial image after image processing with the background image 20 acquired from the omnidirectional camera 2 to generate a virtual omnidirectional image. The display device 5 is a device that displays the virtual omnidirectional image generated by the image processing device 4, and is a liquid crystal display or the like.

画像処理システム１における画像処理の具体例を説明する。図２１は、画像処理システム１における画像処理される画像の具体例を示す図である。図２１（Ａ）は、仮想視点１１の位置に設置された全天球カメラ２で撮影された背景画像２０の例を示す図である。仮想視点１１を中心とする３６０度の画像となっている。背景画像２０は、競技開始前に撮影される画像であるのでコート１０内に競技を行う選手等は映っていない。 A specific example of image processing in the image processing system 1 will be described. FIG. 21 is a diagram illustrating a specific example of an image subjected to image processing in the image processing system 1. FIG. 21A is a diagram illustrating an example of the background image 20 captured by the omnidirectional camera 2 installed at the position of the virtual viewpoint 11. The image is a 360 degree image centered on the virtual viewpoint 11. Since the background image 20 is an image taken before the start of the competition, no player or the like who competes in the court 10 is shown.

図２１（Ｂ）は、左からカメラ３−１で撮影した部分画像２１と、カメラ３−２で撮影した部分画像２２と、カメラ３−３で撮影した部分画像２３とを示している。画像処理装置４は、部分画像２１〜２３のそれぞれから仮想視点１１を含み、かつ、フットサルの選手を含む領域２１１、２２１、２３１を切り出す。画像処理装置４は、切り出した領域２１１、２２１、２３１の画像に対して、画像処理を行うことで背景画像２０に貼り付け可能な部分画像２１１ａ、２２１ａ、２３１ａを生成する。 FIG. 21B shows a partial image 21 captured by the camera 3-1, a partial image 22 captured by the camera 3-2, and a partial image 23 captured by the camera 3-3 from the left. The image processing apparatus 4 cuts out regions 211, 221, and 231 that include the virtual viewpoint 11 and include futsal players from each of the partial images 21 to 23. The image processing apparatus 4 generates partial images 211 a, 221 a, and 231 a that can be pasted on the background image 20 by performing image processing on the cut out images of the areas 211, 221, and 231.

画像処理装置４は、背景画像２０に対して部分画像２１１ａ、２２１ａ、２３１ａを合成することで、仮想全天球画像２４を生成する。図２１（Ｃ）は、画像処理装置４が生成する仮想全天球画像２４の例を示す図である。図２１（Ｃ）に示すように、仮想全天球画像２４は、所定の領域に部分画像２１１ａ、２２１ａ、２３１ａを貼り付けているので、コート１０上で競技を行っているフットサルの選手が映っている画像である。 The image processing device 4 generates the virtual omnidirectional image 24 by combining the partial images 211a, 221a, and 231a with the background image 20. FIG. 21C is a diagram illustrating an example of the virtual omnidirectional image 24 generated by the image processing device 4. As shown in FIG. 21C, the partial images 211a, 221a, and 231a are pasted into predetermined regions of the virtual omnidirectional image 24, so the image shows the futsal players competing on the court 10.

従来の画像処理システム１は、合成に用いているカメラ群３の光学中心及び仮想視点１１において想定する仮想全天球カメラの光学中心はそれぞれ異なる。このため、合成された仮想全天球画像２４は幾何学的に正しくない画像を含む。これを防ぐためには、画像処理装置４は、部分画像２１１ａ、２２１ａ、２３１ａを、仮想視点１１からの距離を示す奥行の一点で整合性が保たれるよう画像処理を行い背景画像２０に貼り付ける必要がある。しかしながら、整合性が保たれる奥行に存在せずに別の奥行に存在している物体（例えば、競技中の選手）の部分画像を貼り付ける場合には、画像処理により奥行の整合性を保つことができない。このような奥行に整合性のない物体は、仮想全天球画像２４において、その画像が分身（多重像）したり、消失したりする現象が発生する。 In the conventional image processing system 1, the optical centers of the cameras of the camera group 3 used for composition and the optical center of the virtual omnidirectional camera assumed at the virtual viewpoint 11 all differ. The synthesized virtual omnidirectional image 24 therefore contains geometrically incorrect content. To prevent this, the image processing device 4 must process the partial images 211a, 221a, and 231a so that consistency is maintained at a single depth, that is, at one distance from the virtual viewpoint 11, before pasting them onto the background image 20. However, when the pasted partial image shows an object (for example, a player in the game) that is located not at the depth where consistency is maintained but at another depth, image processing cannot maintain depth consistency. In the virtual omnidirectional image 24, the image of such a depth-inconsistent object is duplicated (appears as a multiple image) or disappears.

以下に、図面を用いて仮想全天球画像２４において、物体の画像が分身したり、消失したりする現象について説明する。図２２は、画像処理システム１における課題を説明するための図である。図２２において、撮影範囲４１は、カメラ３−１の撮影範囲において図２１（Ｂ）に示した領域２１１の撮影範囲を示す。撮影範囲４２は、カメラ３−２の撮影範囲において図２１（Ｂ）に示した領域２２１の撮影範囲を示す。撮影範囲４３は、カメラ３−３の撮影範囲において図２１（Ｂ）に示した領域２３１の撮影範囲を示す。また、仮想視点１１からの距離（奥行）が異なる３つの被写体（選手）４９〜５１が存在する。 Hereinafter, a phenomenon in which an image of an object is duplicated or disappeared in the virtual omnidirectional image 24 will be described with reference to the drawings. FIG. 22 is a diagram for explaining a problem in the image processing system 1. In FIG. 22, the shooting range 41 indicates the shooting range of the region 211 shown in FIG. 21B in the shooting range of the camera 3-1. The shooting range 42 indicates the shooting range of the region 221 shown in FIG. 21B in the shooting range of the camera 3-2. The shooting range 43 indicates the shooting range of the area 231 shown in FIG. 21B in the shooting range of the camera 3-3. In addition, there are three subjects (players) 49 to 51 having different distances (depths) from the virtual viewpoint 11.

図２２において破線で示している仮想視点１１からの第１の距離を示す奥行４６は、各撮影範囲４１〜４３が、重なりなく並んでいる。このような奥行４６に位置する被写体４９は、その画像が分身したり消失したりすることがなく、奥行に整合性のある被写体４９である。仮想視点１１からの第２の距離を示す奥行４７は、各撮影範囲４１〜４３が、横線部分４４に示すように重なっている。このような奥行４７に位置する被写体５０は、その画像が分身してしまうので、奥行に整合性のない被写体５０となる。仮想視点１１からの第３の距離を示す奥行４８は、各撮影範囲４１〜４３の間が斜線部分４５に示すように空いている。このような奥行４８に位置する被写体５１は、その画像の一部が消失してしまうので、奥行に整合性のない被写体５１となる。 At the depth 46, which is indicated by the broken line in FIG. 22 and represents a first distance from the virtual viewpoint 11, the shooting ranges 41 to 43 line up without overlapping. The image of the subject 49 located at the depth 46 is neither duplicated nor lost, so the subject 49 is depth-consistent. At the depth 47, which represents a second distance from the virtual viewpoint 11, the shooting ranges 41 to 43 overlap, as shown by the horizontally lined portions 44. The image of the subject 50 located at the depth 47 is duplicated, so the subject 50 is depth-inconsistent. At the depth 48, which represents a third distance from the virtual viewpoint 11, there are gaps between the shooting ranges 41 to 43, as shown by the hatched portions 45. Part of the image of the subject 51 located at the depth 48 is lost, so the subject 51 is depth-inconsistent.

高橋康輔、外３名、「複数カメラ映像を用いた仮想全天球映像合成に関する検討」、信学技報、2015年06月01日、vol.115, no.76、MVE2015-5、p.43-48 Kosuke Takahashi and three others, “Study on virtual spherical image composition using multiple camera images”, IEICE Technical Report, June 1, 2015, vol. 115, no. 76, MVE2015-5, pp. 43-48

以上のように、仮想全天球画像において被写体がある領域は、ユーザが注視する領域である視聴領域である可能性が高く、その視聴領域において被写体の分身や消失が発生すると、仮想全天球画像の視聴品質が低下するという問題がある。また、画像合成する際に、画像の継ぎ目が目に付いてしまうとこれもまた、仮想全天球画像の視聴品質が低下するという問題がある。 As described above, a region of the virtual omnidirectional image that contains a subject is likely to be a viewing region, that is, a region the user gazes at, and if the subject is duplicated or lost in that viewing region, the viewing quality of the virtual omnidirectional image deteriorates. In addition, if the seams between images are noticeable when the images are combined, this too degrades the viewing quality of the virtual omnidirectional image.

本発明は、このような事情に鑑みてなされたもので、視聴品質の低下を抑制した仮想全天球画像を生成する画像処理装置、画像処理方法及び画像処理プログラムを提供することを目的とする。 The present invention has been made in view of such circumstances, and an object of the present invention is to provide an image processing apparatus, an image processing method, and an image processing program that generate a virtual omnidirectional image in which deterioration of viewing quality is suppressed.

本発明の一態様は、第１の映像から取得された画像である第１画像と、第２の映像から取得された画像であって前記第１画像と撮像領域の一部が共通する画像である第２画像との２つの画像を結合する際に、共通する部分において前記２つの画像を結合するための結合線を決定する画像処理装置であって、前記画像を見ている者が見ている可能性が高い領域を避けつつ、前記結合線を配置することを決定する結合線決定手段を備える画像処理装置である。 One aspect of the present invention is an image processing apparatus that, when combining two images, namely a first image acquired from a first video and a second image acquired from a second video and sharing part of its imaging region with the first image, determines a coupling line along which the two images are combined in the common portion, the apparatus comprising coupling line determination means that determines the placement of the coupling line while avoiding a region that a person viewing the images is likely to be looking at.

本発明の一態様は、第１の映像から取得された画像である第１画像と、第２の映像から取得された画像であって前記第１画像と撮像領域の一部が共通する画像である第２画像との２つの画像を結合する際に、共通する部分において前記２つの画像を結合するための結合線を決定する画像処理装置であって、前記共通する部分において、人間の目に敏感に感じる画素はそのエネルギーが大きいと定義される第１のエネルギー画像を生成する第１のエネルギー画像生成手段と、前記第１のエネルギー画像のエネルギーが小さい画素を探索しながら前記共通部分において、前記エネルギーが小さい一連の画素を求めることにより前記結合線を配置することを決定する結合線決定手段とを備え、前記結合線決定手段は、前記画像を見ている者が見ている可能性が高い領域を避けつつ、前記結合線を決定する画像処理装置である。 One aspect of the present invention is an image processing apparatus that, when combining two images, namely a first image acquired from a first video and a second image acquired from a second video and sharing part of its imaging region with the first image, determines a coupling line along which the two images are combined in the common portion, the apparatus comprising: first energy image generation means that generates, for the common portion, a first energy image in which pixels to which the human eye is sensitive are defined as having large energy; and coupling line determination means that determines the placement of the coupling line by searching the first energy image for low-energy pixels and obtaining a series of low-energy pixels in the common portion, wherein the coupling line determination means determines the coupling line while avoiding a region that a person viewing the images is likely to be looking at.

本発明の一態様は、前記画像処理装置であって、前記画像の前記共通する部分において、前記画像を見ている者が見ている領域を検出する領域検出手段と、前記領域検出手段による検出結果に基づき、検出した前記領域のエネルギーが大きくなる第２のエネルギー画像を生成する第２のエネルギー画像生成手段と、前記第１のエネルギー画像と第２のエネルギー画像を加算して第３のエネルギー画像を生成する第３のエネルギー画像生成手段をさらに備え、前記結合線決定手段は、前記第３のエネルギー画像に基づき、前記エネルギーが小さい一連の画素を求めることにより前記結合線を決定する。 One aspect of the present invention is the above image processing apparatus, further comprising: region detection means that detects, in the common portion of the images, a region that a person viewing the images is looking at; second energy image generation means that generates, based on the detection result of the region detection means, a second energy image in which the energy of the detected region is large; and third energy image generation means that generates a third energy image by adding the first energy image and the second energy image, wherein the coupling line determination means determines the coupling line by obtaining a series of low-energy pixels based on the third energy image.
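The relationship between the three energy images can be illustrated with a minimal Python sketch. It is not the patented implementation: the use of absolute intensity differences as a stand-in for "pixels to which the human eye is sensitive", the bounding-box format for detected regions, the `penalty` value, and all function names are assumptions made only for illustration.

```python
def gradient_energy(gray):
    """First energy image (assumed form): absolute horizontal plus vertical
    intensity difference, so edges and textured pixels get large energy."""
    h, w = len(gray), len(gray[0])
    energy = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            left = gray[y][max(x - 1, 0)]
            right = gray[y][min(x + 1, w - 1)]
            up = gray[max(y - 1, 0)][x]
            down = gray[min(y + 1, h - 1)][x]
            energy[y][x] = abs(right - left) + abs(down - up)
    return energy


def region_energy(height, width, boxes, penalty=1000.0):
    """Second energy image: large energy inside every detected viewed region.
    boxes is a list of half-open rectangles (x0, y0, x1, y1); this format is
    hypothetical."""
    energy = [[0.0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                energy[y][x] = penalty
    return energy


def add_energy(first, second):
    """Third energy image: pixel-wise sum of the first and second images."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(first, second)]


# Overlap region with a vertical edge between columns 1 and 2, and a viewed
# region covering column 0.
gray = [[0, 0, 10, 10] for _ in range(3)]
first = gradient_energy(gray)
second = region_energy(3, 4, [(0, 0, 1, 3)], penalty=100.0)
third = add_energy(first, second)
```

Because the second image only adds energy, a seam search that minimizes the summed energy is pushed away from the detected region without changing how it behaves elsewhere.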

本発明の一態様は、前記画像処理装置であって、前記結合線決定手段は、シームカービング処理によって求めた前記共通する部分の前記２つの画像を結合する際に並べた方向と直交する方向のシームを結合線とする。 One aspect of the present invention is the above image processing apparatus, wherein the coupling line determination means uses as the coupling line a seam of the common portion obtained by seam carving processing, the seam running in the direction orthogonal to the direction in which the two images are arranged when they are combined.
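For two images arranged side by side, the seam orthogonal to the arrangement direction runs from top to bottom. The sketch below shows the standard dynamic-programming seam search used in seam carving, followed by a join along the resulting seam. It is an illustration under assumptions, not the patent's procedure: the 8-connected neighborhood, the hard (unblended) switch between images at the seam, and the names `find_vertical_seam` and `join_along_seam` are all hypothetical.

```python
def find_vertical_seam(energy):
    """Return, for each row, the column of a top-to-bottom seam whose summed
    energy is minimal. Consecutive rows differ by at most one column."""
    h, w = len(energy), len(energy[0])
    cost = [list(energy[0])]      # cost[y][x]: cheapest seam ending at (x, y)
    back = [[0] * w]              # back[y][x]: column chosen in row y - 1
    for y in range(1, h):
        row_cost, row_back = [], []
        for x in range(w):
            candidates = range(max(x - 1, 0), min(x + 2, w))
            best = min(candidates, key=lambda c: cost[y - 1][c])
            row_cost.append(energy[y][x] + cost[y - 1][best])
            row_back.append(best)
        cost.append(row_cost)
        back.append(row_back)
    # Start from the cheapest pixel in the last row and walk back up.
    x = min(range(w), key=lambda c: cost[h - 1][c])
    seam = [0] * h
    for y in range(h - 1, -1, -1):
        seam[y] = x
        x = back[y][x]
    return seam


def join_along_seam(left, right, seam):
    """Join two equally sized overlap regions: pixels left of the seam come
    from the first image, the seam pixel and everything right of it from the
    second image."""
    return [l[:s] + r[s:] for l, r, s in zip(left, right, seam)]


energy = [[9, 1, 9],
          [9, 9, 1],
          [9, 1, 9]]
seam = find_vertical_seam(energy)
left = [["a"] * 3 for _ in range(3)]
right = [["b"] * 3 for _ in range(3)]
joined = join_along_seam(left, right, seam)
```

In this small example the seam follows the low-energy pixels through columns 1, 2, 1. Passing a summed energy image that is large over viewed regions would steer the same search around those regions.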

本発明の一態様は、前記画像処理装置であって、前記２つの画像は、所定の位置に設けた仮想視点から見た仮想全天球画像を生成する際に用いる。 One aspect of the present invention is the image processing apparatus, wherein the two images are used when generating a virtual omnidirectional image viewed from a virtual viewpoint provided at a predetermined position.

本発明の一態様は、第１の映像から取得された画像である第１画像と、第２の映像から取得された画像であって前記第１画像と撮像領域の一部が共通する画像である第２画像との２つの画像を結合する際に、共通する部分において前記２つの画像を結合するための結合線を決定する画像処理装置が行う画像処理方法であって、前記画像を見ている者が見ている可能性が高い領域を避けつつ、前記結合線を配置することを決定する結合線決定ステップを有する画像処理方法である。 One aspect of the present invention is an image processing method performed by an image processing apparatus that, when combining two images, namely a first image acquired from a first video and a second image acquired from a second video and sharing part of its imaging region with the first image, determines a coupling line along which the two images are combined in the common portion, the method comprising a coupling line determination step of determining the placement of the coupling line while avoiding a region that a person viewing the images is likely to be looking at.

本発明の一態様は、第１の映像から取得された画像である第１画像と、第２の映像から取得された画像であって前記第１画像と撮像領域の一部が共通する画像である第２画像との２つの画像を結合する際に、共通する部分において前記２つの画像を結合するための結合線を決定する画像処理装置が行う画像処理方法であって、前記共通する部分において、人間の目に敏感に感じる画素はそのエネルギーが大きいと定義される第１のエネルギー画像を生成するエネルギー画像生成ステップと、前記第１のエネルギー画像のエネルギーが小さい画像を探索しながら前記共通部分における前記結合線を決定する結合線決定ステップとを有し、前記結合線決定ステップは、前記画像を見ている者が見ている可能性が高い領域を避けつつ、前記結合線を決定する画像処理方法である。 One aspect of the present invention is an image processing method performed by an image processing apparatus that, when combining two images, namely a first image acquired from a first video and a second image acquired from a second video and sharing part of its imaging region with the first image, determines a coupling line along which the two images are combined in the common portion, the method comprising: an energy image generation step of generating, for the common portion, a first energy image in which pixels to which the human eye is sensitive are defined as having large energy; and a coupling line determination step of determining the coupling line in the common portion while searching for low-energy portions of the first energy image, wherein the coupling line determination step determines the coupling line while avoiding a region that a person viewing the images is likely to be looking at.

本発明の一態様は、コンピュータを、前記画像処理装置として機能させるための画像処理プログラムである。 One embodiment of the present invention is an image processing program for causing a computer to function as the image processing apparatus.

本発明によれば、２つの画像を結合する際に、画像の継ぎ目を気づきにくい位置に配置するようにしたため、視聴品質の低下を抑制した仮想全天球画像を生成することができるという効果が得られる。 According to the present invention, when two images are combined, the seam between them is placed where it is hard to notice, which makes it possible to generate a virtual omnidirectional image in which deterioration of viewing quality is suppressed.

【図１】本発明の一実施形態による画像処理装置の構成を示すブロック図である。 FIG. 1 is a block diagram illustrating a configuration of an image processing apparatus according to an embodiment of the present invention.
【図２】画像処理装置３０の基本構成例を示す図である。 FIG. 2 is a diagram illustrating a basic configuration example of the image processing device 30.
【図３】オブジェクト情報格納部３０３に格納するオブジェクト情報の一例を示す図である。 FIG. 3 is a diagram showing an example of object information stored in the object information storage unit 303.
【図４】隣り合う部分画像間の境界領域において重複が発生する場合の具体例を示す図である。 FIG. 4 is a diagram showing a specific example of a case where overlap occurs in the boundary region between adjacent partial images.
【図５】画像処理システム１において１フレームの仮想全天球画像を作成する動作を示すフロー図である。 FIG. 5 is a flowchart showing an operation for creating one frame of a virtual omnidirectional image in the image processing system 1.
【図６】画像処理装置３０が動画の仮想全天球画像を作成する動作について説明する図である。 FIG. 6 is a diagram explaining the operation by which the image processing device 30 creates a virtual omnidirectional image of a moving image.
【図７】仮想全天球画像の生成処理を示す模式図である。 FIG. 7 is a schematic diagram showing the process of generating a virtual omnidirectional image.
【図８】シームを求める動作を示す模式図である。 FIG. 8 is a schematic diagram showing the operation of obtaining a seam.
【図９】２つの部分画像を結合する処理を示す説明図である。 FIG. 9 is an explanatory diagram showing the process of combining two partial images.
【図１０】前景の進行方向に基づいてエネルギー画像を生成して処理を行う例を示す図である。 FIG. 10 is a diagram showing an example in which an energy image is generated and processed based on the traveling direction of the foreground.
【図１１】エネルギー画像の生成方法を示す説明図である。 FIG. 11 is an explanatory diagram showing a method of generating an energy image.
【図１２】エネルギー画像の生成方法を示す説明図である。 FIG. 12 is an explanatory diagram showing a method of generating an energy image.
【図１３】図９に示す２つの部分画像を結合する動作を示すフローチャートである。 FIG. 13 is a flowchart showing the operation of combining the two partial images shown in FIG. 9.
【図１４】向いている方向で重み付けを行う例を示す図である。 FIG. 14 is a diagram showing an example of weighting by facing direction.
【図１５】中心からの距離で重み付けを行う例を示す図である。 FIG. 15 is a diagram showing an example of weighting by distance from the center.
【図１６】エネルギー画像の生成方法を示す説明図である。 FIG. 16 is an explanatory diagram showing a method of generating an energy image.
【図１７】前景の動きの速さに基づいてエネルギーをも求める例を示す図である。 FIG. 17 is a diagram showing an example in which energy is also obtained based on the speed of movement of the foreground.
【図１８】仮想全天球画像を生成した例を示す図である。 FIG. 18 is a diagram showing an example of a generated virtual omnidirectional image.
【図１９】カメラの入力画像から切り出す部分画像の例を示す説明図である。 FIG. 19 is an explanatory diagram showing an example of a partial image cut out from a camera input image.
【図２０】従来の仮想全天球画像を得るための画像処理システムを示す図である。 FIG. 20 is a diagram showing a conventional image processing system for obtaining a virtual omnidirectional image.
【図２１】画像処理システム１における画像処理される画像の具体例を示す図である。 FIG. 21 is a diagram illustrating a specific example of images processed in the image processing system 1.
【図２２】画像処理システム１における課題を説明するための図である。 FIG. 22 is a diagram for explaining a problem in the image processing system 1.

以下、図面を参照して、本発明の一実施形態による画像処理装置を説明する。図１は同実施形態による仮想全天球画像を視聴するためのシステム構成を示すブロック図である。この図において、図２０に示す従来の装置と同一の部分には同一の符号を付し、その説明を簡単に行う。仮想全天球画像を視聴するためのシステムは、画像処理システム１及び視聴システム９を備えている。 Hereinafter, an image processing apparatus according to an embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing a system configuration for viewing a virtual omnidirectional image according to the embodiment. In this figure, parts identical to those of the conventional apparatus shown in FIG. 20 are given the same reference numerals and are described only briefly. The system for viewing a virtual omnidirectional image includes an image processing system 1 and a viewing system 9.

図１に示すように、画像処理システム１は、全天球カメラ２と、Ｎ台（Ｎ≧２）の複数のカメラ３−１、３−２、３−３、…、３−Ｎ（以下、カメラ群３とする。）と、画像処理装置３０と、表示装置５とを備える。画像処理システム１は、フットサルのコート１０内に仮想視点１１を設定した場合に、コート１０外に設置したカメラ群３からの画像の合成によって仮想視点１１における仮想全天球画像を得る。 As shown in FIG. 1, the image processing system 1 includes an omnidirectional camera 2, N (N ≧ 2) cameras 3-1, 3-2, 3-3, ..., 3-N (hereinafter referred to as camera group 3), an image processing device 30, and a display device 5. When the virtual viewpoint 11 is set in the futsal court 10, the image processing system 1 obtains a virtual omnidirectional image at the virtual viewpoint 11 by combining images from the camera group 3 installed outside the court 10.

全天球カメラ２は、全天球画像を撮影するカメラである。全天球カメラ２は、競技が行われる前のタイミングでコート１０内の仮想視点１１の位置に設置される。全天球カメラ２は、予め、仮想視点１１の位置から仮想全天球画像の背景となる背景画像２０を撮影する。全天球カメラ２で撮影された背景画像２０は、画像処理装置４に入力されて蓄積される。全天球カメラ２は、競技中も仮想視点１１に設置したままだと競技の支障となるため、競技開始前に仮想視点１１の位置から取り除かれる。 The omnidirectional camera 2 is a camera that captures an omnidirectional image. The omnidirectional camera 2 is installed at the position of the virtual viewpoint 11 in the court 10 at the timing before the competition is performed. The omnidirectional camera 2 captures in advance a background image 20 that is the background of the virtual omnidirectional image from the position of the virtual viewpoint 11. The background image 20 captured by the omnidirectional camera 2 is input to the image processing device 4 and accumulated. The omnidirectional camera 2 is removed from the position of the virtual viewpoint 11 before the start of the competition because the omnidirectional camera 2 becomes a hindrance to the competition if it remains installed at the virtual viewpoint 11 during the competition.

コート１０の周囲には、カメラ群３が設置されている。カメラ群３の各カメラ３−１、３−２、３−３、…、３−Ｎは、背景画像２０に対して合成する前景を含む動画（映像）で撮影するカメラであり、それぞれ仮想視点１１を含む画角となるようにコート１０の周囲を取り囲むように設置されている。図１においてＮは、４以上の整数であり、同程度の画質の仮想全天球画像を得ようとするのであればコート１０が大きいほど大きな値となり、コート１０の大きさが同じであれば仮想全天球画像の画質を高いものにしようとするほど大きな値となる。 A camera group 3 is installed around the court 10. Each of the cameras 3-1, 3-2, 3-3, ..., 3-N of the camera group 3 captures a moving image (video) including the foreground to be combined with the background image 20, and the cameras are installed so as to surround the court 10, each with an angle of view that includes the virtual viewpoint 11. In FIG. 1, N is an integer of 4 or more. For a virtual omnidirectional image of comparable quality, N becomes larger as the court 10 becomes larger; for a court 10 of a given size, N becomes larger as the desired image quality of the virtual omnidirectional image becomes higher.

画像処理装置３０は、各カメラ３−１、３−２、３−３、…、３−Ｎで撮影される動画像を構成する画像である入力画像に対して画像処理を施して、全天球カメラ２より取得した背景画像２０に画像処理後の部分画像を合成する処理を行う。表示装置５は、画像処理装置３０で生成した仮想全天球画像を表示する装置であり、液晶ディスプレイ、ヘッドマウントディスプレイ（ＨＭＤ）等である。 The image processing device 30 performs image processing on input images, which are the images making up the moving images captured by the cameras 3-1, 3-2, 3-3, ..., 3-N, and combines the processed partial images with the background image 20 acquired from the omnidirectional camera 2. The display device 5 displays the virtual omnidirectional image generated by the image processing device 30 and is, for example, a liquid crystal display or a head mounted display (HMD).

視聴システム９は、画像サーバ６と、ネットワーク７と、複数の視聴装置８とを備える。画像サーバ６は、ネットワーク７を介して画像処理装置３０が生成した仮想全天球画像を配信するサーバである。ネットワーク７は、例えばインターネット等の通信網である。視聴装置８は、ネットワーク７に接続可能なユーザ端末８１と、ユーザ端末８１に接続されたＨＭＤ８２とから構成される装置である。ユーザ端末８１は、ネットワーク７を介して画像サーバ６が配信する仮想全天球画像を受信する機能と、受信した仮想全天球画像をＨＭＤ８２で視聴可能な映像信号に変換してＨＭＤ８２へ出力する機能とを備える。 The viewing system 9 includes an image server 6, a network 7, and a plurality of viewing devices 8. The image server 6 is a server that distributes, via the network 7, the virtual omnidirectional image generated by the image processing device 30. The network 7 is a communication network such as the Internet. The viewing device 8 consists of a user terminal 81 that can be connected to the network 7 and an HMD 82 connected to the user terminal 81. The user terminal 81 has a function of receiving the virtual omnidirectional image distributed by the image server 6 via the network 7 and a function of converting the received virtual omnidirectional image into a video signal viewable on the HMD 82 and outputting it to the HMD 82.

ＨＭＤ８２は、ユーザ端末８１から映像信号等を受信する受信部と、受信部を介して受信した映像信号を表示する液晶ディスプレイ等で構成される画面と、視聴者の頭の動きを検出する検出部と、検出部が検出した結果をユーザ端末８１に送信する送信部とを備える。ＨＭＤ８２の画面に表示される映像は、仮想全天球画像に基づいた仮想全天球映像の一部であり視野と呼ぶ。ＨＭＤ８２は、検出部が検出した視聴者の頭の動きに応じて表示する映像の範囲である視野を変更する機能を有する。 The HMD 82 includes a receiving unit that receives a video signal and the like from the user terminal 81, a screen that includes a liquid crystal display that displays the video signal received through the receiving unit, and a detection unit that detects the movement of the viewer's head. And a transmission unit that transmits a result detected by the detection unit to the user terminal 81. The video displayed on the screen of the HMD 82 is a part of a virtual omnidirectional video based on the virtual omnidirectional image and is called a visual field. The HMD 82 has a function of changing the visual field, which is a range of video to be displayed, according to the viewer's head movement detected by the detection unit.
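How a field of view might follow head movement can be sketched in Python as follows. The text does not specify the image layout or the rendering method, so this example assumes an equirectangular panorama whose column 0 corresponds to a yaw of 0 degrees and only selects the horizontal range of columns for a given yaw; the function name, the 90 degree default field of view, and the omission of pitch and perspective reprojection are all assumptions.

```python
def visible_columns(image_width, yaw_deg, fov_deg=90.0):
    """Columns of an (assumed) equirectangular panorama that fall inside a
    horizontal field of view centred on yaw_deg, wrapping around at 360."""
    center = (yaw_deg % 360.0) / 360.0 * image_width
    half = fov_deg / 360.0 * image_width / 2.0
    start = int(round(center - half))
    count = int(round(2 * half))
    # The modulo makes the view wrap across the left and right image borders.
    return [(start + i) % image_width for i in range(count)]


# Looking straight ahead, then turning the head 90 degrees.
ahead = visible_columns(360, yaw_deg=0.0)
turned = visible_columns(360, yaw_deg=90.0)
```

With a 360-column panorama, looking straight ahead selects columns 315 to 359 followed by 0 to 44, and a 90 degree turn shifts the window to columns 45 to 134.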

頭を上下左右に動かすことに応じて視聴している映像が変化するので、ＨＭＤ８２を頭に装着した視聴者は、仮想視点１１の位置から競技を見ているかのような映像を視聴することができる。このように、ＨＭＤ８２を装着した視聴者は、あたかも仮想視点１１に立って競技を観戦しているかのような臨場感のある映像を視聴することができる。 The video being viewed changes as the head moves up, down, left, and right, so a viewer wearing the HMD 82 on the head can watch video as if viewing the competition from the position of the virtual viewpoint 11. In this way, the viewer wearing the HMD 82 can watch video with a sense of presence, as if standing at the virtual viewpoint 11 and watching the competition.

画像処理システム１において処理される画像は、図２０に示した従来の画像処理システム１で処理される画像と同様であるので、図２０を用いて画像処理システム１の動作について簡単に説明する。全天球カメラ２は、コート１０内の仮想視点１１に設置されて、図２１（Ａ）に示す背景画像２０を競技開始前に撮影する。競技が開始されるとカメラ群３の各カメラが撮影を開始する。例えば、カメラ群３内のカメラ３−１、３−２、３−３は、図２１（Ｂ）に示す部分画像２１〜２３を撮影する。 The images processed in the image processing system 1 are the same as those processed in the conventional image processing system 1 shown in FIG. 20, so the operation of the image processing system 1 will be described briefly with reference to FIG. 20. The omnidirectional camera 2 is installed at the virtual viewpoint 11 in the court 10 and captures the background image 20 shown in FIG. 21A before the competition starts. When the competition starts, each camera of the camera group 3 starts shooting. For example, the cameras 3-1, 3-2, and 3-3 in the camera group 3 capture the partial images 21 to 23 shown in FIG. 21B.

画像処理装置３０は、撮影された部分画像２１〜２３のそれぞれから仮想視点１１を含み、かつ、競技中の選手を含む領域２１１、２２１、２３１を切り出す。画像処理装置３０は、切り出した領域２１１、２２１、２３１の画像に対して、画像処理を行うことで背景画像２０に貼り付け可能な部分画像２１１ａ、２２１ａ、２３１ａを生成する。画像処理装置３０は、背景画像２０に対して部分画像２１１ａ、２２１ａ、２３１ａを合成することで、図２１（Ｃ）に示すような仮想全天球画像２４を生成する。 From each of the captured partial images 21 to 23, the image processing device 30 cuts out the regions 211, 221, and 231, which include the virtual viewpoint 11 and the players in the competition. The image processing device 30 performs image processing on the images of the cut-out regions 211, 221, and 231 to generate partial images 211a, 221a, and 231a that can be pasted onto the background image 20. The image processing device 30 then combines the partial images 211a, 221a, and 231a with the background image 20 to generate a virtual omnidirectional image 24 as shown in FIG. 21C.
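The final compositing step, pasting processed partial images into predetermined regions of the background, can be sketched in Python as below. This is a simplified illustration only: the representation of images as nested lists, the use of `None` for transparent pixels, the `(top, left, patch)` tuple format, and the function name `composite` are assumptions, and the geometric processing that makes a cut-out region pasteable is not modeled.

```python
def composite(background, patches):
    """Paste each patch onto a copy of the background.

    patches is a list of (top, left, patch) tuples, where patch is a 2-D list
    of pixel values and None marks a transparent pixel (hypothetical format).
    Pixels falling outside the background are ignored.
    """
    out = [row[:] for row in background]   # leave the stored background intact
    height, width = len(out), len(out[0])
    for top, left, patch in patches:
        for dy, patch_row in enumerate(patch):
            for dx, pixel in enumerate(patch_row):
                y, x = top + dy, left + dx
                if pixel is not None and 0 <= y < height and 0 <= x < width:
                    out[y][x] = pixel
    return out


background = [[0, 0, 0, 0],
              [0, 0, 0, 0]]
player = [[1, None],
          [1, 1]]
frame = composite(background, [(0, 1, player)])
```

Copying the background before pasting reflects the fact that the same pre-captured background image is reused for every frame of the video.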

なお、視聴システム９は、図１に示す構成に限定されるものではない。視聴システム９は、画像処理装置３０が生成した仮想全天球画像を編集してから画像サーバ６へ出力する編集装置を備える構成等、仮想全天球画像をネットワーク７経由で配信可能な構成であればよい。視聴装置８の構成は、ネットワーク７を介して受信した仮想全天球画像を利用者が視聴できる構成であれば、どのような構成であってもよい。 The viewing system 9 is not limited to the configuration shown in FIG. 1. The viewing system 9 may have any configuration capable of distributing the virtual omnidirectional image via the network 7, for example a configuration that includes an editing device that edits the virtual omnidirectional image generated by the image processing device 30 before outputting it to the image server 6. The viewing device 8 may likewise have any configuration that lets the user view the virtual omnidirectional image received via the network 7.

Next, the configuration of the image processing apparatus 30 shown in FIG. 1 will be described. FIG. 2 is a diagram illustrating a basic configuration example of the image processing apparatus 30. As shown in FIG. 2, the image processing apparatus 30 includes an object analysis unit 31, a depth acquisition unit 32, a composite information acquisition unit 33, an image input unit 34, an image cutout unit 35, an image composition unit 36, a display processing unit 37, an input unit 38 that consists of a keyboard, a mouse, and the like and is used to input information about depth, a boundary determination unit 39 that determines the boundaries (seams) of the partial images to be joined, a foreground image storage unit 301 that stores partial images including the foreground images captured by the cameras of the camera group 3, a background image storage unit 302 that stores the background image 20, an object information storage unit 303, and a composite information table 304.

オブジェクト解析部３１は、前景画像格納部３０１に格納されている部分画像を入力とし、部分画像に含まれるオブジェクトを抽出して、出力する。ここでオブジェクトとは、背景画像２０に含まれていないが部分画像に含まれている人物、物体（例えばボール）等である。オブジェクト解析部３１は、抽出したオブジェクトに対して当該オブジェクトを識別するための識別子であるＩＤを付与する。 The object analysis unit 31 receives a partial image stored in the foreground image storage unit 301 as an input, extracts an object included in the partial image, and outputs it. Here, the object is a person, an object (for example, a ball) or the like that is not included in the background image 20 but is included in the partial image. The object analysis unit 31 assigns an ID that is an identifier for identifying the object to the extracted object.

カメラ群３の各カメラで撮影される部分画像は、所定のフレーム周期を有する動画像であり、各フレームには撮影時間が関連付けられている。オブジェクト解析部３１は、時間方向に一連のフレームから抽出した同一オブジェクトに対して同じＩＤを付与する。オブジェクト情報格納部３０３は、オブジェクトを抽出する対象とした部分画像のフレーム毎の撮影時刻に関連付けてオブジェクト解析部３１が付与したＩＤを含むオブジェクトに関する情報を格納する。 The partial images photographed by each camera in the camera group 3 are moving images having a predetermined frame period, and the photographing time is associated with each frame. The object analysis unit 31 assigns the same ID to the same object extracted from a series of frames in the time direction. The object information storage unit 303 stores information about the object including the ID assigned by the object analysis unit 31 in association with the shooting time for each frame of the partial image from which the object is to be extracted.

For example, the object analysis unit 31 assigns the identifier ID1 to the object extracted from the partial image 21, which is a series of frames captured by the camera 3-1 at capture times t, t+1, t+2, and so on. Similarly, it assigns the identifier ID2 to the object extracted from the partial image 22, a series of frames captured by the camera 3-2 at capture times t, t+1, t+2, and so on, and the identifier ID3 to the object extracted from the partial image 23, a series of frames captured by the camera 3-3 at capture times t, t+1, t+2, and so on.

When analyzing a partial image to extract an object, the object analysis unit 31 acquires a label indicating the attribute of the object and three-dimensional position information indicating the position of the object in the space above the court 10. The labels are information that identifies objects that may move within the capture range of the camera group 3, for example "person" for a person, "ball" for a ball, "object A" for an object A, "object B" for an object B, and so on.

By analyzing the partial image in order to extract the object, the object analysis unit 31 determines whether the object corresponds to "person", "ball", "object A", or "object B", and outputs the determination result as a label. A known image analysis technique is used for this determination. For example, the following Known Document 1 discloses a technique for detecting people by image analysis.
Known Document 1: Yuji Yamauchi (山内悠嗣) and two others, "[Survey Paper] Human Detection by Statistical Learning Methods", IEICE Technical Report, vol. 112, no. 197, PRMU2012-43, pp. 113-126, September 2012

The object analysis unit 31 also acquires the three-dimensional position of the object in the space above the court 10 based on information such as the position of the object within the partial image, the positions of the multiple cameras in the camera group 3 that captured the object, and the capture ranges (capture direction and angle of view) of those cameras. A known technique is used to acquire the three-dimensional position of the object. The acquired position information may instead be two-dimensional position information.

オブジェクト情報格納部３０３は、オブジェクト解析部３１が抽出したオブジェクトに関する情報であるオブジェクト情報を入力とし、オブジェクト情報をその撮影時刻に関連付けて格納する。オブジェクト情報は、オブジェクトを識別するＩＤと、オブジェクトの属性を示すラベルと、オブジェクトの３次元位置とを含む。 The object information storage unit 303 receives object information, which is information about the object extracted by the object analysis unit 31, and stores the object information in association with the shooting time. The object information includes an ID for identifying the object, a label indicating the attribute of the object, and the three-dimensional position of the object.

FIG. 3 is a diagram illustrating an example of the object information stored in the object information storage unit 303. As shown in FIG. 3, multiple pieces of object information are stored in association with the times t, t+1, t+2, and so on, which indicate the capture time of each frame of the partial image. At time t, ID1, label 1, and three-dimensional position information 1 are stored as the object information of object 1, and ID2, label 2, and three-dimensional position information 2 are stored as the object information of object 2. Object information is stored in the same way for time t+1 and time t+2.
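
As an illustrative sketch only, the time-indexed object information of FIG. 3 could be held in a structure like the following. The class names ObjectInfo and ObjectInfoStore, the integer time index, and the sample positions are assumptions introduced for this example and do not appear in the embodiment.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ObjectInfo:
    """One piece of object information (hypothetical layout)."""
    object_id: int                         # ID assigned by the object analysis unit
    label: str                             # attribute label such as "person" or "ball"
    position: Tuple[float, float, float]   # 3D position in the space above the court


class ObjectInfoStore:
    """Stores object information keyed by capture time."""

    def __init__(self) -> None:
        self._by_time: Dict[int, List[ObjectInfo]] = {}

    def add(self, time: int, info: ObjectInfo) -> None:
        self._by_time.setdefault(time, []).append(info)

    def at(self, time: int) -> List[ObjectInfo]:
        # Return a copy so callers cannot modify the stored list.
        return list(self._by_time.get(time, []))


# Sample data: two objects tracked over three capture times (made-up values).
store = ObjectInfoStore()
for t in (0, 1, 2):
    store.add(t, ObjectInfo(1, "person", (2.0 + t, 0.0, 5.0)))
    store.add(t, ObjectInfo(2, "ball", (1.0, 0.5, 3.0)))
```

Because the same ID is reused for the same object in every frame, looking up one capture time returns every object visible at that time with a stable identifier.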

The depth acquisition unit 32 reads the object information from the object information storage unit 303, identifies at each capture time the main object, that is, the important object among the multiple objects, and acquires and outputs depth information about the depth, which is the distance from the virtual viewpoint 11 to the identified main object. An important object is, for example, an object located in the region of the virtual omnidirectional image that viewers gaze at.
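
A minimal sketch of the depth computation, assuming that the virtual viewpoint and the main object are both given as coordinates in the same court coordinate system and that the depth is the Euclidean distance between them. The function name and the sample coordinates are hypothetical.

```python
import math
from typing import Sequence


def depth_from_viewpoint(viewpoint: Sequence[float],
                         position: Sequence[float]) -> float:
    """Distance from the virtual viewpoint to an object position."""
    return math.dist(viewpoint, position)


# Made-up example: viewpoint at the origin, main object 3 units along x
# and 4 units along y.
depth = depth_from_viewpoint((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
```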

The depth acquisition unit 32 identifies the main object at each capture time in advance. Specifically, the content creator who creates the virtual omnidirectional image uses the input unit 38 to enter information that identifies, for each capture time, the region or the object that viewers are expected to gaze at. The depth acquisition unit 32 then identifies the main object at each capture time based on the entered information. The method of identifying the main object is not limited to this one, and various methods may be used. For example, a saliency map, which is a map that expresses the viewer's degree of interest in each region of a captured partial image, may be computed and input to the depth acquisition unit 32, which then identifies an object located in a visually salient region as the main object. Alternatively, test subjects may be shown the partial-image video in advance to obtain a viewing log that records which region they looked at at each capture time; the viewing log is input to the depth acquisition unit 32, which identifies the main object based on it.
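
The saliency-based alternative can be sketched as follows, under assumptions that the text does not specify: the saliency map is a two-dimensional grid of values, each object has a bounding box in image coordinates, and the object whose box has the highest mean saliency is taken as the main object. The names Box, mean_saliency, and pick_main_object are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), with x1 and y1 exclusive


def mean_saliency(saliency: List[List[float]], box: Box) -> float:
    """Average saliency value inside a bounding box."""
    x0, y0, x1, y1 = box
    values = [saliency[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(values) / len(values)


def pick_main_object(saliency: List[List[float]], boxes: Dict[int, Box]) -> int:
    """Return the ID of the object lying in the most salient region."""
    return max(boxes, key=lambda oid: mean_saliency(saliency, boxes[oid]))


# Tiny made-up saliency map: the right half is more salient.
saliency = [
    [0.1, 0.1, 0.8, 0.9],
    [0.1, 0.2, 0.7, 0.9],
]
boxes = {1: (0, 0, 2, 2), 2: (2, 0, 4, 2)}
main_id = pick_main_object(saliency, boxes)
```

A viewing log could be handled the same way by replacing the saliency grid with a grid of gaze counts.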

The method of computing a saliency map is a known technique; for example, the technique described in the following Known Document 2 may be used.
Known Document 2: Laurent Itti, Christof Koch, and Ernst Niebur, "A Model of Saliency-Based Visual Attention for Rapid Scene Analysis", IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1254-1259 (1998)

The composite information table 304 stores composite information that includes cutout region information, which describes the cutout region used to cut out a region including the virtual viewpoint 11 from a partial image, and conversion information, which is used to convert the image cut out according to that cutout region into a partial image. The partial image is generated by applying deformation processing such as enlargement, reduction, and rotation to the cut-out image according to the conversion information, so that it can be pasted onto the corresponding region of the background image 20 without looking unnatural. This deformation processing is performed, for example, by applying an affine transformation to the image, in which case the conversion information is, for example, an affine transformation matrix. The following description uses an affine transformation as the deformation processing applied to the partial region image, but the deformation processing need not be limited to an affine transformation; any processing that converts the image by enlargement, reduction, rotation, or the like according to the conversion information may be used. The composite information table 304 is a table that associates and stores a camera code identifying the camera in the camera group 3 that captured the partial image to be processed, a depth from the virtual viewpoint 11, conversion information that is the affine transformation matrix for that depth, and cutout region information for that depth.
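
For illustration, one row of such a table and the effect of its affine transformation matrix on a pixel coordinate might look like the sketch below. The 2x3 matrix layout, the rectangular cutout region, the class name CompositeInfo, and all numeric values are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Affine = List[List[float]]  # 2x3 matrix [[a, b, tx], [c, d, ty]]


@dataclass
class CompositeInfo:
    """One row of the composite information table (hypothetical layout)."""
    camera_code: str
    depth: float
    crop: Tuple[int, int, int, int]   # cutout region (x0, y0, x1, y1)
    affine: Affine                    # conversion information


def apply_affine(m: Affine, x: float, y: float) -> Tuple[float, float]:
    """Map a point of the cut-out image into background-image coordinates."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


# Made-up row: halve the size, then shift by (20, 10).
entry = CompositeInfo("cam-3-1", 5.0, (100, 50, 400, 300),
                      [[0.5, 0.0, 20.0], [0.0, 0.5, 10.0]])
corner = apply_affine(entry.affine, entry.crop[0], entry.crop[1])
```

The translation column of the matrix is what lets the same matrix encode both the deformation and the paste position on the background image.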

The affine transformation matrices are obtained in advance by the following method and stored in the composite information table 304. For example, a chessboard with a grid pattern is placed at several different distances (depths) from the virtual viewpoint 11, and the image containing the chessboard captured by the omnidirectional camera 2 installed at the virtual viewpoint 11 is compared with the images containing the chessboard captured by the camera group 3. An affine transformation matrix is then computed that transforms the image so that the grid cells of the chessboard correspond between the two images. In this way, an affine transformation matrix is obtained for each depth at which the chessboard was placed.
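
The text does not state how the matrix is computed from the chessboard. One common approach, shown here only as an assumed sketch, is a least-squares fit over corresponding grid corners: src holds corner coordinates in a camera image and dst holds the matching corners in the omnidirectional image. Corner detection itself is outside this sketch, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("need at least three non-collinear points")
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(det3(mc) / d)
    return solution


def fit_affine(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Least-squares 2x3 affine matrix mapping src points onto dst points."""
    rows = [(x, y, 1.0) for x, y in src]
    # Normal equations: (A^T A) p = A^T b, solved once per output coordinate.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    matrix = []
    for k in range(2):
        atb = [sum(r[i] * d[k] for r, d in zip(rows, dst)) for i in range(3)]
        matrix.append(solve3(ata, atb))
    return matrix


# Made-up corners: dst is src scaled by 2 and shifted by (3, -1).
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
dst = [(3.0, -1.0), (5.0, -1.0), (3.0, 1.0), (5.0, 1.0)]
fitted = fit_affine(src, dst)
```

Repeating the fit for each chessboard distance would give one matrix per depth, matching the per-depth entries stored in the table.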

The cutout region information is obtained in advance by the following method and stored in the composite information table 304. For example, when the partial images captured by two adjacent cameras in the camera group 3 have an overlapping region in which the same subject (the chessboard) appears, the cutout regions for the images of both cameras are set so that only one copy of that region remains. The cutout regions are determined for each camera in the camera group 3 and for several different distances (depths) from the virtual viewpoint 11 to the subject (the chessboard). The cutout regions may also be set so that an overlapping region with a width of several pixels to several tens of pixels remains in the images of both cameras.

The composite information acquisition unit 33 takes the depth acquired by the depth acquisition unit 32 as input and, based on that depth, acquires from the composite information table 304 and outputs the composite information, including the cutout region and the affine transformation matrix, for the partial image captured by each camera of the camera group 3. Because the composite information table 304 holds only several to several tens of depths, it may contain no entry whose depth equals the depth acquired by the depth acquisition unit 32. In that case, the composite information acquisition unit 33 computes the composite information for the acquired depth from the composite information (cutout region information and conversion information) of the two recorded depths immediately below and above the acquired depth. Specifically, the coordinate values of the cutout regions for the two recorded depths are linearly interpolated to determine the cutout region lying between them, and each coefficient of the affine transformation matrices for the two recorded depths is linearly interpolated to compute the intermediate affine transformation matrix.
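
The linear interpolation described above can be sketched as follows. The function names, the rectangular cutout region, the 2x3 matrix layout, and the sample values are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation between a and b."""
    return a + (b - a) * t


def interpolate_composite(
    depth: float,
    d0: float, crop0: Sequence[float], affine0: List[List[float]],
    d1: float, crop1: Sequence[float], affine1: List[List[float]],
) -> Tuple[Tuple[float, ...], List[List[float]]]:
    """Interpolate cutout region and affine matrix between depths d0 and d1."""
    if d0 == d1:
        raise ValueError("the two recorded depths must differ")
    t = (depth - d0) / (d1 - d0)
    crop = tuple(lerp(a, b, t) for a, b in zip(crop0, crop1))
    affine = [[lerp(a, b, t) for a, b in zip(r0, r1)]
              for r0, r1 in zip(affine0, affine1)]
    return crop, affine


# Made-up entries recorded at depths 5 and 10; the acquired depth is 7.5.
crop, affine = interpolate_composite(
    7.5,
    5.0, (100, 50, 400, 300), [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    10.0, (120, 60, 380, 280), [[0.5, 0.0, 10.0], [0.0, 0.5, 20.0]],
)
```

With the acquired depth halfway between the two recorded depths, every coordinate and coefficient lands halfway between the two recorded values.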

The foreground image storage unit 301 stores the partial images including the foreground images captured by the cameras of the camera group 3, each in association with the camera code that identifies the camera. A partial image includes capture times and video image data. For example, the foreground image storage unit 301 stores the partial image 21 shown in FIG. 21(B) in association with the camera code identifying the camera 3-1, the partial image 22 in association with the camera code identifying the camera 3-2, and the partial image 23 in association with the camera code identifying the camera 3-3.

The background image storage unit 302 stores the background image 20, an omnidirectional image captured by the omnidirectional camera 2. For example, it stores the background image 20 shown in FIG. 21(A), captured by the omnidirectional camera 2 installed at the virtual viewpoint 11 in the court 10. The stored background image 20 may be image data for a single frame or video image data for a predetermined length of time. When image data for a predetermined length of time is stored and the background image 20 contains a part that changes periodically (for example, a part showing an electronic display board whose display content changes periodically), it suffices to store image data covering a time span that matches that period as the background image 20.

The image processing apparatus 30 may acquire the background image 20 from the omnidirectional camera 2 by any configuration. For example, the image processing apparatus 30 may include a communication unit capable of wired or wireless communication with the omnidirectional camera 2 and acquire the background image 20 via that communication unit. Alternatively, the background image 20 may be recorded on a recording medium that is removable from the omnidirectional camera 2, and the recorded medium may then be connected to the image processing apparatus 30 so that the image processing apparatus 30 reads the background image 20 from the medium. The image processing apparatus 30 may likewise acquire the partial images from the camera group 3 by any configuration, as with the omnidirectional camera 2.

The image input unit 34 acquires the partial images from the foreground image storage unit 301 and the background image 20 from the background image storage unit 302, outputs the partial images to the image cutout unit 35, and outputs the background image 20 to the image composition unit 36. Based on the cutout region information included in the composite information acquired by the composite information acquisition unit 33, the image cutout unit 35 identifies the cutout region for the partial image from each camera of the camera group 3, cuts the identified cutout region out of the partial image, and outputs the cut-out image to the image composition unit 36. For example, the image cutout unit 35 cuts the cutout regions 211, 221, and 231 out of the partial images 21 to 23 shown in FIG. 21(B), respectively.

The image composition unit 36 takes as input the images cut out by the image cutout unit 35, the composite information acquired by the composite information acquisition unit 33, and the background image. It applies deformation processing to each cut-out image based on the affine transformation matrix in the conversion information included in the composite information, thereby generating a partial image. The image composition unit 36 then pastes the generated partial images onto the background image 20 based on the affine transformation matrices and combines them to generate and output the virtual omnidirectional image. The affine transformation matrix includes information indicating the region of the background image 20 onto which the partial image is pasted. The image composition unit 36 also has a function of transmitting the generated virtual omnidirectional image to the image server 6.

For example, the image composition unit 36 generates the partial images 211a, 221a, and 231a by applying deformation processing based on the affine transformation matrices to the images of the cutout regions 211, 221, and 231 cut out of the partial images 21 to 23 shown in FIG. 21(B). It then generates the virtual omnidirectional image 24 shown in FIG. 21(C) by, for example, pasting the partial images 211a, 221a, and 231a onto predetermined regions of the background image 20 and combining them.

部分画像を背景画像２０に貼り付けて仮想全天球画像２４を生成した際に、隣り合う部分画像間の境界領域において重複が発生する場合がある。図４は、隣り合う部分画像間の境界領域において重複が発生する場合の具体例を示す図である。図４に示すように、仮想全天球画像２４に貼り付けた部分画像２１１ｂと部分画像２２１ｂとが境界領域２５において重複している。なお、図４に示す部分画像２１１ｂと部分画像２２１ｂが、図２１（Ｃ）に示した部分画像２１１ａ及び部分画像２２１ａと比較して異なる点は、両画像に重複する領域がある点である。 When the partial image is pasted on the background image 20 and the virtual omnidirectional image 24 is generated, there may be an overlap in the boundary region between the adjacent partial images. FIG. 4 is a diagram illustrating a specific example in the case where overlap occurs in a boundary region between adjacent partial images. As shown in FIG. 4, the partial image 211 b and the partial image 221 b pasted on the virtual omnidirectional image 24 overlap in the boundary region 25. Note that the partial image 211b and the partial image 221b shown in FIG. 4 are different from the partial image 211a and the partial image 221a shown in FIG. 21C in that there are overlapping areas in both images.

As shown in FIG. 4, when the partial image 211b and the partial image 221b overlap in the boundary region 25, the image composition unit 36 applies the following blending processing to the overlapping boundary region 25. The image composition unit 36 sets a blending parameter $\alpha$ and computes the value of each pixel in the overlapping region 25 according to (Equation 1).
$$g(x, y) = \alpha\, I_i(x, y) + (1 - \alpha)\, I_{i+1}(x, y) \qquad \text{(Equation 1)}$$
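
As a minimal sketch of the blending in (Equation 1) with a constant blending parameter, assuming single-channel pixel values held in lists, the computation could be written as below. The function names and pixel values are hypothetical.

```python
from typing import List


def blend_pixel(alpha: float, p_i: float, p_next: float) -> float:
    """Equation 1: weighted sum of the two overlapping pixel values."""
    return alpha * p_i + (1.0 - alpha) * p_next


def blend_row(alpha: float, row_i: List[float], row_next: List[float]) -> List[float]:
    """Blend one row of the overlapping region with a constant alpha."""
    return [blend_pixel(alpha, a, b) for a, b in zip(row_i, row_next)]


# Made-up pixel values from the two overlapping partial images.
blended = blend_row(0.25, [200.0, 200.0], [100.0, 120.0])
```

With alpha = 0.25 the result is weighted three to one toward the second image.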

In (Equation 1), $x$ and $y$ are the horizontal and vertical coordinates on the virtual omnidirectional image 24. $g(x, y)$ is the pixel value at coordinates $(x, y)$ in the boundary region 25. $I_i(x, y)$ and $I_{i+1}(x, y)$ are the pixel values at coordinates $(x, y)$ of the partial images generated from the partial images captured by the camera 3-i and the camera 3-(i+1) in the camera group 3. The value of $\alpha$ is constant over the overlapping region 25, but it may instead be varied as in (Equation 2) below.
$$\alpha(x) = \frac{x - x_s}{x_e - x_s} \qquad \text{(Equation 2)}$$
In (Equation 2), $x_s$ and $x_e$ are the x coordinates of the two ends of the overlapping region 25 as shown in FIG. 4, with $x_s < x_e$.
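
The position-dependent weight of (Equation 2) can be sketched in the same way. The example assumes single-channel rows indexed by the x coordinate of the virtual omnidirectional image; the function names and values are hypothetical.

```python
from typing import List


def alpha_ramp(x: float, xs: float, xe: float) -> float:
    """Equation 2: alpha rises linearly from 0 at xs to 1 at xe."""
    if not xs < xe:
        raise ValueError("xs must be smaller than xe")
    return (x - xs) / (xe - xs)


def blend_overlap_row(row_i: List[float], row_next: List[float],
                      xs: int, xe: int) -> List[float]:
    """Blend columns xs..xe of one row using the ramp as the weight of row_i."""
    out = []
    for x in range(xs, xe + 1):
        a = alpha_ramp(x, xs, xe)
        out.append(a * row_i[x] + (1.0 - a) * row_next[x])
    return out


# Made-up rows: a uniform 100 from one image and a uniform 200 from the other.
ramped = blend_overlap_row([100.0] * 8, [200.0] * 8, xs=2, xe=6)
```

Under this definition the blended value equals the pixel of image i+1 at x = xs and the pixel of image i at x = xe, so the transition across the overlap is gradual rather than uniform.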

表示処理部３７は、画像合成部３６が出力する仮想全天球画像を入力とし、仮想全天球画像を表示装置５において表示可能な映像信号に変換して出力する。仮想全天球画像２４は、図２１（Ｃ）に示す通り、歪みを含む画像であり、かつ、仮想視点１１を中心とする３６０度の景色を含む画像であるので、表示処理部３７は、仮想全天球画像から表示装置５に表示させる範囲の画像を切り出して、切り出した画像の歪みを補正する機能を有する。 The display processing unit 37 receives the virtual omnidirectional image output from the image synthesis unit 36, converts the virtual omnidirectional image into a video signal that can be displayed on the display device 5, and outputs the video signal. As shown in FIG. 21C, the virtual omnidirectional image 24 is an image including distortion and an image including a landscape of 360 degrees with the virtual viewpoint 11 as the center. It has a function of cutting out an image in a range to be displayed on the display device 5 from the virtual omnidirectional image and correcting distortion of the cut out image.

In the above description the image processing apparatus 30 includes the foreground image storage unit 301 and the background image storage unit 302, but the configuration is not limited to this. For example, an image storage device that includes the foreground image storage unit 301 and the background image storage unit 302 may be provided separately, and the image processing apparatus 30 may acquire the partial images and the background image 20 from the foreground image storage unit 301 and the background image storage unit 302 of that image storage device.

The boundary determination unit 39 takes as input the virtual omnidirectional image output by the image composition unit 36 and the composite information. Instead of applying blending processing to the boundary region described above, it determines and outputs a boundary region that allows natural image composition by making the seam an inconspicuous boundary line.

次に、画像処理システム１において１フレームの仮想全天球画像を作成する動作について説明する。図５は、画像処理システム１において１フレームの仮想全天球画像を作成する動作を示すフロー図である。図５に示す動作は、各撮影時刻における仮想全天球画像を生成する処理の前に、予めオブジェクト情報、合成情報、背景画像２０及び部分画像を取得する処理も含まれる。 Next, an operation for creating a virtual omnidirectional image of one frame in the image processing system 1 will be described. FIG. 5 is a flowchart showing an operation of creating a virtual omnidirectional image of one frame in the image processing system 1. The operation shown in FIG. 5 includes a process of acquiring object information, composite information, background image 20 and partial image in advance before the process of generating a virtual omnidirectional image at each shooting time.

仮想視点１１に全天球カメラ２を設置し、仮想視点１１から所定の距離（奥行）にチェスボードを設置した後に、全天球カメラ２は、チェスボードを含む全天球画像を撮影する（ステップＳ１０１）。全天球カメラ２を仮想視点１１から取り去って、カメラ群３の各カメラで、仮想視点１１及びチェスボードを含む撮影範囲を撮影し、全天球カメラ２で撮影された全天球画像に含まれるチェスボードと、カメラ群３内の一つのカメラで撮影された画像に含まれるチェスボードとを対応させるための合成情報を求める（ステップＳ１０２）。なお、ステップＳ１０１、１０２におけるチェスボードの撮影は、仮想視点１１から複数種類の距離にチェスボードを設置して行われる。 After the omnidirectional camera 2 is installed at the virtual viewpoint 11 and the chess board is installed at a predetermined distance (depth) from the virtual viewpoint 11, the omnidirectional camera 2 captures an omnidirectional image including the chess board ( Step S101). The omnidirectional camera 2 is removed from the virtual viewpoint 11, and the shooting range including the virtual viewpoint 11 and the chess board is taken by each camera of the camera group 3, and is included in the omnidirectional image taken by the omnidirectional camera 2. The composite information for associating the chess board to be matched with the chess board included in the image photographed by one camera in the camera group 3 is obtained (step S102). Note that the shooting of the chess board in steps S101 and S102 is performed by installing the chess board at a plurality of types of distances from the virtual viewpoint 11.

仮想視点１１に全天球カメラ２を設置した後に、全天球カメラ２は、背景画像２０を撮影する（ステップＳ１０３）。撮影された背景画像２０は、背景画像格納部３０２に格納される。全天球カメラ２を仮想視点１１から取り去った後であって、例えば競技開始と共に、カメラ群３は撮影を開始する。これにより、画像処理装置３０は、カメラ群３が撮影した部分画像を前景画像格納部３０１に格納する。オブジェクト解析部３１は、前景画像格納部３０１から部分画像を読み出して解析処理し、解析結果をオブジェクト情報格納部３０３に格納する。奥行取得部３２は、オブジェクト情報格納部３０３に格納されているオブジェクトの中から、入力部３８から入力された情報に基づいて主オブジェクトを特定する。奥行取得部３２は、仮想視点１１から特定した主オブジェクトまでの奥行情報を取得する（ステップＳ１０４）。 After the omnidirectional camera 2 is installed at the virtual viewpoint 11, the omnidirectional camera 2 captures the background image 20 (step S103). The captured background image 20 is stored in the background image storage unit 302. After the omnidirectional camera 2 is removed from the virtual viewpoint 11, the camera group 3 starts photographing, for example, when the competition starts. As a result, the image processing apparatus 30 stores the partial image captured by the camera group 3 in the foreground image storage unit 301. The object analysis unit 31 reads out the partial image from the foreground image storage unit 301 and performs analysis processing, and stores the analysis result in the object information storage unit 303. The depth acquisition unit 32 specifies a main object based on information input from the input unit 38 from among the objects stored in the object information storage unit 303. The depth acquisition unit 32 acquires depth information from the virtual viewpoint 11 to the identified main object (step S104).

The composite information acquisition unit 33 takes the depth acquired by the depth acquisition unit 32 as input and, based on that depth, acquires from the composite information table 304 and outputs the composite information, including the cutout region and the affine transformation matrix, for each partial image (step S105). In step S105, when the table has no entry whose depth equals the depth acquired by the depth acquisition unit 32, the composite information acquisition unit 33 derives the composite information for the acquired depth from the composite information for the recorded depths immediately below and above it.

画像切り出し部３５は、合成情報取得部３３が取得した合成情報に含まれる切出領域情報を入力とし、切出領域情報に基づいて、カメラ群３の各カメラからの部分画像に対応する切り出し領域を特定し、部分画像から特定した切り出し領域を切り出して、切り出した画像を画像合成部３６へ出力する。画像合成部３６は、画像切り出し部３５が切り出した画像と合成情報取得部３３が取得した合成情報と背景画像を入力とし、画像切り出し部３５が切り出した画像に対して、合成情報に含まれる変換情報のアフィン変換行列に基づいて変形処理を行い、部分画像を生成する。画像合成部３６は、生成した部分画像をアフィン変換行列に基づいて背景画像２０に貼り付けて合成することで仮想全天球画像を生成し、出力する（ステップＳ１０６）。 The image cutout unit 35 receives the cutout region information included in the composite information acquired by the composite information acquisition unit 33, and based on the cutout region information, the cutout region corresponding to the partial image from each camera in the camera group 3 Is extracted, the specified cutout region is cut out from the partial image, and the cutout image is output to the image composition unit 36. The image composition unit 36 receives the image cut out by the image cutout unit 35, the combination information acquired by the combination information acquisition unit 33, and the background image, and converts the image cut out by the image cutout unit 35 into the conversion information included in the combination information. A deformation process is performed based on the affine transformation matrix of information to generate a partial image. The image compositing unit 36 generates and outputs a virtual omnidirectional image by pasting the generated partial image to the background image 20 based on the affine transformation matrix and compositing (step S106).

When two partial images pasted onto the background image 20 overlap in the boundary region between them, the image composition unit 36 performs blending processing on the overlapping boundary region (step S107).
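The kind of blending used in step S107 is not specified. As one possible illustration, the Python sketch below assumes a linear cross-fade across the overlap for a single row of grayscale pixels; the function name and the weighting scheme are assumptions.

```python
def blend_overlap(left_row, right_row):
    """Cross-fade two equally long pixel rows covering the same overlap.

    The weight of the right image rises linearly from 0 at the left edge
    of the overlap to 1 at the right edge (assumed scheme).
    """
    n = len(left_row)
    out = []
    for x, (a, b) in enumerate(zip(left_row, right_row)):
        w = x / (n - 1) if n > 1 else 0.5
        out.append((1 - w) * a + w * b)
    return out
```

A cross-fade of this kind hides intensity differences but produces double images when the two views disagree geometrically, which is the motivation for the seam-based joining described in the embodiments.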

Next, the basic operation by which the image processing apparatus 30 creates a virtual omnidirectional image of a moving image will be described. FIG. 6 is a diagram illustrating the operation by which the image processing apparatus 30 creates a virtual omnidirectional image of a moving image. In the operation of FIG. 6, the processing up to the capture of the partial images in steps S101 to S104 shown in FIG. 5 is assumed to be already complete. As shown in FIG. 6, the image processing apparatus 30 starts processing the frame at the first capture time (step S201).

The image input unit 34 acquires the partial images from the foreground image storage unit 301 and the background image 20 from the background image storage unit 302, outputs the partial images to the image cutout unit 35, and outputs the background image 20 to the image composition unit 36 (step S202). The depth acquisition unit 32 identifies the main object among the objects stored in the object information storage unit 303 based on the information input from the input unit 38, and acquires the depth to the identified main object (step S203).

The composite information acquisition unit 33 takes as input the depth acquired by the depth acquisition unit 32 and, based on that depth, acquires the composite information corresponding to each partial image from the composite information table 304 and outputs it (step S204). The image cutout unit 35 takes as input the composite information acquired by the composite information acquisition unit 33, cuts the cutout region out of each partial image based on the composite information, and outputs the cutout image to the image composition unit 36. The image composition unit 36 takes as input the image cut out by the image cutout unit 35, the composite information acquired by the composite information acquisition unit 33, and the background image. It deforms the cutout image based on the affine transformation matrix included in the composite information, thereby generating a partial image. The image composition unit 36 then pastes the generated partial image onto the background image 20 based on the affine transformation matrix, generating and outputting a virtual omnidirectional image (step S205). If a partial image exists for the next capture time, the image processing apparatus 30 returns to step S201 and continues the loop; otherwise it ends the loop (step S206).

As described above, the image processing apparatus 30 obtains the depth corresponding to the main object that the viewer is paying attention to, generates the partial images corresponding to that depth, and pastes the generated partial images onto the background image 20 to generate a virtual omnidirectional image. The image processing apparatus 30 can thereby suppress duplication (a double image) or disappearance of the subject that is the main object in the virtual omnidirectional image. By performing the composition processing according to the depth of the subject the viewer is paying attention to, the image processing apparatus 30 can suppress duplication of that subject in the virtual omnidirectional image and can provide the viewer with a virtual omnidirectional image in which degradation of viewing quality is suppressed.

<First Embodiment>
Next, an image processing apparatus according to the first embodiment of the present invention will be described. The first embodiment modifies the boundary processing performed during the composition processing described above. Rather than performing the blending process described above, the first embodiment places the boundary where it is hard to notice when the partial images are combined. The generation of a virtual omnidirectional image is first outlined with reference to FIG. 7, which is a schematic diagram of that generation process. First, the input images are acquired in advance from the moving images (videos) captured by the cameras C_{i-1}, C_i, and C_{i+1}. A moving image captured by a camera consists of a plurality of frame images, and here the image of the frame to be processed is acquired as the input image. In addition, to generate the energy image, the images of one or more frames preceding the frame to be processed are also acquired in advance. The cutout images S_{i-1}, S_i, and S_{i+1}, which form the foreground, are then cut out of the respective input images.

Next, the cutout images S_{i-1}, S_i, and S_{i+1} are transformed with the affine transformation matrices A_{i-1}, A_i, and A_{i+1} obtained in advance, generating the partial images S'_{i-1}, S'_i, and S'_{i+1}. These are then composited with the omnidirectional image B captured in advance. At this point, rather than arranging the partial images so that they overlap and compositing them, seam carving processing is applied so that the joint between one partial image and the next is not noticeable. Compositing in this way makes it possible to generate a virtual omnidirectional image as seen from the virtual viewpoint Pv. When this virtual omnidirectional image is viewed on the HMD 82, the user can turn the line of sight toward the scene of interest and watch the futsal match as if from the virtual viewpoint 11 inside the court 10.

The seam carving used in this embodiment is now described. A seam is defined as a connected path of visually unimportant pixels. A seam that runs from the top edge of the image to the bottom edge is called a vertical seam. Seam carving also uses horizontal seams, but only vertical seams are used here, so the description is limited to vertical seams. Seam carving is a technique for enlarging and reducing images: deleting a seam reduces the image, and duplicating a seam enlarges it. In seam carving it is important to detect seams that are not visually important, and for this purpose the notion of image energy is introduced. The basic idea is that pixels to which the human eye is sensitive have high energy. An image in which an energy value is assigned to each pixel is called an energy image.
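The description does not fix a particular energy function. A common choice in seam carving is the gradient magnitude, so that edges and texture receive high energy. The Python sketch below assumes that choice for a grayscale image stored as a list of rows; the function name and the clamped border handling are assumptions.

```python
def gradient_energy(img):
    """Energy image as |dI/dx| + |dI/dy| using neighbor differences.

    img is a list of rows of grayscale values. Neighbors outside the
    image are replaced by the nearest pixel inside it.
    """
    h, w = len(img), len(img[0])
    energy = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            xl, xr = max(x - 1, 0), min(x + 1, w - 1)
            yu, yd = max(y - 1, 0), min(y + 1, h - 1)
            energy[y][x] = (abs(img[y][xr] - img[y][xl])
                            + abs(img[yd][x] - img[yu][x]))
    return energy
```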

FIG. 8 is a schematic diagram showing the operation of finding a seam. First, energy is calculated (FIG. 8(B)) for a partial image in which no seam is yet defined. This yields an energy image (FIG. 8(C)). A vertical seam is obtained by applying to this energy image the part of the known seam carving processing that finds a vertical seam. Because seam carving is a known technique (see Known Document 3), a detailed description is omitted here; in general, the path with the lowest cumulative energy is obtained as the seam.
Known Document 3: Avidan, Shai, and Ariel Shamir. "Seam carving for content-aware image resizing." ACM Transactions on Graphics (TOG), Vol. 26, No. 3, ACM, 2007.
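For orientation, the search for the vertical seam with the lowest cumulative energy is usually done by dynamic programming. The Python sketch below shows that standard formulation, in which each seam pixel may move at most one column between consecutive rows; the function name and the list-of-rows representation are assumptions.

```python
def find_vertical_seam(energy):
    """Return the column index of the minimum-energy seam for each row."""
    h, w = len(energy), len(energy[0])
    # cost[y][x]: lowest cumulative energy of any path from row 0 to (y, x).
    cost = [list(energy[0])]
    for y in range(1, h):
        prev = cost[-1]
        row = []
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 1, w - 1)
            row.append(energy[y][x] + min(prev[lo:hi + 1]))
        cost.append(row)
    # Start from the cheapest pixel in the bottom row and walk upward.
    x = min(range(w), key=cost[-1].__getitem__)
    seam = [x]
    for y in range(h - 1, 0, -1):
        lo, hi = max(x - 1, 0), min(x + 1, w - 1)
        x = min(range(lo, hi + 1), key=cost[y - 1].__getitem__)
        seam.append(x)
    seam.reverse()
    return seam
```

Only this seam search is needed for joining images; the pixel removal or duplication steps of seam carving are not used.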

Then, by superimposing the seam found on the energy image onto the original partial image (FIG. 8(A)), a seam that can be used as a joint is defined.

Next, the processing for joining two partial images will be described with reference to FIG. 9, which is an explanatory diagram of that processing. As described above, the two partial images 1 and 2 share a region in which they partly overlap (called the boundary region). For this boundary region, a vertical seam is obtained by applying only the vertical seam search of the seam carving processing to the boundary region portion of either partial image 1 or partial image 2. The vertical seam obtained is then used as the joining line for joining the two partial images 1 and 2. A vertical seam is used as the example here, but when the images to be joined are arranged one above the other, a horizontal seam is obtained instead. In other words, the seam is sought in the direction orthogonal to the direction in which the two images are arranged. This makes the blending process described above unnecessary, and because the joining line is placed where it is hard for the eye to notice, the two partial images 1 and 2 can be joined naturally.
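As a minimal sketch of using a vertical seam as the joining line, the Python function below assumes that the boundary regions of the two partial images have already been aligned and have the same size. For each row it takes the pixels left of the seam from the left image and the remaining pixels from the right image. The function name and the convention that the seam pixel itself comes from the right image are assumptions.

```python
def join_along_seam(left, right, seam):
    """Join two aligned, equally sized boundary regions along a seam.

    left, right: lists of pixel rows; seam[y] is the seam column in row y.
    """
    return [l_row[:sx] + r_row[sx:]
            for l_row, r_row, sx in zip(left, right, seam)]
```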

However, simply finding a joining line (joint) by seam carving and joining along it causes the following problem: a boundary (joint) between partial images is easily noticed when it lies in the region the viewer is paying attention to. In this embodiment, therefore, on the assumption that the viewer is looking in the direction in which a foreground object such as a person is moving, no seam is created in the direction in which the foreground object moves. FIG. 10 is a diagram showing an example in which an energy image is generated and processed based on the traveling direction of the foreground. As shown in FIG. 10(A), the viewer's line of sight is drawn to the region ahead of the person in the image in the person's direction of travel (the region enclosed by the broken line), so if a seam is created in that region, the joining line based on that seam becomes conspicuous.

This happens because the energy image was generated from the target partial image shown in FIG. 10(B) alone, so a seam formed in the region ahead of the person in the person's direction of travel. To counter this, as shown in FIG. 10(C), a second energy image is newly generated and added, so that no seam is created in the traveling direction of the foreground. The second energy image is an energy image in which, based on the motion of the foreground, the energy is made high in the region where a seam is not wanted (here, the region in the direction in which the foreground is about to move).

The method for generating the energy image is now described. FIGS. 11 and 12 are explanatory diagrams showing the method for generating the energy image. For a given boundary region, the optical flow is calculated for each pixel using the images of the past s (s ≥ 1) frames (FIG. 11(A)). Because methods for calculating optical flow are known, a detailed description is omitted here.

Next, the optical flows obtained above are clustered by position and direction using K-means or a similar method (FIG. 11(B)). An energy image is then obtained for each cluster. The energy image is assumed to have the same size as the boundary region. As shown in FIG. 12, let f_i be the optical flow of cluster K_i, taken as the mean of the optical flows contained in K_i (i is the cluster number, 1 ≤ i), and let g_i be the centroid of K_i. The energy image is then created according to a normal distribution whose mean is the x coordinate of (g_i + Δv) and whose variance is 1.

Here Δv = α f_i, and the horizontal axis of the normal distribution is parallel to f_i, as shown in FIG. 12. α is a coefficient to which any suitable value may be given; however, it is desirable to choose a value such that a moving object currently near cluster K_i will be near g_i + Δv at the next time step. The maximum energy e_max is determined in advance (255, for example), and the distribution is normalized so that the maximum energy obtained from the normal distribution equals e_max. Note that e_max may also be varied according to the number of optical flows contained in K_i (Nf_i) and the magnitude of the optical flow (Pf_i), for example e_max = Nf_i + Pf_i (FIG. 11(C)).

Next, the energy images created for the individual clusters are summed to give the energy image of the boundary region. Note that f_i obtained in FIG. 11(C) may instead be the maximum, minimum, median, or a similar statistic of the optical flows contained in cluster K_i.
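The per-cluster energy described above can be sketched as follows in Python. The sketch assumes that the optical flows have already been clustered, so each cluster is represented only by its centroid g and mean flow f. It also assumes one particular reading of the text: the energy is a Gaussian profile along the direction of f, centered at g + αf, with peak value e_max. The function names, the `sigma` parameter (1.0 corresponds to the variance of 1 in the text), and the all-zero energy for a cluster with zero flow are assumptions.

```python
import math


def cluster_energy(width, height, g, f, alpha=1.0, e_max=255.0, sigma=1.0):
    """Energy image for one cluster with centroid g and mean flow f."""
    norm = math.hypot(f[0], f[1])
    if norm == 0:
        # No motion: contribute nothing (assumption, not stated in the text).
        return [[0.0] * width for _ in range(height)]
    ux, uy = f[0] / norm, f[1] / norm                   # unit vector along f
    cx, cy = g[0] + alpha * f[0], g[1] + alpha * f[1]   # g + delta_v
    energy = []
    for y in range(height):
        row = []
        for x in range(width):
            d = (x - cx) * ux + (y - cy) * uy   # signed distance along f
            # exp(0) = 1 at the center, so the peak equals e_max.
            row.append(e_max * math.exp(-d * d / (2 * sigma * sigma)))
        energy.append(row)
    return energy


def sum_energy(images):
    """Pixel-wise sum of equally sized energy images."""
    return [[sum(px) for px in zip(*rows)] for rows in zip(*images)]
```

With pixel units, a variance of 1 gives a very narrow ridge; exposing `sigma` makes it easy to experiment with wider profiles.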

Next, the operation of joining the two partial images shown in FIG. 9 will be described with reference to FIG. 13, which is a flowchart of that operation. First, the boundary determination unit 39 generates the energy images by the method described above (step S301).

Next, the boundary determination unit 39 adds the two energy images together to generate a third energy image (step S302). The boundary determination unit 39 then obtains a vertical seam from the summed energy image by seam carving processing (step S303). This processing defines the vertical seam shown in FIG. 9(B).

Next, the boundary determination unit 39 determines the boundary using the vertical seam obtained (step S304). The boundary determination unit 39 then cuts out the partial images based on the boundary obtained (step S305). This cutting is performed on both partial images. The boundary determination unit 39 then joins the two partial images so that their two joining lines coincide (step S306).

Next, the boundary determination unit 39 determines whether the processing is finished by checking whether any boundary region remains unprocessed (step S307). If an unprocessed boundary region remains, the processing is repeated.

This processing prevents a joint between partial images from forming in the direction in which a foreground object such as a person is moving, so a virtual omnidirectional image in which degradation of viewing quality is suppressed can be generated.

<Second Embodiment>
Next, an image processing apparatus according to the second embodiment of the present invention will be described. As in the first embodiment, the second embodiment does not perform blending but places the boundary where it is hard to notice when the partial images are combined. As described above, simply finding a joint by seam carving and joining along it causes the following problem: a boundary (joint) between partial images is easily noticed when it lies in the region the viewer is paying attention to (the line-of-sight direction). In this embodiment, therefore, no seam is created in the direction the viewer is paying attention to (the region the viewer is most likely looking at, roughly the center of the field of view). FIG. 14 is a diagram showing an example in which weighting is applied according to the direction the viewer is facing. As shown in FIG. 14(A), if a seam is created in the viewer's line-of-sight direction, the joining line based on that seam becomes conspicuous.

This happens because the energy image was generated from the target partial image shown in FIG. 14(B) alone, so a seam formed on the left side of the person in the image. To counter this, as shown in FIG. 14(C), a second energy image is newly generated and added, so that no seam is created in the line-of-sight direction. The second energy image is an energy image in which the energy is made high in the region where a seam is not wanted (here, the region the viewer is most likely looking at, roughly the center of the field of view).

The method for generating the energy image is now described. Assume that a point D = (dx, dy) on the image is known as the information representing the direction the viewer is facing, and that D lies within the boundary region. The point D is obtained from the orientation information of the HMD 82. The boundary determination unit 39 creates the energy image according to a normal distribution with mean dx and variance 1, where the horizontal axis of the normal distribution is horizontal in the image. The maximum energy e_max is determined in advance, and the distribution is normalized so that the maximum energy obtained from the normal distribution equals e_max.
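The gaze-based energy image can be sketched in Python as a Gaussian profile over the x coordinate that is repeated for every row. The function name and the `sigma` parameter are assumptions; `sigma=1.0` corresponds to the variance of 1 stated above, and the peak is scaled to e_max.

```python
import math


def gaze_energy(width, height, dx, e_max=255.0, sigma=1.0):
    """Energy image that is highest in the column the viewer faces (dx)."""
    row = [e_max * math.exp(-((x - dx) ** 2) / (2 * sigma ** 2))
           for x in range(width)]
    # The profile depends only on x, so every row is identical.
    return [list(row) for _ in range(height)]
```

Adding this image to the content-based energy image raises the cost of any seam passing near column dx, which steers the seam search away from the gaze direction.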

The operation of joining the two partial images shown in FIG. 9 in the second embodiment is the same as the processing operation shown in FIG. 13, so a detailed description is omitted here.

This processing prevents a joint between partial images from forming in the viewer's line-of-sight direction, so a virtual omnidirectional image in which degradation of viewing quality is suppressed can be generated.

<Third Embodiment>
Next, an image processing apparatus according to the third embodiment of the present invention will be described. As in the first embodiment, the third embodiment does not perform blending; it is a joining method that prevents the loss of geometric consistency that occurs when the joint moves away from the center of the partial image. As described above, simply finding a joint by seam carving and joining along it causes the following problem: if the joint is generated at a position away from the center of the partial image, the geometric consistency of the image is lost. In this embodiment, therefore, the joint is placed at a position where consistency is not lost. FIG. 15 is a diagram showing an example in which weighting is applied according to the distance from the center. By the nature of this method, placing the joint at the center gives a geometrically more correct image (FIG. 15(A)).

Accordingly, when the joint is set by the conventional method, it can end up at a position away from the center of the image (FIG. 15(B)). To counter this, as shown in FIG. 15(C), a second energy image is newly generated and added, so that no seam is created at a position away from the center. The second energy image is an energy image in which the energy is made high in the region where a seam is not wanted (here, the region away from the center).

The method for generating the energy image is now described. FIG. 16 is an explanatory diagram showing the method for generating the energy image. As shown in FIG. 16, let v_i and v_{i+1} be the positions of the virtual viewpoint in the partial images captured by the cameras C_i and C_{i+1}, respectively, and let c = (cx, cy) be the midpoint of v_{i+1} and v_i. The energy image is created according to a normal distribution with mean cx and variance 1, where the horizontal axis of the normal distribution is horizontal in the image. The maximum energy e_max is determined in advance, and the distribution is normalized so that the maximum energy obtained from the normal distribution equals e_max.

The operation of joining the two partial images shown in FIG. 9 in the third embodiment is the same as the processing operation shown in FIG. 13, so a detailed description is omitted here.

With this processing the joint between partial images is set near the center of the image, which prevents the loss of geometric consistency, so a virtual omnidirectional image in which degradation of viewing quality is suppressed can be generated.

<Fourth Embodiment>
Next, an image processing apparatus according to the fourth embodiment of the present invention will be described. As in the first embodiment, the fourth embodiment does not perform blending; it is a joining method that prevents the loss of geometric consistency that occurs when the joint moves away from the center of the partial image. As described above, simply finding a joint by seam carving and joining along it causes the following problem: if the joint is generated at a position away from the center of the partial image, the geometric consistency of the image is lost. In this embodiment, therefore, the joint is placed at a position where consistency is not lost. FIG. 17 is a diagram showing an example in which the energy is obtained based on the speed of the foreground motion. By the nature of this method, placing the joint at the center gives a geometrically more correct image.

Accordingly, when the joint is set by the conventional method, it can end up at a position away from the center of the image (FIG. 17(B)). To counter this, as shown in FIG. 17(C), a second energy image is newly generated and added, so that no seam is created at a position away from the center. The second energy image is an energy image in which the energy is made high in the region where a seam is not wanted (here, slowly moving foreground is avoided, while fast-moving foreground is not). This exploits the fact that double images or disappearance are hard to notice in fast-moving foreground (a ball, for example).

The method for generating the energy image is now described. For a given boundary region, the optical flow is calculated for each pixel from the images of the past s (s ≥ 1) frames. The optical flows obtained are then clustered by position and direction using K-means or a similar method.

Next, for each cluster, the mean of the optical flows contained in cluster K_j is taken as f_j, as in the third embodiment. If the magnitude of f_j (|f_j|) is equal to or less than a threshold tv (that is, for a slowly moving object), the energy image is obtained in the same way as in the processing described earlier. If the magnitude of f_j (|f_j|) is equal to or greater than tv (that is, for a fast-moving object), no energy image is obtained for this cluster K_j. The energy images created for the individual clusters are then summed to give the energy image of the overlapping region.
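The speed test can be sketched in Python as a filter over the mean flows of the clusters. Only the clusters it returns would receive an energy image, so only slowly moving foreground repels the seam. The function name is an assumption, and because the text assigns the case |f_j| = tv to both branches, the sketch arbitrarily treats equality as slow.

```python
import math


def clusters_to_avoid(mean_flows, tv):
    """Indices of clusters whose mean flow magnitude is at most tv.

    mean_flows: list of (fx, fy) mean optical flow vectors, one per cluster.
    Treating |f| == tv as slow is an assumption.
    """
    return [j for j, (fx, fy) in enumerate(mean_flows)
            if math.hypot(fx, fy) <= tv]
```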

The operation of joining the two partial images shown in FIG. 9 in the fourth embodiment is the same as the processing operation shown in FIG. 13, so a detailed description is omitted here.

With this processing the joint between partial images is set near the center of the image, which prevents the loss of geometric consistency, so a virtual omnidirectional image in which degradation of viewing quality is suppressed can be generated. In particular, because slowly moving objects (foreground) are avoided while fast-moving objects (foreground) are not, a virtual omnidirectional image in which degradation of viewing quality is suppressed can be generated.

As described above, when two partial images are joined, the joint is placed where the viewer is unlikely to notice it, so a virtual omnidirectional image in which degradation of viewing quality is suppressed can be generated. In addition, because the loss of geometric consistency is prevented when two partial images are joined, a virtual omnidirectional image in which degradation of viewing quality is suppressed can likewise be generated.

By repeating the processing that joins two partial images by the methods described above, a virtual omnidirectional image such as that shown in FIG. 18 can be obtained. FIG. 18 is a diagram showing an example of a generated virtual omnidirectional image. The left side of FIG. 18 shows an example of a virtual omnidirectional image produced by the conventional technique. The right side of FIG. 18 shows that, because seam carving processing was applied to optimize the position of the joint, a high-quality virtual omnidirectional image has been generated.

FIG. 19 is an explanatory diagram showing an example of partial images cut out from the input images of the cameras. As shown in the left part of FIG. 19, the person who is the subject appears in all three cameras, so a triple image occurs. In contrast, as shown in the right part, the seam carving process allows the area of a partial image to be enlarged, so the person who is the subject fits within a single partial image and the triple image is prevented.

All or part of the image processing apparatus in the embodiments described above may be implemented by a computer. In that case, a program for implementing these functions may be recorded on a computer-readable recording medium, and the program recorded on the medium may be read into a computer system and executed. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The "computer-readable recording medium" may also include a medium that holds a program dynamically for a short time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period, such as volatile memory inside a computer system serving as a server or client in that case. The program may implement only part of the functions described above, may implement those functions in combination with a program already recorded in the computer system, or may be implemented using hardware such as a PLD (Programmable Logic Device) or an FPGA (Field Programmable Gate Array).

Although embodiments of the present invention have been described above with reference to the drawings, the above embodiments are merely examples of the present invention, and it is clear that the present invention is not limited to them. Accordingly, components may be added, omitted, replaced, or otherwise modified without departing from the technical idea and scope of the present invention.

Because the seam between two images is placed at a position where it is hard to notice when the images are combined, the invention can also be applied to uses in which generating a virtual omnidirectional image with suppressed degradation in viewing quality is essential.

DESCRIPTION OF SYMBOLS: 10 ... court, 11 ... virtual viewpoint, 1 ... image processing system, 2 ... omnidirectional camera, 3 ... camera group, 5 ... display device, 30 ... image processing apparatus, 20 ... background image, 6 ... image server, 7 ... network, 8 ... viewing device, 81 ... user terminal, 82 ... HMD, 9 ... viewing system, 31 ... object analysis unit, 32 ... depth acquisition unit, 33 ... composition information acquisition unit, 34 ... image input unit, 35 ... image cutout unit, 36 ... image composition unit, 37 ... display processing unit, 38 ... input unit, 39 ... boundary determination unit, 301 ... foreground image storage unit, 302 ... background image storage unit, 303 ... object information storage unit, 304 ... composition information table

Claims

An image processing apparatus that, when combining two images, namely a first image that is an image acquired from a first image and a second image that is an image acquired from a second image and that shares part of its imaging area with the first image, determines a joining line along which the two images are combined in the common portion,
the image processing apparatus comprising joining line determination means that determines to place the joining line so as to avoid an area that a person viewing the image is highly likely to be looking at.

An image processing apparatus that, when combining two images, namely a first image that is an image acquired from a first image and a second image that is an image acquired from a second image and that shares part of its imaging area with the first image, determines a joining line along which the two images are combined in the common portion, the image processing apparatus comprising:
first energy image generation means for generating, for the common portion, a first energy image in which pixels to which the human eye is sensitive are defined as having high energy; and
joining line determination means for searching the first energy image for pixels with low energy and determining to place the joining line in the common portion by obtaining a connected series of the low-energy pixels,
wherein the joining line determination means
determines the joining line so as to avoid an area that a person viewing the image is highly likely to be looking at.

The image processing apparatus according to claim 2, further comprising:
area detection means for detecting, in the common portion of the image, an area that a person viewing the image is looking at;
second energy image generation means for generating, based on a detection result of the area detection means, a second energy image in which the energy of the detected area is increased; and
third energy image generation means for generating a third energy image by adding the first energy image and the second energy image,
wherein the joining line determination means determines the joining line by obtaining a connected series of the low-energy pixels based on the third energy image.

The image processing apparatus according to claim 3, wherein the joining line determination means uses, as the joining line, a seam of the common portion obtained by a seam carving process, the seam running in a direction orthogonal to the direction in which the two images are arranged when they are combined.

The image processing apparatus according to claim 1, wherein the two images are used when generating a virtual omnidirectional image viewed from a virtual viewpoint provided at a predetermined position.

An image processing method performed by an image processing apparatus that, when combining two images, namely a first image that is an image acquired from a first image and a second image that is an image acquired from a second image and that shares part of its imaging area with the first image, determines a joining line along which the two images are combined in the common portion,
the image processing method comprising a joining line determination step of determining to place the joining line so as to avoid an area that a person viewing the image is highly likely to be looking at.

An image processing method performed by an image processing apparatus that, when combining two images, namely a first image that is an image acquired from a first image and a second image that is an image acquired from a second image and that shares part of its imaging area with the first image, determines a joining line along which the two images are combined in the common portion, the image processing method comprising:
an energy image generation step of generating, for the common portion, a first energy image in which pixels to which the human eye is sensitive are defined as having high energy; and
a joining line determination step of determining the joining line in the common portion while searching the first energy image for pixels with low energy,
wherein the joining line determination step
determines the joining line so as to avoid an area that a person viewing the image is highly likely to be looking at.

An image processing program for causing a computer to function as the image processing apparatus according to claim 1.