JP2017215706A

JP2017215706A - Video synthesis method, video acquisition device, video synthesis system, and computer program

Info

Publication number: JP2017215706A
Application number: JP2016108167A
Authority: JP
Inventors: Kota Takeuchi (竹内広太); Akira Kojima (小島明); Harumi Kawamura (川村春美); Kazuki Okami (岡見和樹)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2016-05-31
Filing date: 2016-05-31
Publication date: 2017-12-07

Abstract

PROBLEM TO BE SOLVED: To provide a video synthesis technique capable of accurately synthesizing free-viewpoint video in real time without using large-scale equipment. SOLUTION: A video synthesis method according to an embodiment includes: a signal correction processing step in which a series of correction processes is performed on three-dimensional images of a plurality of viewpoints; and a synthesis processing step in which a series of image synthesis processes is performed to generate a three-dimensional image of a virtual viewpoint from the three-dimensional images of the plurality of viewpoints corrected in the signal correction processing step. The signal correction processing step is executed in parallel for each of the three-dimensional images of the plurality of viewpoints in response to a request from the synthesis processing unit. SELECTED DRAWING: Figure 1

Description

The present invention relates to a video synthesis technique for synthesizing free-viewpoint video from video captured at a plurality of viewpoints.

In recent years, virtual reality (hereinafter, VR) has rapidly attracted attention, and markets for VR viewing devices such as head-mounted displays (HMDs) and for VR content are being developed. In much VR content, a head-mounted display or similar device worn by the user measures the movement of the user's head in the real world and links it to the line-of-sight direction in the virtual world, creating a sense of immersion that was impossible with conventional television video. This makes immersive experiences of computer graphics (hereinafter, CG) and animation worlds possible, and has attracted considerable attention from various industries, including the video entertainment industry.

However, most current VR content consists of CG or omnidirectional video. Omnidirectional video can be rendered more realistically than CG, but the user can only change the line-of-sight direction from the single point at which the omnidirectional camera was installed; the viewpoint itself cannot be moved. Therefore, free-viewpoint video must be developed to further enhance the sense of immersion.

Meanwhile, free-viewpoint video has long been studied extensively in pursuit of the ultimate video medium. For example, Non-Patent Document 1 realizes free-viewpoint video using a plurality of TOF (time-of-flight) cameras, but its method for correcting the TOF sensors' distance data is based on image processing, so a correct correction result is not always obtained. Furthermore, as TOF sensors have increased in resolution, the PC (personal computer) specifications required for capture have risen. Under these circumstances, it is difficult to realize free-viewpoint video in real time with a method that cannot be parallelized, such as that of Non-Patent Document 1.

Non-Patent Document 2 realizes high-quality free-viewpoint video using a large amount of equipment, including a chroma-key setup. However, as in Non-Patent Document 2, scaling up capture equipment such as chroma keys has the drawback of raising the barrier to creating free-viewpoint video content.

Non-Patent Document 1: Alexiadis, Dimitrios, Dimitrios Zarpalas, and Petros Daras. "Fast and smooth 3D reconstruction using multiple RGB-Depth sensors." IEEE Visual Communications and Image Processing Conference (VCIP), 2014.
Non-Patent Document 2: Collet, A., et al. "High-quality streamable free-viewpoint video." ACM Transactions on Graphics 34(4), 2015.

As described above, in current real-time free-viewpoint video technology, the algorithm for three-dimensional shape reconstruction is based on image processing, so shape accuracy is lacking and the correct shape cannot always be reconstructed. Further, parallel processing is difficult with the method of Non-Patent Document 1, so generating highly accurate three-dimensional shapes in real time requires larger equipment.

In view of the above problems, an object of the present invention is to provide a video synthesis technique capable of synthesizing free-viewpoint video in real time with high accuracy without using large-scale equipment.

One aspect of the present invention is a video synthesis method including: a signal correction processing step of performing correction processing on three-dimensional images of a plurality of viewpoints; and a synthesis processing step of performing image synthesis processing that generates a three-dimensional image of a virtual viewpoint from the three-dimensional images of the plurality of viewpoints corrected in the signal correction processing step, wherein the signal correction processing step is executed in parallel for each of the three-dimensional images of the plurality of viewpoints in response to a request from the synthesis processing unit.

One aspect of the present invention is the video synthesis method described above, wherein, in the signal correction processing step, the three-dimensional image of each viewpoint is corrected according to an optical system model.

One aspect of the present invention is the video synthesis method described above, wherein the signal correction processing step performs: a first correction process that corrects lens distortion of a color image; a second correction process that corrects lens distortion of a distance image; a third correction process that corrects depth-direction distortion of the distance image corrected by the second correction process; and a mapping process that projects the distance image corrected by the third correction process onto the color camera viewpoint.

One aspect of the present invention is a video acquisition device including a signal correction processing unit that performs correction processing on a three-dimensional image acquired from the device's own viewpoint, wherein the signal correction processing unit executes the correction processing in parallel with the other video acquisition devices in response to a request from a video synthesis device, the video synthesis device performing image synthesis processing that generates a three-dimensional image of a virtual viewpoint from the three-dimensional images of the respective viewpoints corrected by a plurality of video acquisition devices including the device itself.

One aspect of the present invention is a video synthesis device including a synthesis processing unit that performs image synthesis processing to generate a three-dimensional image of a virtual viewpoint based on three-dimensional images of a plurality of viewpoints, wherein the synthesis processing unit causes a plurality of video acquisition devices, which acquire the three-dimensional images of the respective viewpoints, to execute correction processing on those images in parallel.

One aspect of the present invention is a video synthesis system including: a plurality of signal correction processing units that perform correction processing on three-dimensional images of a plurality of viewpoints; and a synthesis processing unit that performs image synthesis processing to generate a three-dimensional image of a virtual viewpoint from the three-dimensional images of the plurality of viewpoints corrected by the signal correction processing units, wherein the plurality of signal correction processing units perform the correction processing in parallel in response to a request from the synthesis processing unit.

One aspect of the present invention is a computer program for causing a computer to execute: a signal correction processing step of performing correction processing on three-dimensional images of a plurality of viewpoints; a synthesis processing step of performing image synthesis processing that generates a three-dimensional image of a virtual viewpoint from the three-dimensional images of the plurality of viewpoints corrected in the signal correction processing step; and a step of executing the signal correction processing step in parallel for each of the three-dimensional images of the plurality of viewpoints in accordance with the processing of the synthesis processing step.

According to the present invention, parallel processing by the signal correction processing units and the synthesis processing unit makes it possible to generate, in real time, free-viewpoint three-dimensional shapes based on accurate shape reconstruction.
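The request-driven parallel correction in the aspects above can be sketched as a fan-out/fan-in over viewpoints. This is a minimal illustration only, assuming each signal correction processing unit can be modelled as a callable that takes a time code and returns its corrected data. In the described system these units are separate PCs reached through a network switch, so the thread pool below merely stands in for that distribution. The names `CorrectedFrame`, `request_corrections`, and `make_corrector` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CorrectedFrame:
    viewpoint: int
    time_code: int
    payload: str  # hypothetical stand-in for corrected color/distance images and camera parameters


Corrector = Callable[[int], CorrectedFrame]


def request_corrections(time_code: int, correctors: Dict[int, Corrector]) -> List[CorrectedFrame]:
    """Send one time-code request to every viewpoint's corrector and collect the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(correctors))) as pool:
        # Fan-out: every viewpoint starts its correction for the same time code.
        futures = [pool.submit(correct, time_code) for correct in correctors.values()]
        # Fan-in: block until all viewpoints have finished.
        results = [future.result() for future in futures]
    return sorted(results, key=lambda frame: frame.viewpoint)


def make_corrector(viewpoint: int) -> Corrector:
    """Hypothetical per-viewpoint corrector; the real work (capture, correction, mapping) is omitted."""
    def correct(time_code: int) -> CorrectedFrame:
        return CorrectedFrame(viewpoint, time_code, f"view{viewpoint}@{time_code}")
    return correct


if __name__ == "__main__":
    correctors = {v: make_corrector(v) for v in range(1, 5)}
    print([frame.payload for frame in request_corrections(1200, correctors)])
```

Because every corrector receives the same time code, the synthesis side in this sketch always combines views belonging to the same frame, and the wait per request is governed by the slowest viewpoint rather than by the sum over all viewpoints.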

FIG. 1 is a schematic configuration diagram of a video synthesis system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of a sensor in the video synthesis system according to the embodiment of the present invention.
FIG. 3 is a functional block diagram of a PC used as a signal correction processing unit in the video synthesis system according to the embodiment of the present invention.
FIG. 4 is a functional block diagram of a PC that functions as a synthesis processing unit in the video synthesis system according to the embodiment of the present invention.
FIG. 5 is a block diagram showing the overall configuration of the video synthesis system according to the embodiment of the present invention.
FIG. 6 is a flowchart showing the overall operation of the video synthesis system according to the embodiment of the present invention.
FIG. 7 is a flowchart showing the specific processing of an input unit in the video synthesis system according to the embodiment of the present invention.
FIG. 8 is a block diagram showing the configuration of a signal correction unit in the video synthesis system according to the embodiment of the present invention.
FIG. 9 is an explanatory diagram of lens distortion correction in the video synthesis system according to the embodiment of the present invention.
FIG. 10 is a flowchart showing the processing of an image synthesis unit in the video synthesis system according to the embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a schematic configuration diagram of a video synthesis system 1 according to an embodiment of the present invention. As shown in FIG. 1, the video synthesis system 1 includes sensors 10-1 to 10-4 that capture three-dimensional images from a plurality of viewpoints, signal correction processing units 11-1 to 11-4, a synthesis processing unit 12, and a network switch 13.

The sensors 10-1 to 10-4 are placed at different positions and capture an object 2 from their respective viewpoints. Any number of sensors may be used; when the sensors 10-1 to 10-4 need not be distinguished, each is referred to as a sensor 10.

FIG. 2 is a block diagram showing the configuration of the sensor 10. As shown in FIG. 2, the sensor 10 includes a color camera 101 and a distance sensor 102. A TOF (time-of-flight) sensor, for example, is used as the distance sensor 102. The sensor 10 outputs a color image, a distance image, and various parameters such as the camera parameters at the time of capture and the depth-direction distortion correction parameters.

In FIG. 1, the sensors 10-1 to 10-4 are connected to the signal correction processing units 11-1 to 11-4, respectively. Each of the signal correction processing units 11-1 to 11-4 is implemented using, for example, a PC (personal computer). Their LAN (local area network) ports are connected to the network switch 13.

FIG. 3 is a functional block diagram of a PC used as each of the signal correction processing units 11-1 to 11-4. Any number of signal correction processing units may be used; when the signal correction processing units 11-1 to 11-4 need not be distinguished, each is referred to as a signal correction processing unit 11.

As shown in FIG. 3, the signal correction processing unit 11 includes an input unit 111 and a signal correction unit 112. The input unit 111 acquires one frame of color image and distance image from the color camera 101 and the distance sensor 102 that make up the sensor 10, together with various parameters such as the camera parameters at the time of capture and the depth-direction distortion correction parameters, and outputs them to the signal correction unit 112. The color camera frame and the distance sensor frame acquired here are hereinafter referred to as the color original image and the distance original image, respectively. The signal correction unit 112 corrects the color image and distance image from the sensor 10 according to an optical system model. Specifically, it performs lens distortion correction of the color image, lens distortion correction of the distance image, depth-direction distortion correction, and a mapping process that projects the corrected distance image onto the color camera viewpoint.

In FIG. 1, the synthesis processing unit 12 is implemented using, for example, a PC. An operation interface device 15 and a video output device 16 are connected to the synthesis processing unit 12. A head-mounted display 18 (HMD), for example, serves as both the operation interface device 15 and the video output device 16. The LAN port of the synthesis processing unit 12 is connected to the network switch 13, which relays data between the signal correction processing units 11-1 to 11-4 and the synthesis processing unit 12.

FIG. 4 is a functional block diagram of a PC that functions as the synthesis processing unit 12. As shown in FIG. 4, the synthesis processing unit 12 includes a transmission/reception unit 121 and an image synthesis unit 122. The transmission/reception unit 121 transmits either the time code of the current time at system startup or the time code of a requested frame in the recorded data. It also receives the corrected color images, corrected distance images, and camera parameters. The image synthesis unit 122 synthesizes a free-viewpoint image at the virtual viewpoint set through the operation interface device 15 from the images input via the transmission/reception unit 121, and outputs the synthesized free-viewpoint image to the video output device 16.

FIG. 5 is a block diagram showing the overall configuration of the video synthesis system 1 according to the embodiment of the present invention, and FIG. 6 is a flowchart showing its overall operation. The operation of the video synthesis system 1 is outlined below with reference to FIGS. 5 and 6.

As shown in FIG. 5, the video synthesis system 1 includes the signal correction processing units 11-1 to 11-4 and the synthesis processing unit 12, which are connected via the network switch 13. The signal correction processing units 11-1 to 11-4 include input units 111-1 to 111-4 and signal correction units 112-1 to 112-4, respectively. The synthesis processing unit 12 includes the transmission/reception unit 121 and the image synthesis unit 122.

(Step S101) As shown in the flowchart of FIG. 6, the input units 111-1 to 111-4 of the signal correction processing units 11-1 to 11-4 acquire, based on the time code requested by the synthesis processing unit 12, the frame images, distance images, camera parameters at the time of capture, and depth-direction distortion correction parameters for the plurality of viewpoints from the respectively connected sensors 10-1 to 10-4, and output them to the signal correction units 112-1 to 112-4.

(Step S102) The signal correction units 112-1 to 112-4 perform lens distortion correction of the color original images, lens distortion correction of the distance original images, depth-direction distortion correction, and mapping of the corrected distance images onto the color camera viewpoints.

(Step S103) The signal correction units 112-1 to 112-4 transmit the corrected color images, corrected distance images, and camera parameters to the synthesis processing unit 12 via the network switch 13. On receiving them, the transmission/reception unit 121 of the synthesis processing unit 12 passes them to the image synthesis unit 122.

(Step S104) The image synthesis unit 122 sets a virtual viewpoint using the operation interface device 15.

(Step S105) The image synthesis unit 122 synthesizes a three-dimensional image for the set virtual viewpoint from the corrected color images, corrected distance images, and camera parameters sent from the signal correction processing units 11-1 to 11-4, and outputs the virtual viewpoint image to the video output device 16.

(Step S106) The image synthesis unit 122 determines whether the next frame can be acquired. If it can (Step S106: Yes), the process returns to Step S101; if it cannot (Step S106: No), the process ends. The process also ends when termination is determined.

In this way, in the video synthesis system 1 according to the embodiment of the present invention, the signal correction processing units 11-1 to 11-4 each perform, in parallel, lens distortion correction of the color image, lens distortion correction of the distance image, depth-direction distortion correction, and mapping of the corrected distance image onto the color camera viewpoint for their respective viewpoints. In parallel with this processing, the synthesis processing unit 12 synthesizes the virtual viewpoint image from the images of the respective viewpoints. This makes it possible to realize free-viewpoint video based on accurate shape reconstruction in real time.
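The overlap between correction and synthesis can be illustrated as a two-stage pipeline in which the corrections for frame t+1 are already running while frame t is being synthesized. This is a sketch under assumptions: `correct_all` stands for a hypothetical function returning the corrected data of all viewpoints for one time code, and `synthesize` stands for the image synthesis step; neither name comes from the source.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence


def run_pipeline(time_codes: Sequence[int],
                 correct_all: Callable[[int], list],
                 synthesize: Callable[[list], str]) -> List[str]:
    """Synthesize frame t while the corrections for frame t+1 run in a background worker."""
    outputs: List[str] = []
    if not time_codes:
        return outputs
    with ThreadPoolExecutor(max_workers=1) as correction_stage:
        pending: Future = correction_stage.submit(correct_all, time_codes[0])
        upcoming: List[Optional[int]] = [*time_codes[1:], None]
        for next_time_code in upcoming:
            corrected = pending.result()  # corrected data for the current frame
            if next_time_code is not None:
                # Start correcting the next frame before synthesizing the current one.
                pending = correction_stage.submit(correct_all, next_time_code)
            outputs.append(synthesize(corrected))
    return outputs
```

For example, `run_pipeline([1, 2, 3], lambda tc: [f"v0@{tc}", f"v1@{tc}"], "+".join)` returns `["v0@1+v1@1", "v0@2+v1@2", "v0@3+v1@3"]`. In steady state the time per frame in this sketch approaches the longer of the two stages instead of their sum.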

Next, the operation of each unit of the video synthesis system 1 according to the embodiment of the present invention is described in detail, beginning with the input unit 111 of the signal correction processing units 11-1 to 11-4.

FIG. 7 is a flowchart showing the specific processing of the input unit 111.
(Step S201) The input unit 111 first sets the camera parameters and the depth-direction distortion correction parameters. Because a standard definition of camera parameters can be used, a detailed description is omitted; here, the camera parameters are the focal lengths fx and fy, the image center cx and cy, the lens distortion parameters k1, k2, k3, q1, and q2, a 3×3 rotation matrix R, and a 1×3 translation vector T. Camera parameters specific to each camera and each distance sensor can be set. They may be set in advance or obtained within the present system; in the latter case, a general method such as the one described in the following document can be used.

Zhang, Zhengyou. "A flexible new technique for camera calibration." IEEE Transactions on Pattern Analysis and Machine Intelligence 22.11 (2000): 1330-1334.

The depth-direction distortion correction parameters are two scalar values, α and β. When a typical TOF (time-of-flight) sensor is used as the distance sensor, α may be set to roughly 0.97 to 1.03 and β to roughly −50 to 50, although individual values can be set for each distance sensor. These parameters may be estimated by any method; for example, the method described in the following document can be used.

Louie, Ashton, 竹内広太 (Kota Takeuchi), and 伊藤直己. "Multiple Plane View Camera Calibration for RGB-D Sensor Rectification (Image Engineering)." IEICE Technical Report 115.350 (2015): 81-86.
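As an illustration of how the camera parameters of Step S201 fit together, the sketch below stores them in a small container and projects a 3D point to pixel coordinates with a pinhole model. It assumes the common convention X_cam = R·X_world + T and treats q1 and q2 as tangential distortion coefficients (the role p1 and p2 play in OpenCV's model); the source states neither convention, and `project` ignores distortion. `CameraParams` and `project` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]

IDENTITY: Tuple[Tuple[float, ...], ...] = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


@dataclass
class CameraParams:
    fx: float  # focal length in pixels (horizontal)
    fy: float  # focal length in pixels (vertical)
    cx: float  # image center (horizontal)
    cy: float  # image center (vertical)
    k1: float = 0.0  # radial distortion coefficients
    k2: float = 0.0
    k3: float = 0.0
    q1: float = 0.0  # assumed tangential distortion coefficients
    q2: float = 0.0
    R: Matrix3 = IDENTITY  # 3x3 rotation matrix
    T: Sequence[float] = (0.0, 0.0, 0.0)  # 1x3 translation vector


def project(cam: CameraParams, point_world: Sequence[float]) -> Tuple[float, float]:
    """Project a world point to pixel coordinates (u, v), ignoring lens distortion."""
    # Rigid transform into camera coordinates: X_cam = R * X_world + T (assumed convention).
    x_c, y_c, z_c = (
        sum(cam.R[i][j] * point_world[j] for j in range(3)) + cam.T[i] for i in range(3)
    )
    if z_c <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division followed by the intrinsic mapping.
    return cam.fx * x_c / z_c + cam.cx, cam.fy * y_c / z_c + cam.cy
```

Keeping separate instances per color camera and per distance sensor mirrors the statement that each device can have its own parameters.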

(Step S202) Next, the user decides whether to generate real-time free-viewpoint video or to generate free-viewpoint video from data recorded on an HDD (hard disk drive) or similar storage.

(Step S203) If it is determined in Step S202 that real-time free-viewpoint video is to be generated, the input unit 111 acquires the color original image and the distance original image of the first frame from the sensor 10.

(Step S204) The input unit 111 attaches a time code to the color original image and the distance original image, and proceeds to Step S208.

(Step S205) If it is determined in Step S202 that free-viewpoint video is to be generated from recorded data, the input unit 111 receives a time code from the transmission/reception unit 121 of the synthesis processing unit 12.

(Step S206) The input unit 111 retrieves, from the recorded data, the color image and distance image of the frame whose time is closest to the requested time code.

(Step S207) The input unit 111 then attaches the time code to the color original image and the distance original image, and proceeds to Step S208.

(Step S208) The input unit 111 outputs the time-coded color original image and distance original image, together with the set camera parameters, to the signal correction unit 112.

(Step S209) The input unit 111 determines whether there is a time-code request from the transmission/reception unit 121. If there is (Step S209: Yes), the process returns to Step S202 and frame reading is repeated; if there is not (Step S209: No), the process ends.
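Step S206 selects the recorded frame closest in time to the request, since the requested time code need not match a stored frame exactly. A minimal sketch of that lookup, assuming the recorded time codes are integers (for example, milliseconds) held in ascending order; `nearest_frame_index` is a hypothetical name, and resolving ties toward the earlier frame is a choice the source does not specify.

```python
import bisect
from typing import Sequence


def nearest_frame_index(frame_time_codes: Sequence[int], requested: int) -> int:
    """Return the index of the recorded frame whose time code is closest to `requested`.

    `frame_time_codes` must be sorted in ascending order; ties go to the earlier frame.
    """
    if not frame_time_codes:
        raise ValueError("no recorded frames")
    i = bisect.bisect_left(frame_time_codes, requested)
    if i == 0:
        return 0  # request precedes (or equals) the first frame
    if i == len(frame_time_codes):
        return len(frame_time_codes) - 1  # request is after the last frame
    before, after = frame_time_codes[i - 1], frame_time_codes[i]
    return i - 1 if requested - before <= after - requested else i
```

Binary search keeps each lookup logarithmic in the recording length, which matters when the synthesis side requests frames repeatedly during playback.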

Next, the operation of the signal correction unit 112 of the signal correction processing units 11-1 to 11-4 is described. The signal correction unit 112 sequentially performs lens distortion correction of the color original image, lens distortion correction of the distance original image, depth-direction distortion correction, and mapping, and finally outputs a distortion-corrected color image, a distortion-corrected distance image, and the camera parameters. FIG. 8 is a block diagram showing the configuration of the signal correction unit. As shown in FIG. 8, the signal correction unit 112 includes a color image lens distortion correction unit 201, a distance image lens distortion correction unit 202, a depth direction distortion correction unit 203, and a mapping unit 204.
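The data flow through units 201 to 204 can be written as a short function that makes the stage order explicit: the depth-direction correction consumes the lens-corrected distance image, and the mapping consumes the depth-corrected one. This is a structural sketch only; `correct_frame` and its four stage arguments are hypothetical, not APIs from the source.

```python
from typing import Any, Callable, Dict, Tuple

Params = Dict[str, Any]
Stage = Callable[[Any, Params], Any]


def correct_frame(color: Any, depth: Any, params: Params, *,
                  undistort_color: Stage, undistort_depth: Stage,
                  correct_depth_axis: Stage, map_to_color_view: Stage) -> Tuple[Any, Any, Params]:
    """Run the four correction stages in the order described for signal correction unit 112."""
    corrected_color = undistort_color(color, params)      # unit 201: color lens distortion
    depth_lens = undistort_depth(depth, params)           # unit 202: distance lens distortion
    depth_axis = correct_depth_axis(depth_lens, params)   # unit 203: depth-direction distortion
    depth_mapped = map_to_color_view(depth_axis, params)  # unit 204: project onto color camera view
    return corrected_color, depth_mapped, params
```

Passing the stages in as arguments keeps the ordering testable on its own and lets each stage be swapped for a different implementation.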

The color image lens distortion correction unit 201 applies standard lens distortion correction to the color original image, using the focal lengths fx and fy, the image center cx and cy, and the lens distortion parameters from among the camera parameters. Because this is a standard method, a detailed description is omitted; here, a pixel (u, v) in the color original image is assumed to be mapped to a new position (u', v') by the lens distortion correction.
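A minimal sketch of pixel-level lens distortion correction, assuming the widely used radial/tangential (Brown-Conrady) model in which k1, k2, k3 are radial coefficients and q1, q2 play the role of OpenCV's tangential coefficients p1, p2. The source calls this step standard without giving formulas, so the model choice is an assumption. `undistort_pixel` maps a distorted pixel (u, v) to its corrected position (u', v') by inverting the model with fixed-point iteration, which converges quickly for moderate distortion; both function names are hypothetical.

```python
from typing import Tuple


def distort_normalized(x: float, y: float, k1: float, k2: float, k3: float,
                       q1: float, q2: float) -> Tuple[float, float]:
    """Apply the assumed radial + tangential model to normalized image coordinates."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2 + k3 * r2 ** 3
    xd = x * radial + 2.0 * q1 * x * y + q2 * (r2 + 2.0 * x * x)
    yd = y * radial + q1 * (r2 + 2.0 * y * y) + 2.0 * q2 * x * y
    return xd, yd


def undistort_pixel(u: float, v: float, fx: float, fy: float, cx: float, cy: float,
                    k1: float, k2: float, k3: float, q1: float, q2: float,
                    iterations: int = 20) -> Tuple[float, float]:
    """Map a distorted pixel (u, v) to its corrected pixel position (u', v')."""
    xd, yd = (u - cx) / fx, (v - cy) / fy  # distorted normalized coordinates
    x, y = xd, yd                          # initial guess: no distortion
    for _ in range(iterations):
        r2 = x * x + y * y
        radial = 1.0 + k1 * r2 + k2 * r2 * r2 + k3 * r2 ** 3
        dx = 2.0 * q1 * x * y + q2 * (r2 + 2.0 * x * x)
        dy = q1 * (r2 + 2.0 * y * y) + 2.0 * q2 * x * y
        x, y = (xd - dx) / radial, (yd - dy) / radial
    return fx * x + cx, fy * y + cy
```

For whole images, OpenCV's `cv2.initUndistortRectifyMap` with `cv2.remap` (or `cv2.undistort`) implements this same model efficiently.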

The distance image lens distortion correction unit 202 performs lens distortion correction on the original distance image. If the processing used for the original color image were simply applied to the distance image as-is, the pixel values of the distance image would no longer be exactly correct. The reason is that each pixel of the distance image stores the distance in the depth direction only, not the optical path length to the subject surface. The distance image lens distortion correction unit 202 therefore performs a lens distortion correction that does not introduce errors into the pixel values of the distance image.

FIG. 9 illustrates lens distortion correction, and FIG. 9(A) illustrates distortion correction of the distance image. In FIG. 9(A), the distance image lens distortion correction unit 202 applies lens distortion processing to each pixel (u, v), as for the original color image, and moves it to (u′, v′). Here (u, v) is a two-dimensional vector giving the pixel position in the original image, and u and v are scalars giving its horizontal and vertical components, respectively. The pixel value at pixel (u, v) of the original distance image is defined as d, and the pixel value at the corrected pixel position p′ = (u′, v′) is defined as d′. The value of d′ is then determined by the following equation (1).
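Equation (1) itself is not reproduced in this text. The sketch below shows one plausible reading, offered purely as an assumption. It supposes that the stored depth d was computed along the ray of the distorted pixel, so the quantity to preserve is the optical path length d·|ray(u, v)|. That length is then re-expressed as depth along the corrected ray through (u′, v′). The helper names are hypothetical, and the intrinsics are taken to be those of the distance sensor.

```python
import math


def ray_norm(u: float, v: float, fx: float, fy: float, cx: float, cy: float) -> float:
    """Length of the viewing ray (x, y, 1) through pixel (u, v) in normalized camera coordinates."""
    x = (u - cx) / fx
    y = (v - cy) / fy
    return math.sqrt(x * x + y * y + 1.0)


def corrected_pixel_depth(d: float, u: float, v: float, u_prime: float, v_prime: float,
                          fx: float, fy: float, cx: float, cy: float) -> float:
    """Hypothetical form of equation (1): keep the optical path length
    d * |ray(u, v)| fixed and re-express it as depth along the ray through (u', v')."""
    path_length = d * ray_norm(u, v, fx, fy, cx, cy)
    return path_length / ray_norm(u_prime, v_prime, fx, fy, cx, cy)
```

Under this assumption, a pixel that moves away from the image center receives a slightly smaller depth value, and a pixel at the center is left unchanged.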

FIG. 9(B) illustrates depth-direction distortion correction. In FIG. 9(B), the depth direction distortion correction unit 203 corrects distortion in the depth direction of the distance image. It uses the depth direction distortion correction parameters α and β together with the distance image corrected by the distance image lens distortion correction unit 202 (hereinafter, the lens-distortion-corrected distance image). Following the definitions used for the distance image lens distortion correction unit 202, the pixel value d′ at pixel (u′, v′) is corrected to d″, which is determined by the following equation (2).

Both parameter α and parameter β in equation (2) can be obtained using known techniques. One example of such a technique is disclosed in the following literature.

Lachat, E., et al. "First experiences with Kinect v2 sensor for close range 3D modelling." The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences 40.5 (2015): 93.
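Equation (2) is likewise not reproduced here. Purely for illustration, the sketch below assumes an affine depth model d″ = α·d′ + β. It estimates α and β by ordinary least squares from calibration pairs, each pairing a measured depth d′ with a reference distance to a target. This affine form is an assumption: it is neither the patent's equation (2) nor the procedure of Lachat et al. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def fit_depth_correction(measured: Sequence[float],
                         reference: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of reference ~= alpha * measured + beta."""
    n = len(measured)
    if n != len(reference) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_m = sum(measured) / n
    mean_r = sum(reference) / n
    sxx = sum((m - mean_m) ** 2 for m in measured)
    if sxx == 0.0:
        raise ValueError("measured depths must not all be equal")
    sxy = sum((m - mean_m) * (r - mean_r) for m, r in zip(measured, reference))
    alpha = sxy / sxx
    beta = mean_r - alpha * mean_m
    return alpha, beta


def correct_depth(d_prime: float, alpha: float, beta: float) -> float:
    """Assumed affine depth-direction correction d'' = alpha * d' + beta."""
    return alpha * d_prime + beta
```

Because α and β are per-sensor constants under this model, they would be fitted once offline and then applied to every pixel of each frame.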

The mapping unit 204 projects the distance image corrected by the depth direction distortion correction unit 203 onto the color camera viewpoint. Through interpolation, it generates a distance image with the same resolution and viewpoint as the color image. The inverse of the standard perspective projection is applied to the distance value d″ of each pixel (u′, v′) in the distance image, using the camera parameters of the distance sensor. This gives the position of the corresponding three-dimensional point in the camera coordinate space of the color camera. Projecting this point into the image space of the color camera, using the color camera's camera parameters, yields the color camera image coordinates (i, j) corresponding to (u′, v′). This projection produces a distance image whose origin is the camera origin of the color camera. However, the resolution of a distance image sensor is generally lower than that of a color camera, so interpolation between the projected points (i, j) is required. Any general method may be used for this interpolation. For example, a mesh can be constructed in advance in the distance image space and interpolation performed mesh by mesh. The distance image at the color camera viewpoint obtained in this way is referred to as the distortion-corrected distance image.

The mapping unit 204 then outputs the distortion-corrected color image, the distortion-corrected distance image, and the camera parameters to the transmission/reception unit 121. The distortion-corrected distance image has already been converted to the color camera's viewpoint, so the distance sensor camera parameters are no longer needed and are not included in the camera parameters output here. Likewise, lens distortion correction has already been applied to the color camera image, so the lens distortion parameters are also unnecessary. The camera parameters output to the transmission/reception unit 121 are therefore the color camera parameters excluding the lens distortion parameters.
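The following is a minimal sketch of the projection performed by the mapping unit 204. It assumes the distance sensor's camera parameters consist of pinhole intrinsics plus a rigid transform (R, t) from the distance-sensor frame into the color-camera frame; these names and this parameterization are assumptions. Each depth sample is back-projected, transformed, and projected into the color image, and the nearest depth is kept where samples collide. The mesh-based interpolation described above is not shown.

```python
from typing import Dict, Iterable, Sequence, Tuple

Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy)
Vec3 = Tuple[float, float, float]


def backproject(u: float, v: float, depth: float, K: Intrinsics) -> Vec3:
    """Inverse perspective projection: pixel plus depth -> 3D point in that camera's frame."""
    fx, fy, cx, cy = K
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)


def rigid_transform(p: Vec3, R: Sequence[Sequence[float]], t: Vec3) -> Vec3:
    """Apply p -> R p + t."""
    return tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))


def map_depth_to_color(depth_pixels: Iterable[Tuple[float, float, float]],
                       K_depth: Intrinsics, K_color: Intrinsics,
                       R: Sequence[Sequence[float]], t: Vec3,
                       width: int, height: int) -> Dict[Tuple[int, int], float]:
    """Project corrected depth samples (u', v', d'') into the color image.

    Returns a sparse map {(i, j): depth in the color camera frame}, keeping the
    nearest surface when several samples land on the same color pixel.
    """
    out: Dict[Tuple[int, int], float] = {}
    fx, fy, cx, cy = K_color
    for u, v, d in depth_pixels:
        if d <= 0.0:
            continue  # no valid measurement
        x, y, z = rigid_transform(backproject(u, v, d, K_depth), R, t)
        if z <= 0.0:
            continue  # behind the color camera
        i, j = round(fx * x / z + cx), round(fy * y / z + cy)
        if 0 <= i < width and 0 <= j < height and z < out.get((i, j), float("inf")):
            out[(i, j)] = z
    return out
```

The sparse result is what the interpolation step would then densify to the full color resolution.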

Next, the processing in the transmission/reception unit 121 of the synthesis processing unit 12 is described. The transmission/reception unit 121 forwards the time code requested by the image synthesis unit 122 to the input units 111-1 to 111-4 of the signal correction processing units 11-1 to 11-4. It also receives the distortion-corrected color images, distortion-corrected distance images, and camera parameters output from the signal correction units 112-1 to 112-4 of those processing units. It then outputs them to the image synthesis unit 122 as data of the same frame. The transmission/reception unit 121 performs this processing over a network.
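The request/response pattern at the heart of the claims can be sketched as follows: the synthesis side requests one time code, and the four correction units process it in parallel. Threads stand in for the network round trips. The classes, fields, and placeholder values are hypothetical and only illustrate the pattern.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List


@dataclass
class CorrectedFrame:
    unit_id: int
    time_code: int
    color: object        # distortion-corrected color image (placeholder)
    depth: object        # distortion-corrected distance image (placeholder)
    camera_params: dict  # color camera parameters without lens distortion terms


class SignalCorrectionUnit:
    """Hypothetical stand-in for one of units 11-1 to 11-4 (input unit 111 + unit 112)."""

    def __init__(self, unit_id: int):
        self.unit_id = unit_id

    def handle_request(self, time_code: int) -> CorrectedFrame:
        # A real unit would read the frame for time_code, then run lens distortion
        # correction, depth-direction correction and mapping before replying.
        return CorrectedFrame(self.unit_id, time_code, color=None, depth=None,
                              camera_params={"fx": 1.0, "fy": 1.0, "cx": 0.0, "cy": 0.0})


def request_frame(units: List[SignalCorrectionUnit], time_code: int) -> List[CorrectedFrame]:
    """Forward one time code to every unit in parallel and gather replies as one frame."""
    with ThreadPoolExecutor(max_workers=len(units)) as pool:
        futures = [pool.submit(unit.handle_request, time_code) for unit in units]
        frames = [f.result() for f in futures]  # keeps unit order; re-raises worker errors
    if any(frame.time_code != time_code for frame in frames):
        raise RuntimeError("units returned mismatched time codes")
    return frames
```

Collecting the futures in submission order keeps each reply tied to its unit, so the synthesis side always receives one complete, consistently ordered set per time code.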

FIG. 10 is a flowchart showing the processing of the image synthesis unit 122.
(Step S401) First, the image synthesis unit 122 generates three-dimensional point cloud data for the background. This background point cloud data may be generated from data captured in advance with this system, or a three-dimensional model of a non-photographed space, such as a CG model, may be used. Generating background point cloud data from previously captured data uses the same point cloud generation method as the apparatus described below, so it is explained later.

The synthesis processing unit 12 is assumed to be equipped with a head-mounted display 18 as the video output device 16 for outputting video, and to be connected to the operation interface device 15. The operation interface device 15 uses the head-tracking function of the head-mounted display 18. This allows the user's real viewpoint position and viewing direction to be reflected in the viewpoint position and viewing direction of the virtual camera. The system can thus present video in which the user appears to have entered the space of the free viewpoint video. Other devices may also be used as the operation interface device 15, provided they allow movement in three-dimensional space, for example a keyboard and mouse or a game controller. Similarly, a normal display may be used as the video output device 16. In that case, the same effect can be obtained by attaching tracking sensors to the user's head or the like.

(Step S402) The user specifies the camera position and orientation of the virtual viewpoint and the time code of the video to be viewed. As described above, the camera position and orientation of the virtual viewpoint are input through the head tracking of the head-mounted display 18, or can be operated by the user as desired with a game controller or the like. The user can also change the time code of the video to be viewed as desired, for example with a seek bar or a fast-forward function.

(Step S403) The image synthesis unit 122 outputs the requested time code specified by the user to the transmission/reception unit 121, which in turn issues the request to the input units 111. The image synthesis unit 122 then receives all of the returned distortion-corrected color images, distortion-corrected distance images, and camera parameters.

(Step S404) The image synthesis unit 122 generates three-dimensional point cloud data of the subject. The inverse of the standard perspective projection is applied to the distance value d″ at each pixel (i, j) using the camera parameters. This yields the coordinates of the corresponding point in three-dimensional space for each pixel. Assigning the color of pixel (i, j) to each such point produces a colored three-dimensional point cloud. Repeating this for the data from every signal correction processing unit 11-1 to 11-4 generates the three-dimensional point cloud data of the subject. This processing can be applied to all pixels in the image. However, when a general TOF (time-of-flight) sensor is used as the distance sensor, the image may contain many regions where distance could not be measured because of the limited measurement range. In addition, when a background model has been prepared in advance, the user may wish to display point cloud data only for subjects such as performers in the scene. In such cases, a step is added that keeps only those points of the scene's point cloud that fall within a range set in advance by the user and discards the rest. Similarly, a three-dimensional point cloud segmentation technique can be used to keep only part of the point cloud.
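Step S404 can be sketched as below. The sketch assumes that each unit's camera parameters include a camera-to-world pose (R, t), so that all units' points land in one shared frame. That pose, the nested-list image layout indexed [j][i], and the function name are assumptions. Pixels with no measurement are skipped, and an optional axis-aligned box stands in for the user-defined range.

```python
from typing import List, Optional, Sequence, Tuple

RGB = Tuple[int, int, int]
ColoredPoint = Tuple[float, float, float, RGB]
Box = Tuple[Tuple[float, float, float], Tuple[float, float, float]]


def depth_to_colored_points(depth: Sequence[Sequence[float]],
                            color: Sequence[Sequence[RGB]],
                            K: Tuple[float, float, float, float],
                            R: Sequence[Sequence[float]],
                            t: Tuple[float, float, float],
                            box: Optional[Box] = None) -> List[ColoredPoint]:
    """Back-project a distortion-corrected distance image (indexed [j][i]) into a
    colored point cloud in a shared world frame, optionally cropping to a box."""
    fx, fy, cx, cy = K
    points: List[ColoredPoint] = []
    for j, row in enumerate(depth):
        for i, d in enumerate(row):
            if d <= 0.0:
                continue  # no measurement, e.g. outside the TOF sensor's range
            pc = ((i - cx) / fx * d, (j - cy) / fy * d, d)  # color camera frame
            pw = tuple(sum(R[r][c] * pc[c] for c in range(3)) + t[r] for r in range(3))
            if box is not None and not all(box[0][k] <= pw[k] <= box[1][k] for k in range(3)):
                continue  # outside the user-defined region of interest
            points.append((pw[0], pw[1], pw[2], color[j][i]))
    return points
```

Concatenating the lists returned for units 11-1 to 11-4 gives the subject point cloud. A segmentation method could replace the box test when a simple region is not enough.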

(Step S405) Next, the image synthesis unit 122 performs rendering for the virtual viewpoint camera. First, the image synthesis unit 122 superimposes the background point cloud data and the subject point cloud data in a common three-dimensional space. It then renders all of these points for the virtual viewpoint camera using that camera's parameters. Any rendering method may be used; point cloud rendering is used here. Alternatively, a mesh may be constructed first and a general rendering method applied to it.
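A minimal point-splatting renderer for step S405 might look like the following. It assumes the virtual camera is described by pinhole intrinsics and a world-to-camera transform (R, t). Each point is drawn into a single pixel, and a z-buffer ensures that nearer points hide farther ones. Superimposing the background and subject clouds amounts to list concatenation. This is an illustrative sketch, not the patent's renderer.

```python
import math
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def render_points(points: Sequence[Tuple[float, float, float, RGB]],
                  K: Tuple[float, float, float, float],
                  R: Sequence[Sequence[float]], t: Tuple[float, float, float],
                  width: int, height: int,
                  background: RGB = (0, 0, 0)) -> List[List[RGB]]:
    """Splat colored world points into a virtual camera (world -> camera: R p + t)
    with a per-pixel z-buffer, so nearer points hide farther ones."""
    fx, fy, cx, cy = K
    image = [[background] * width for _ in range(height)]
    zbuf = [[math.inf] * width for _ in range(height)]
    for x, y, z, rgb in points:
        pw = (x, y, z)
        xc, yc, zc = (sum(R[r][c] * pw[c] for c in range(3)) + t[r] for r in range(3))
        if zc <= 0.0:
            continue  # behind the virtual camera
        i = round(fx * xc / zc + cx)
        j = round(fy * yc / zc + cy)
        if 0 <= i < width and 0 <= j < height and zc < zbuf[j][i]:
            zbuf[j][i] = zc
            image[j][i] = rgb
    return image
```

Practical point cloud renderers usually draw each point as a splat larger than one pixel to close gaps between sparse samples. The single-pixel version keeps the depth test easy to follow.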

(Step S406) The image synthesis unit 122 outputs the rendered image to the video output device 16.

(Step S407) The image synthesis unit 122 determines whether the user has issued an end instruction. If there is no end instruction from the user (step S407: No), the process proceeds to step S408.

(Step S408) The image synthesis unit 122 specifies the frame to be drawn and sets the position and orientation of the virtual viewpoint camera (step S408). The process then returns to step S402, and the above processing is repeated.

Although the above example describes processing for a single image, a moving image can be processed by repeating the processing for a sequence of consecutive images. The processing according to the present invention also need not be applied to every frame of the video. It may be applied to only some frames, with different processing applied to the other frames.

A program for realizing all or part of the functions of the video composition system 1 may be recorded on a computer-readable recording medium. The processing of each unit may then be performed by having a computer system read and execute the program recorded on that medium. The "computer system" here includes an OS and hardware such as peripheral devices.
The "computer system" also includes a homepage providing environment (or display environment) when a WWW system is used.
The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The term further includes media that hold a program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line. It also includes media that hold a program for a certain period, such as the volatile memory inside a computer system serving as the server or client in that case. The program may realize only part of the functions described above, or it may realize those functions in combination with a program already recorded in the computer system.

Although the embodiment of the present invention has been described in detail with reference to the drawings, the specific configuration is not limited to this embodiment. It also includes design changes and the like that do not depart from the gist of the present invention.

1: video composition system, 10: sensor, 11: signal correction processing unit, 12: synthesis processing unit, 13: network switch, 15: operation interface device, 16: video output device, 101: color camera, 102: distance sensor, 111: input unit, 112: signal correction unit, 121: transmission/reception unit, 122: image synthesis unit, 201: color image lens distortion correction unit, 202: distance image lens distortion correction unit, 203: depth direction distortion correction unit, 204: mapping unit

Claims

A video composition method comprising:
a signal correction processing step of performing correction processing on three-dimensional images of a plurality of viewpoints; and
a synthesis processing step of performing image synthesis processing that generates a three-dimensional image of a virtual viewpoint from the three-dimensional images of the plurality of viewpoints corrected in the signal correction processing step,
wherein, in response to a request from the synthesis processing unit, the signal correction processing step is executed in parallel for each of the three-dimensional images of the plurality of viewpoints.

The video composition method according to claim 1, wherein, in the signal correction processing step, correction processing according to an optical system model is performed on the three-dimensional image of each viewpoint.

The video composition method according to claim 2, wherein the signal correction processing step performs:
a first correction process that corrects lens distortion of a color image;
a second correction process that corrects lens distortion of a distance image;
a third correction process that corrects distortion in the depth direction of the distance image corrected by the second correction process; and
a mapping process that projects the distance image corrected by the third correction process onto a color camera viewpoint.

A video acquisition device comprising a signal correction processing unit that performs correction processing on a three-dimensional image acquired from the viewpoint of the device itself,
wherein the signal correction processing unit executes the correction processing in parallel with the other video acquisition devices in response to a request from a video composition device, the video composition device performing image synthesis processing that generates a three-dimensional image of a virtual viewpoint from the three-dimensional images of the respective viewpoints corrected by a plurality of video acquisition devices including the device itself.

A video composition device comprising a synthesis processing unit that performs image synthesis processing for generating a three-dimensional image of a virtual viewpoint based on three-dimensional images of a plurality of viewpoints,
wherein the synthesis processing unit causes a plurality of video acquisition devices, each of which acquires the three-dimensional image of one viewpoint, to execute correction processing on the three-dimensional images of the respective viewpoints in parallel.

A video composition system comprising:
a plurality of signal correction processing units that perform correction processing on three-dimensional images of a plurality of viewpoints; and
a synthesis processing unit that performs image synthesis processing for generating a three-dimensional image of a virtual viewpoint from the three-dimensional images of the plurality of viewpoints corrected by the signal correction processing units,
wherein the plurality of signal correction processing units perform the correction processing in parallel in response to a request from the synthesis processing unit.

A computer program for causing a computer to execute:
a signal correction processing step of performing correction processing on three-dimensional images of a plurality of viewpoints; and
a synthesis processing step of performing image synthesis processing that generates a three-dimensional image of a virtual viewpoint from the three-dimensional images of the plurality of viewpoints corrected in the signal correction processing step,
wherein the signal correction processing step is performed in parallel for each of the three-dimensional images of the plurality of viewpoints in accordance with the processing of the synthesis processing step.