WO2023181904A1

WO2023181904A1 - Information processing device, information processing method, and recording medium

Info

Publication number: WO2023181904A1
Application number: PCT/JP2023/008482
Authority: WO
Inventors: 健太郎深水
Original assignee: ソニーグループ株式会社
Priority date: 2022-03-24
Filing date: 2023-03-07
Publication date: 2023-09-28

Abstract

An image synthesis device (10), which corresponds to an example of an information processing device, is provided with a synthesis processing unit (12c) that uses an RGB image (L_RGB), a depth image (I_Dep), and an omnidirectional image (I_Sep) as inputs to perform a synthesis process for disposing CG data (D_Cg) in a three-dimensional space representing a real space corresponding to the RGB image (L_RGB). The synthesis processing unit (12c): generates a PY image (L_PY) using only captured image information measured from the inputs, and a VI image (L_VI) using the captured image information and the CG data (D_Cg); generates an ADR image (L_ADR) using an ADR method based on the difference between the shades of the PY image (L_PY) and the VI image (L_VI), and a QI image (L_QI) using a QI method based on the ratio between the shades; generates a shadow impact image (S_I) representing the change in light radiant energy between when only the captured image information is used and when the captured image information and the CG data (D_Cg) are used; and linearly combines the ADR image (L_)ADR) and the QI image (L_QI) using the shadow impact image (S_I).

Description

Information processing device, information processing method, and recording medium

The present disclosure relates to an information processing device, an information processing method, and a recording medium.

Traditionally, technology that processes and edits live-action footage using CG (computer graphics) or compositing processing, including VFX (visual effects), has been used to create screen effects that do not exist in reality in the video field such as movies and TV dramas. Contributing to the spread of the work.

At such VFX sites, in order to make the viewer feel that the CG virtual object to be synthesized exists in the live-action video, it is necessary to faithfully reproduce the appearance of the virtual object and even the changes in shadows caused by the compositing process. A unique synthesis process is required.

Conventionally, techniques such as the ADR (Additive Differential Rendering) method and the QI (Quotient Image) method have been known to perform high-quality calculations of such shadow changes (for example, see Non-Patent Documents 1 and 2). ).

However, the above-mentioned conventional technology has room for further improvement in improving the quality of synthesizing CG data with respect to a three-dimensional space representing a real space.

For example, the synthesis quality of the ADR method and the QI method has advantages and disadvantages depending on the situation of the actual background, so in many cases, the user needs to manually select which method to use depending on the content of the background. In addition, the algorithms of these methods assume that the shadow area cast by CG on the real image is only a planar shape, so it is difficult to apply shadows to areas where many three-dimensional shapes intersect in a complex manner and cause occlusion. When applied, there is also the problem that it is strongly influenced by noise and is reflected as an artifact.

Therefore, the present disclosure proposes an information processing device, an information processing method, and a recording medium that can further improve the quality of synthesizing CG data into a three-dimensional space representing a real space.

In order to solve the above problems, an information processing device according to an embodiment of the present disclosure inputs an RGB image, a depth image, and a spherical image and converts CG data into a three-dimensional space representing a real space corresponding to the RGB image. A compositing processing unit is provided that performs compositing processing for arranging. The synthesis processing unit generates a first rendered image using only the real-life information measured from the input, and a second rendered image using the real-life information and the CG data, and A first composite image by a first method based on the difference in shading between the rendered image and the second rendered image and a second composite image by a second method based on the ratio of the shading are generated, A shadow impact image representing a change in optical radiation energy between the case of using only the information and the case of using the live-action information and the CG data is generated, and the first composite image is generated using the shadow impact image. and the second composite image are linearly combined.

FIG. 1 is a schematic explanatory diagram of an image synthesis method according to an embodiment of the present disclosure. FIG. 1 is a block diagram illustrating a configuration example of an image synthesis device according to an embodiment of the present disclosure. FIG. 2 is a block diagram showing an example of the configuration of a composition processing section. FIG. 3 is a diagram showing an example of an output image. FIG. 3 is a diagram showing an example of an RGB image. It is a figure showing an example of CG data. It is a figure showing an example of a depth image. It is a figure showing an example of a PY image. FIG. 3 is a diagram showing an example of a VI image. It is a figure showing an example of a mask image. FIG. 3 is a diagram showing an example of an RPY image. It is a figure showing an example of an RVI image. FIG. 3 is a diagram showing an example of a shadow impact image. It is a figure showing an example of an ADR image. FIG. 2 is a diagram (part 1) showing an example of a QI image. FIG. 2 is a diagram (part 2) showing an example of a QI image. FIG. 2 is a diagram (part 1) showing an example of a light gap image. FIG. 2 is a diagram (part 2) showing an example of a light gap image. FIG. 7 is a diagram showing the effect of correction using a light gap image. 3 is a flowchart illustrating a processing procedure executed by the image compositing device. It is a flowchart which shows the processing procedure when outputting a still image. It is a flowchart showing a processing procedure when outputting a moving image. FIG. 2 is a hardware configuration diagram showing an example of a computer that implements the functions of an image synthesis device.

Below, embodiments of the present disclosure will be described in detail based on the drawings. In addition, in each of the following embodiments, the same portions are given the same reference numerals and redundant explanations will be omitted.

Furthermore, in the following description, it is assumed that the information processing device according to the embodiment of the present disclosure is the image composition device 10. Further, in the following, it is assumed that the information processing method according to the embodiment of the present disclosure is an image synthesis method.

Further, the present disclosure will be described according to the order of items shown below.
1. Overview 1-1. Technical issues of ADR method and QI method 1-2. Outline of image synthesis method according to embodiment of the present disclosure 2. Configuration of image synthesis device 3. Details of the compositing process executed by the compositing processing unit 4. Modification example 5. Hardware configuration 6. Conclusion

<<1. Overview >>
<1-1. Technical issues of ADR method and QI method>
First, prior to explaining the image synthesis method according to the embodiment of the present disclosure, technical issues of the ADR method and the QI method will be described in detail.

In order to perform synthesis processing using the ADR method or QI method, which is the prior art of the embodiment of the present disclosure, in addition to the CG data D _Cg to be synthesized, geometry that is a three-dimensional shape in real space, and reflections corresponding to the geometry are required. A real-space illumination map is required as input.

This information is obtained by processing the RGB image L _RGB captured in real space, the depth image I _Dep , and the spherical image I _Sep using general DCC (Digital Content Creation) tools such as Maya and Blender. be able to.

Note that the following description will be made on the premise that the quality of the geometry and reflectance obtained in this way are of low quality. Both the ADR method and the QI method generate three images, a PY image L _PY , a VI image L _VI , and a mask image M, using the aforementioned geometry, reflectance, illumination map, and CG data to be synthesized.

PY image L _PY is an image rendered using only actual photographic information measured from a depth image. VI image L _VI is an image rendered using both real-shot information and CG data to be synthesized. The mask image M is an image indicating an area where CG data to be synthesized exists. These images can be generated by rendering with the DCC tool described above.

The RGB image L _RGB , the PY image L _PY , the VI image L _VI and the mask image M are input to the synthesis algorithm in the ADR method and the QI method, and each of these algorithms has advantages and disadvantages in terms of their structure.

First, in the ADR method, an ADR image L _ADR is calculated based on the following equation (1).

The first term of equation (1) indicates that in the area where the CG data D _Cg exists, that is, when M=1, the VI image L _VI is displayed as is. Furthermore, the second term in equation (1) means adding a change in shading L _VI −L _PY caused by the compositing process of the CG data D _Cg to the input RGB image L _RGB .

As a result, even if the ranging quality is not sufficiently high, only the information on the difference in shadows can be extracted from the VI image _LVI , and high-quality shadows can be effectively added to the real background.

However, the pixel values of the VI image _LVI are non-negative. That is, since L _VI ≧0, the following formula (2) holds true in the second term of the ADR method.

In other words, in the ADR method, if L _RGB - L _PY is positive, the L _ADR pixel value cannot be made less than L _RGB - L _PY , and there is a lower limit to the range in which the actual photograph can be darkened by shading. There is a problem with doing so. In addition, due to the nature of equation (1) above, if the input reflectance is of low quality and has a large error with respect to the true value, the color distributions of the RGB image L _RGB and the PY image L _PY will diverge, causing artifacts. There is also the problem that this is reflected in the ADR image _LADR .

Next, in the QI method, a QI image L _QI is calculated based on the following equation (3).

Similar to the ADR method, the QI method can add shadows to the real background by multiplying the input RGB image L _RGB by the ratio of shadows L _VI /L _PY produced by the compositing process of the CG data D _Cg . can.

Furthermore, in the QI method, the second term of equation (3) is actually operated by the following equation (4) in order to prevent the synthesis result from becoming unstable due to division by zero.

Here, ε is a constant to prevent division by zero. At this time, L _QI is limited to a maximum of 1.01 times L _PY . For this reason, the QI method has a problem in that there is an upper limit to the range in which a live photograph can be brightened by, for example, combining processing of emitting CG data _DCg . In addition, due to the nature of equation (3), if the input depth image _IDep is of low quality and has a large error with respect to the true value, an operation extremely similar to division by zero will occur in the second term of equation (4). There is a high possibility that this will occur, and in that case, there is a problem that it will be reflected in the QI image _LQI as an artifact.

Finally, we will show technical issues common to both the ADR method and the QI method. In the synthesis process of equations (1) and (3) above, we assume that the real space geometry, reflectance, and illumination map have high distance measurement quality, and that equation (5) below holds true over the entire image. It guarantees high-quality shading composition.

However, as a premise, if the ranging qualities of geometry and reflectance are low, there may be an area in the image where equation (5) does not hold. Among these regions, the region where the PY image L _PY is much darker than the RGB image L _RGB , that is, the region where the following equation (6) is satisfied, particularly affects the synthesis quality.

In embodiments of the present disclosure, such a region is referred to as a light-gap region. As stated above, the light gap region occurs when the distance measurement quality of the depth image _IDep is low quality and the three-dimensional object shielding environment in real space cannot be accurately acquired. When performing compositing processing of CG data D _Cg on the light gap region, the following equation (7) automatically holds true unless a virtual object that emits light is included in the CG data D _Cg .

Therefore, in the second term of Equation (1) of the ADR method, Equation (8) below holds true, and the desired output cannot be expected in the region where shading is to be added to the RGB image L _RGB by the compositing process. .

Similarly, in the second term of Equation (4) of the QI method, Equation (9) below holds true, causing a phenomenon similar to that of the ADR method.

As described above, in the light gap region, there is a technical problem that shading cannot be added to the RGB image L _RGB by the synthesis process, and as a result, it is reflected in the ADR image L _ADR and the QI image L _QI as an artifact.

Summarizing the above, the technical issues of both the ADR method and the QI method are summarized as follows. First, in the ADR method, there are restrictions on the expression of darkening the RGB image L _RGB by the compositing process of the CG data D _Cg . In addition, the ADR method tends to cause artifacts due to the low quality ranging quality of reflectance. Furthermore, in the ADR method, artifacts in the light gap region become noticeable.

On the other hand, in the QI method, there are restrictions on the expression of brightening the RGB image L _RGB by the synthesis process of the CG data D _Cg . In addition, the QI method tends to cause artifacts due to the low ranging quality of the depth image _IDep . Furthermore, in the QI method, artifacts in the light gap region become noticeable.

<1-2. Overview of image synthesis method according to embodiment of the present disclosure>
FIG. 1 is a schematic explanatory diagram of an image synthesis method according to an embodiment of the present disclosure. In order to solve the above technical problem, in the image synthesis method according to the embodiment of the present disclosure, the image synthesis device 10 inputs the RGB image L _RGB , the depth image I _Dep , and the spherical image I _Sep and converts the RGB image L RGB into an RGB image L _RGB . A synthesis process is performed to arrange the CG data D _Cg in a three-dimensional space representing the corresponding real space.

As shown in FIG. 1, in this compositing process, the image compositing device 10 combines the PY image _LPY , which is a rendered image using only the actual photographic information measured from the input, and the aforementioned photographic information and CG data _DCG . A VI image _LVI , which is the rendered image used, is generated (steps S1-1, S1-2). PY image L _PY corresponds to an example of a "first rendered image." VI image L _VI corresponds to an example of a "second rendered image."

The image synthesis device 10 also generates an ADR image L ADR, which is a synthesized image based on the ADR method based on the difference in shading between the PY image L _PY and a VI image L _VI , and _a QI image, which is a synthesized image based on the QI method based on the ratio of the shadings. Images L and _QI are generated (steps S2-1 and S2-2). ADR image L _ADR corresponds to an example of a "first composite image." QI image L _QI corresponds to an example of a "second composite image."

The image synthesis device 10 also generates a shadow impact image _SI representing a change in light radiation energy between the case where only the above-mentioned real-time photograph information is used and the case where the above-mentioned live-action information and CG data _DCg are used. Step S3). The image synthesis device 10 also performs a linear combination of the ADR image _LADR and the QI image _LQI using the shadow impact image S _I to generate an LC image L _LC (step S4).

That is, in the image synthesis method according to the embodiment of the present disclosure, the RGB image L _RGB , the depth image I _Dep , and the spherical image I _Sep are input, and CG is created in a three-dimensional space representing the real space corresponding to the RGB image L _RGB . Performs synthesis processing to arrange data D _Cg . At that time, the ADR image L _ADR generated by the ADR method and the QI image L _QI generated by the QI method are linearly combined using the shadow impact image S _I generated by the new algorithm of the present disclosure, and The advantages of the and QI methods are adaptively mixed for each image pixel.

This eliminates the need for the user to manually select the ADR image L _ADR and the QI image L _QI . Furthermore, it is possible to eliminate artifacts in areas where occlusion occurs using the shadow impact image _SI , and to realize automatic synthesis processing in real space where many three-dimensional shapes intersect in a complex manner and occlusion occurs.

That is, according to the image synthesis method according to the embodiment of the present disclosure, it is possible to further improve the synthesis quality of CG data _DCg with respect to a three-dimensional space representing a real space. Hereinafter, a configuration example of the image synthesis apparatus 10 to which the image synthesis method according to the embodiment of the present disclosure is applied will be described in more detail.

<<2. Configuration of image synthesis device >>
FIG. 2 is a block diagram illustrating a configuration example of the image synthesis device 10 according to the embodiment of the present disclosure. Note that FIG. 2 and FIG. 3 shown later show only the constituent elements necessary for explaining the features of the embodiment of the present disclosure, and descriptions of general constituent elements are omitted.

In other words, each component illustrated in FIGS. 2 and 3 is functionally conceptual, and does not necessarily need to be physically configured as illustrated. For example, the specific form of distributing/integrating each block is not limited to what is shown in the diagram, and all or part of the blocks can be functionally or physically distributed/integrated in arbitrary units depending on various loads and usage conditions. It is possible to configure them in an integrated manner.

Furthermore, in the explanation using FIGS. 2 and 3, the explanation of components that have already been explained may be simplified or omitted.

As shown in FIG. 2, the image synthesis device 10 includes a storage section 11 and a control section 12. The storage unit 11 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory), a ROM (Read Only Memory), or a flash memory, or a storage device such as a hard disk or an optical disk.

In the example shown in FIG. 2, the storage unit 11 stores geometry information 11a, reflectance information 11b, illumination map information 11c, and DCC tool program 11d. The geometry information 11a is information corresponding to the above-mentioned geometry. The reflectance information 11b is information corresponding to the above-mentioned reflectance. The illumination map information 11c is information corresponding to the aforementioned illumination map. The DCC tool program 11d is program data of the DCC tool.

The control unit 12 is a controller, and includes, for example, a CPU (Central Processing Unit), an MPU (Micro Processing Unit), a GPU (Graphics Processing Unit), etc. This is realized by executing the information processing program according to the embodiment using the RAM as a work area. Further, the control unit 12 can be realized by, for example, an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

The control unit 12 includes an acquisition unit 12a, a conversion unit 12b, a composition processing unit 12c, and an output unit 12d, and realizes or executes information processing functions and operations described below.

The acquisition unit 12a acquires the RGB image L _RGB , the depth image I _Dep , the spherical image I _Sep , and the CG data D _Cg .

The converter 12b converts the RGB image _LRGB , the depth image _IDep , and the spherical image _ISep into geometry information 11a, reflectance information 11b, and illumination map information 11c that can be read by the DCC tool.

The synthesis processing unit 12c receives the converted geometry information 11a, reflectance information 11b, illumination map information 11c, CG data D _{, and} _RGB image L as input, and generates a three-dimensional space representing a real space corresponding to the _RGB image L. A compositing process is performed to arrange CG data D _Cg .

FIG. 3 is a block diagram showing a configuration example of the composition processing section 12c. As shown in FIG. 3, the composition processing section 12c includes a first generation section 12ca, a second generation section 12cb, a third generation section 12cc, a fourth generation section 12cd, a fifth generation section 12ce, and an output image. It has a generation unit 12cf.

The first generation unit 12ca reads the converted geometry information 11a, reflectance information 11b, illumination map information 11c, and CG data D _Cg using the DCC tool, and generates a PY image L _PY , a VI image L _VI , and a mask image within the DCC tool. M, generate an RPY image R _PY and an RVI image R _VI . The RPY image R _PY and the RVI image R _VI are images that are input when generating the shadow impact image S _I.

The second generation unit 12cb generates an ADR image L _{ADR and a QI image L QI using the generated PY image L PY} _, _VI image L _VI , mask image M, and RGB image L _RGB . The third generation unit 12cc receives the RPY image _RPY and the RVI image _RVI as input and generates a shadow impact image _SI in parallel with the second generation unit 12cb.

Then, the fourth generation unit 12cd receives the ADR image L _ADR , the QI image L _QI , and the shadow impact image S _I and generates the LC image L _LC . The fifth generation unit 12ce receives the RGB image L _RGB , the PY image L _PY , and the shadow impact image S _I in parallel with the fourth generation unit 12 cd and generates a light gap image w _g . Finally, the output image generation unit 12cf receives the VI image L _VI , LC image L _LC , and light gap image w _g as input, generates an output image L _end , and outputs it.

More detailed contents of the compositing process executed by the compositing processing unit 12c will be described later in the explanation using FIG. 4 and subsequent figures.

Returning to the explanation of FIG. 3. The output unit 12d outputs the output image L _end generated by the composition processing unit 12c to an external device such as a display device.

<<3. Details of the compositing process executed by the compositing processing unit >>
Next, the details of the compositing process executed by the compositing processing unit 12c will be explained while giving examples of each image using FIGS. 4 to 19.

FIG. 4 is a diagram showing an example of the output image L _end . Further, FIG. 5 is a diagram showing an example of an RGB image L _RGB . Further, FIG. 6 is a diagram showing an example of CG data _DCg . Further, FIG. 7 is a diagram showing an example of the depth image _IDep . In the following, an example will be given in which the output image L _end shown in FIG. 4 is finally output from the composition processing unit 12c after undergoing composition processing.

FIG. 4 shows an example in which the mannequin in the rear is a mannequin that exists in real space, and the person in the foreground is CG. Note that, below, the figures are appropriately simplified in order to make the explanation easier to understand. Therefore, the examples shown below do not limit the synthesis quality of the synthesis processing according to the embodiments of the present disclosure.

Further, the M1 portion in the diagram of FIG. 4 will be described later. When performing the synthesis process of the output image L _end shown in FIG. 4, the RGB image L _RGB obtained by the obtaining unit 12a is as shown in FIG. 5. Note that the M2 portion in the diagram of FIG. 5 will be described later.

Further, the CG data D _Cg similarly obtained by the obtaining unit 12a is as shown in FIG. Similarly, the depth image _IDep acquired by the acquisition unit 12a is as shown in FIG. Note that the spherical image I _Sep acquired by the acquisition unit 12a is omitted here.

The synthesis processing unit 12c generates a PY image L _PY , a VI image L _VI , a mask image M, and an RPY image R _PY based on these RGB image L _RGB , CG data D _Cg , depth image I _Dep , and spherical image I _Sep , generates an RVI image _RVI .

FIG. 8 is a diagram showing an example of the PY image _LPY . Further, FIG. 9 is a diagram showing an example of the _VI image LVI. Moreover, FIG. 10 is a diagram showing an example of the mask image M. Further, FIG. 11 is a diagram showing an example of the RPY image _RPY . Further, FIG. 12 is a diagram showing an example of the RVI image _RVI . Further, FIG. 13 is a diagram showing an example of a shadow impact image _SI .

When performing the synthesis process of the output image L _end shown in FIG. 4, the PY image L _PY becomes as shown in FIG. 8. Similarly, the VI image _LVI becomes as shown in FIG. Similarly, the mask image M becomes as shown in FIG.

By the way, in the embodiment of the present disclosure, the technical problems of the ADR method and the QI method are overcome by a two-step synthesis process. First, in the first stage of synthesis processing, a new algorithm generates a shadow impact image _SI , which can be called a weighting function image that detects areas in which the ADR method and the QI method are respectively good.

Here, the generation algorithm of the shadow impact image _SI will be explained. In this algorithm, an approximate value R of irradiance, which is the optical radiation energy per unit area, is calculated by the following equation (10).

Here, the above equation (11) is the intensity of illumination incident on the object plane from the direction vector ω i to the three-dimensional position x in real space corresponding to the pixel in the image, and is the intensity of the illumination that enters the object plane from the direction vector ω _i , and Calculated from I _Sep.

In addition, the above equation ₍ 12) shows that when looking at the direction vector ω _i from the three-dimensional position Visibility) function. In addition, the approximation R has no color component and is gray scaled.

In this algorithm, the RPY image R _PY obtained as a result of calculating irradiance using only the distance-measured real-photo information, and the result of calculating irradiance using both the distance-measured real-photo information and the CG data D _Cg to be synthesized. The resulting RVI images _RVI are respectively generated.

When performing the synthesis process of the output image L _end shown in FIG. 4, the RPY image _RPY becomes as shown in FIG. 11. Similarly, the RVI image _RVI becomes as shown in FIG.

Then, the composition processing unit 12c generates a shadow impact image _S1 using the RVI image _RVI and the RPY image _RPY according to the following equation (13).

Here, dilate() is a general dilation process applied to the entire image in order to reduce noise that cannot be canceled out due to the light gap. Then, the synthesis processing unit 12c performs a linear combination of the ADR image L _ADR and the QI image L _QI using the shadow impact image S _I according to the following equation (14) to generate an LC image L _LC .

When performing the synthesis process of the output image L _end shown in FIG. 4, the shadow impact image S _I becomes as shown in FIG. 13.

Although the explanation is complicated, specific examples of the drawbacks of the ADR method that appear in the ADR image L _ADR and the drawbacks of the QI method that appear in the QI image L _QI will be shown here. FIG. 14 is a diagram showing an example of the ADR image L _ADR . Further, FIG. 15 is a diagram (part 1) showing an example of QI image L _QI . Further, FIG. 16 is a diagram (part 2) showing an example of QI image L _QI .

As with FIG. 4, FIG. 14 is an ADR image _LADR in which the person is a CG image. In such a case, in the ADR image L _ADR , color disharmony or the like may appear in the vicinity of the step of the pillar, as shown, for example, in the M41 part of the M4 part in the real space.

Further, in the ADR image _LADR , color blurring or the like may appear at the boundary of the object, as shown in the M42 section of the M4 section, for example.

Furthermore, similarly to FIG. 14, it is assumed that FIG. 15 is a QI image _LQI when the person is a CG image. In such a case, inappropriate noise may appear in the QI image _LQI in a portion where geometry noise is large (lower end of the image), as shown, for example, in the M51 portion of the M5 portion in real space.

Furthermore, in contrast to FIGS. 14 and 15, FIG. 16 is a QI image _LQI in which the person is a real object and a CG of a character holding a light-emitting sword is synthesized next to the person. In such a case, in the QI image _LQI , a situation may occur in which an area that should be brightened does not become bright due to reflection from the CG light-emitting sword, as shown in the M61 part of the M6 part in the real space, for example.

In the above-mentioned first stage synthesis processing, the synthesis processing unit 12c combines the ADR image L _ADR and the QI image using the shadow impact image S _I so that the defects of the ADR image L _ADR and the QI image L _QI are eliminated. A linear combination of _LQI is performed to generate an LC image _LLC . Therefore, the LC image _LLC is generated as an image in which such defects are eliminated.

Next, in the second stage of compositing processing, the compositing processing unit 12c uses the shadow impact image _SI to remove artifacts in the light gap region. In this algorithm, a light gap image w _g indicating a light gap region is calculated using the following equation (15).

Here, saturate() is a function that limits the input range to 0 to 1, and opening() is a morphological operation for denoising. Furthermore, λ _g and λ _p are hyperparameters, and in the embodiment of the present disclosure, they are empirically set to λ _g =2.0 and λ _p =2.0 in all synthesis executions.

Finally, the synthesis processing unit 12c preferentially allocates the VI image _LVI to the light gap area, that is, the area where _wg is high, by linear combination, and outputs the final synthesis result from which artifacts have been removed. Image L _end is obtained as shown in equation (16) below.

FIG. 17 is a diagram (part 1) showing an example of the light gap image _wg . Further, FIG. 18 is a diagram (part 2) showing an example of the light gap image _wg .

FIG. 17 corresponds to the case where the synthesis process of the output image L _end shown in FIG. 4 is performed. In such a case, the light gap image _wg will be as shown in FIG. 17.

The M7 section shown in FIG. 17 corresponds to the M2 section of the RGB image _LRGB shown in FIG. Furthermore, the M7 section corresponds to the M3 section of the PY image _LPY shown in FIG.

FIG. 18 shows these M2 portion, M3 portion, and M7 portion arranged side by side for comparison. As shown in FIG. 18, it can be seen that the rendering result of the PY image _LPY is significantly different from the RGB image L _RGB in the space between the right arm and torso of the mannequin in the image. The composition processing unit 12c detects this portion as a light gap image _wg , as shown in FIG.

Further, FIG. 19 is a diagram showing the effect of correction using the light gap image _wg . FIG. 19 shows the M1 portion of the output image L _end shown in FIG. 4, the M1 _ADR portion of the ADR image L _ADR corresponding to the M1 portion, and the M1 _QI portion of the QI image L _QI corresponding to the M1 portion. They are arranged side by side for comparison.

Similar to FIG. 18, as shown in FIG. 19, it can be seen that in the space between the mannequin's right arm and torso where the light gap is large, artifacts are large in both the ADR image L _ADR and the QI image L _QI . For example, it can be seen that in the ADR image L _ADR , the portion that should be a shadow is too bright. It can also be seen that the QI image _LQI has color bleeding and disharmony.

On the other hand, it can be seen that in the output image L _end , the artifacts observed in the ADR image _LADR and the QI image L _QI are eliminated due to the effect of the correction using the light gap image w _g .

The effects of the synthesis processing algorithm according to the embodiment of the present disclosure as described above will be summarized. First, the shadow impact image S _I has a low pixel value in an area where the light radiant energy has increased due to the synthesis process of the CG data D _Cg , and has a high pixel value in an area where the light radiant energy has decreased.

This corresponds to a case where the RGB image L _RGB becomes brighter and a case where it becomes darker due to the composition processing of the CG data _DCg . Therefore, by assigning the ADR image L _ADR to an area with a low shadow impact and the QI image L _QI to an area with a high shadow impact, it is possible to automatically combine the advantages of both methods.

Further, information only about illumination and visibility is used to generate the RVI image _RVI and the RPY image _RPY , which are irradiance images. In addition to this, general CG methods also consider the direction of normal lines in space, but normal lines are susceptible to noise when the depth image _IDep is of low quality. Therefore, by not using normal information to generate the RVI image _RVI and the RPY image _RPY , it is possible to generate a filter that is not affected by noise originating from the low-quality depth image _IDep .

Next, the effect of removing artifacts using the shadow impact image _SI will be described. In this algorithm, the light gap region is defined using the above equation (15), and calculation and detection are performed. In the argument of the opening function in equation (15), the range measurement error expressed by the difference between the input RGB image L _RGB and the PY image L _PY is raised to the power of λ _p to identify areas where the range measurement error is significantly large. can be detected.

Furthermore, by multiplying the shadow impact image S _I in Equation (15), it is possible to limit the artifact removal process to only areas where there is a possibility that a change in pixel value will occur due to the composition process in the RGB image L _RGB .

Further, the light gap image w _g is not a binary image, and the light gap changes smoothly from a low region to a high region, so the light gap image w _g does not have sharp edges. As a result, when the VI image L _VI and the LC image L _LC are linearly combined and blended in the above equation (16), the synthesis process can be performed so that the joint of the blended images is not noticeable.

Next, the processing procedure executed by the image synthesis device 10 will be explained using FIG. 20. FIG. 20 is a flowchart showing the processing procedure executed by the image synthesis device 10.

As shown in FIG. 20, the acquisition unit 12a first acquires an RGB image L _RGB , a depth image I _Dep , a spherical image I _Sep , and CG data D _Cg (step S101). Then, the conversion unit 12b converts the RGB image L _RGB , the depth image I _Dep , and the spherical image I _Sep so that they can be read into the DCC tool (step S102). Then, the composition processing unit 12c reads the converted data and the CG data _DCg using the DCC tool (step S103).

Subsequently, the synthesis processing unit 12c generates a PY image L _PY , a VI image L _VI , a mask image M, an RPY image R _PY , and an RVI image R _VI within the DCC tool (step S104).

Then, the synthesis processing unit 12c generates the ADR image L ADR and the QI image L _QI using the PY image L _PY , the VI image L _VI , _{the mask image M, and the RGB image L RGB} ₍ step S105).

Further, in parallel with this, the synthesis processing unit 12c generates a shadow impact image S _I using the RPY image R _PY and the RVI image R _VI (step S106).

When steps S105 and S106 are completed, the synthesis processing unit 12c generates an LC image _LLC using the ADR image L _ADR , the QI image L _QI , and the shadow impact image S _I (step S107).

Further, in parallel with this, the synthesis processing unit 12c generates a light gap image w _g using the shadow impact image S _I , the RGB image L _RGB , and the PY image L _PY (step S108).

When steps S107 and S108 are completed, the synthesis processing unit 12c generates an output image L _end using the LC image L _LC and the light gap image w _g (step S109). Then, the output unit 12d outputs the output image L _end (step S110), and the process ends.

Next, FIG. 21 is a flowchart showing the processing procedure when outputting a still image. First, a user photographs an RGB image L _RGB with an RGB camera (step S201). Further, the user photographs a depth image I _Dep with a depth camera (step S202). Further, the user photographs a spherical image I _Sep with a spherical camera (step S203). Further, the user creates CG data _DCg using the DCC tool (step S204).

Then, the image synthesis device 10 inputs each data acquired in steps S201 to S204 and executes the image synthesis process shown in FIG. 20 (step S205).

Then, for example, the display device displays the output image L _end output from the image synthesis device 10 as a still image (step S206), and the process ends.

Next, FIG. 22 is a flowchart showing the processing procedure when outputting a moving image. Note that steps S301 to S304 in FIG. 22 are the same as steps S201 to S204 shown in FIG. 21, so the description thereof will be omitted here.

Subsequently, the user updates the CG data _DCg according to the frame number (step S305). Then, the image synthesis device 10 inputs each data acquired in steps S301 to S305, and executes the image synthesis process shown in FIG. 20 (step S306).

Then, the image synthesis device 10 determines whether the output image L _end outputted by the image synthesis process has reached a predetermined number of frames (step S307). Here, if the predetermined number of frames has not been reached (step S307, No), the frame number is updated (step S308), and the processing from step S305 is repeated.

On the other hand, if the predetermined number of frames has been reached (step S307, Yes), for example, the display device combines the output images L _end of each frame in time series and displays it as a moving image (step S309). Then, the process ends.

<<4. Modified example >>
Among the processes described in the embodiments of the present disclosure described above, all or part of the processes described as being performed automatically can be performed manually, or the processes described as being performed manually can be performed manually. All or part of the processing can also be performed automatically using known methods. In addition, information including the processing procedures, specific names, and various data and parameters shown in the above documents and drawings may be changed arbitrarily, unless otherwise specified. For example, the various information shown in each figure is not limited to the illustrated information.

Furthermore, each component of each device shown in the drawings is functionally conceptual, and does not necessarily need to be physically configured as shown in the drawings. In other words, the specific form of distributing and integrating each device is not limited to what is shown in the diagram, and all or part of the devices can be functionally or physically distributed or integrated in arbitrary units depending on various loads and usage conditions. Can be integrated and configured.

Furthermore, the embodiments of the present disclosure described above can be combined as appropriate in areas where the processing contents do not conflict. Further, the order of each step shown in the sequence diagram or flowchart of this embodiment can be changed as appropriate.

<<5. Hardware configuration >>
Further, the image synthesis apparatus 10 according to the embodiment of the present disclosure described above is realized by, for example, a computer 1000 having a configuration as shown in FIG. 23. FIG. 23 is a hardware configuration diagram showing an example of a computer 1000 that implements the functions of the image synthesis apparatus 10. Computer 1000 has CPU 1100, RAM 1200, ROM 1300, HDD (Hard Disk Drive) 1400, communication interface 1500, and input/output interface 1600. Each part of computer 1000 is connected by bus 1050.

The CPU 1100 operates based on a program stored in the ROM 1300 or the HDD 1400 and controls each part. For example, the CPU 1100 loads programs stored in the ROM 1300 or HDD 1400 into the RAM 1200, and executes processes corresponding to various programs.

The ROM 1300 stores boot programs such as BIOS (Basic Input Output System) that are executed by the CPU 1100 when the computer 1000 is started, programs that depend on the hardware of the computer 1000, and the like.

The HDD 1400 is a computer-readable recording medium that non-temporarily records programs executed by the CPU 1100 and data used by the programs. Specifically, HDD 1400 is a recording medium that records an information processing program according to an embodiment of the present disclosure, which is an example of program data 1450.

Communication interface 1500 is an interface for connecting computer 1000 to external network 1550 (for example, network N). For example, CPU 1100 receives data from other devices or transmits data generated by CPU 1100 to other devices via communication interface 1500.

The input/output interface 1600 is an interface for connecting the input/output device 1650 and the computer 1000. For example, the CPU 1100 receives data from an input device such as a keyboard or a mouse via the input/output interface 1600. Further, the CPU 1100 transmits data to an output device such as a display, speaker, or printer via an input/output interface 1600. Furthermore, the input/output interface 1600 may function as a media interface that reads programs and the like recorded on a predetermined recording medium. Media includes, for example, optical recording media such as DVD (Digital Versatile Disc) and PD (Phase change rewritable disk), magneto-optical recording media such as MO (Magneto-Optical disk), tape media, magnetic recording media, semiconductor memory, etc. It is.

For example, when the computer 1000 functions as the image composition device 10 according to the embodiment of the present disclosure, the CPU 1100 of the computer 1000 realizes the functions of the control unit 12 by executing a program loaded onto the RAM 1200. Further, the HDD 1400 stores an information processing program according to the present disclosure and data in the storage unit 11. Note that although the CPU 1100 reads and executes the program data 1450 from the HDD 1400, as another example, these programs may be obtained from another device via the external network 1550.

<<6. Conclusion >>
As described above, according to an embodiment of the present disclosure, the image synthesis device 10 (corresponding to an example of an "information processing device") generates an RGB image L _RGB , a depth image I _Dep , and a spherical image I _Sep. A composition processing unit 12c is provided which performs composition processing of arranging CG data _DCg in a three-dimensional space representing a real space corresponding to the RGB image L _RGB as an input.

The synthesis processing unit 12c generates a PY image L _PY (corresponding to an example of a "first rendered image") using only the real shot information measured from the above input, and a VI using the above real shot information and CG data D _Cg . An image _LVI (corresponding to an example of a "second rendered image") is generated.

In addition, the synthesis processing unit 12c generates an ADR image L _ADR (“first composite image”) based on the ADR method (corresponding to an example of the “first method”) based on the difference in shading between the PY image L _PY and the VI image L _VI . ) and a QI image L _QI (corresponding to an example of a "second composite image") by the QI method (corresponding to an example of a "second method") based on the shading ratio.

In addition, the composition processing unit 12c generates a shadow impact image S _I representing a change in optical radiation energy between the case where only the above-mentioned live-action information is used and the case where the above-mentioned live-action information and CG data _DCg are used, A linear combination of the ADR image L _ADR and the QI image L _QI is performed using the shadow impact image S _I.

This makes it possible to further improve the quality of compositing the CG data _DCg with respect to the three-dimensional space representing the real space.

Although each embodiment of the present disclosure has been described above, the technical scope of the present disclosure is not limited to each of the above-mentioned embodiments as is, and various changes can be made without departing from the gist of the present disclosure. be. Furthermore, components of different embodiments and modifications may be combined as appropriate.

Further, the effects in each embodiment described in this specification are merely examples and are not limited, and other effects may also be provided.

Note that the present technology can also have the following configuration.
(1)
a composition processing unit that receives an RGB image, a depth image, and a spherical image as input and performs a composition process of arranging CG data in a three-dimensional space representing a real space corresponding to the RGB image;
The synthesis processing section is
Generating a first rendered image using only the live-action information measured from the input, and a second rendering image using the live-action information and the CG data,
Generating a first composite image using a first method based on a difference in shading between the first rendered image and the second rendered image, and a second composite image using a second method based on the ratio of the shadings. death,
generating a shadow impact image representing a change in optical radiation energy between a case where only the live-action information is used and a case where the live-action information and the CG data are used;
performing a linear combination of the first composite image and the second composite image using the shadow impact image;
Information processing device.
(2)
The synthesis processing section is
The shadow impact image is generated so that areas where light radiant energy has increased through the synthesis process have low pixel values, and areas where light radiant energy has decreased have high pixel values, and the areas with low pixel values have low pixel values. assigning the first composite image and the second composite image to the high pixel value area, respectively;
The information processing device according to (1) above.
(3)
The synthesis processing section is
The shadow impact image is created such that the low pixel value area corresponds to the area of the RGB image that becomes brighter due to the compositing process, and the high pixel value area corresponds to an area of the RGB image that becomes dark due to the compositing process. generate,
The information processing device according to (2) above.
(4)
The synthesis processing section at least includes:
generating the shadow impact image based on information regarding illumination calculated from the distance-measured spherical image and information regarding visibility of the CG data;
The information processing device according to (1), (2) or (3) above.
(5)
The synthesis processing section is
generating the shadow impact image based only on the illumination information and the visibility information;
The information processing device according to (4) above.
(6)
The synthesis processing section is
using the shadow impact image to remove artifacts occurring in the compositing process;
The information processing device according to any one of (1) to (5) above.
(7)
The synthesis processing section is
By generating a light gap image indicating a light gap region in which the first rendered image is much darker than the RGB image, and preferentially allocating the second rendered image to the light gap region by linear combination. remove artifacts,
The information processing device according to (6) above.
(8)
The synthesis processing section is
An area where the distance measurement error expressed by the difference between the first rendered image and the RGB image is extremely large is detected as the light gap area, and the area where the artifacts are removed is determined by multiplying the area by the shadow impact image. limit,
The information processing device according to (7) above.
(9)
performing a composition process of inputting an RGB image, a depth image, and a spherical image and arranging CG data in a three-dimensional space representing a real space corresponding to the RGB image;
including;
Performing the above-mentioned compositing process includes:
Generating a first rendered image using only the live-action information measured from the input, and a second rendered image using the live-action information and the CG data;
Generating a first composite image using a first method based on a difference in shading between the first rendered image and the second rendered image, and a second composite image using a second method based on the ratio of the shadings. to do and
generating a shadow impact image representing a change in optical radiation energy between a case where only the live-action information is used and a case where the live-action information and the CG data are used;
performing a linear combination of the first composite image and the second composite image using the shadow impact image;
Information processing methods, further including:
(10)
performing a composition process of inputting an RGB image, a depth image, and a spherical image and arranging CG data in a three-dimensional space representing a real space corresponding to the RGB image;
make the computer run
Performing the above-mentioned compositing process includes:
generating a first rendered image using only live-action information measured from the input, and a second rendered image using the live-action information and the CG data;
Generating a first composite image using a first method based on a difference in shading between the first rendered image and the second rendered image, and a second composite image using a second method based on the ratio of the shadings. to do,
generating a shadow impact image representing a change in optical radiation energy between a case where only the live-action information is used and a case where the live-action information and the CG data are used;
performing a linear combination of the first composite image and the second composite image using the shadow impact image;
A computer-readable recording medium having recorded thereon an information processing program that causes the computer to further execute the following.

10 Image synthesis device 11 Storage unit

11a Geometry information

11b Reflectance information 11c Illumination map information 11d DCC tool program 12 Control unit

12a Acquisition unit

12b Conversion unit 12c Synthesis processing unit 12ca First generation unit 12cb Second generation unit 12cc Third generation unit 12cd Fourth generation section 12ce Fifth generation section 12cf Output image generation section 12d Output section

Claims

a composition processing unit that receives an RGB image, a depth image, and a spherical image as input and performs a composition process of arranging CG data in a three-dimensional space representing a real space corresponding to the RGB image;
The synthesis processing section is
Generating a first rendered image using only the live-action information measured from the input, and a second rendering image using the live-action information and the CG data,
Generating a first composite image using a first method based on a difference in shading between the first rendered image and the second rendered image, and a second composite image using a second method based on the ratio of the shadings. death,
generating a shadow impact image representing a change in optical radiation energy between a case where only the live-action information is used and a case where the live-action information and the CG data are used;
performing a linear combination of the first composite image and the second composite image using the shadow impact image;
Information processing device.
The synthesis processing section is
The shadow impact image is generated so that areas where light radiant energy has increased through the synthesis process have low pixel values, and areas where light radiant energy has decreased have high pixel values, and the areas with low pixel values have low pixel values. assigning the first composite image and the second composite image to the high pixel value area, respectively;
The information processing device according to claim 1.
The synthesis processing section is
The shadow impact image is created such that the low pixel value area corresponds to the area of the RGB image that becomes brighter due to the compositing process, and the high pixel value area corresponds to an area of the RGB image that becomes dark due to the compositing process. generate,
The information processing device according to claim 2.
The synthesis processing section at least includes:
generating the shadow impact image based on information regarding illumination calculated from the distance-measured spherical image and information regarding visibility of the CG data;
The information processing device according to claim 1.
The synthesis processing section is
generating the shadow impact image based only on the illumination information and the visibility information;
The information processing device according to claim 4.
The synthesis processing section is
using the shadow impact image to remove artifacts occurring in the compositing process;
The information processing device according to claim 1.
The synthesis processing section is
By generating a light gap image indicating a light gap region in which the first rendered image is much darker than the RGB image, and preferentially allocating the second rendered image to the light gap region by linear combination. remove artifacts,
The information processing device according to claim 6.
The synthesis processing section is
An area where the distance measurement error expressed by the difference between the first rendered image and the RGB image is extremely large is detected as the light gap area, and the area where the artifacts are removed is determined by multiplying the area by the shadow impact image. limit,
The information processing device according to claim 7.
performing a composition process of inputting an RGB image, a depth image, and a spherical image and arranging CG data in a three-dimensional space representing a real space corresponding to the RGB image;
including;
Performing the above-mentioned compositing process includes:
Generating a first rendered image using only the live-action information measured from the input, and a second rendered image using the live-action information and the CG data;
Generating a first composite image using a first method based on a difference in shading between the first rendered image and the second rendered image, and a second composite image using a second method based on the ratio of the shadings. to do and
generating a shadow impact image representing a change in optical radiation energy between a case where only the live-action information is used and a case where the live-action information and the CG data are used;
performing a linear combination of the first composite image and the second composite image using the shadow impact image;
information processing methods, including
performing a composition process of inputting an RGB image, a depth image, and a spherical image and arranging CG data in a three-dimensional space representing a real space corresponding to the RGB image;
make the computer run
Performing the above-mentioned compositing process includes:
generating a first rendered image using only live-action information measured from the input, and a second rendered image using the live-action information and the CG data;
Generating a first composite image using a first method based on a difference in shading between the first rendered image and the second rendered image, and a second composite image using a second method based on the ratio of the shadings. to do,
generating a shadow impact image representing a change in optical radiation energy between a case where only the live-action information is used and a case where the live-action information and the CG data are used;
performing a linear combination of the first composite image and the second composite image using the shadow impact image;
A computer-readable recording medium having recorded thereon an information processing program that causes the computer to further execute the following.