WO2022124026A1

WO2022124026A1 - Trained model generation method and information processing device

Info

Publication number: WO2022124026A1
Application number: PCT/JP2021/042193
Authority: WO
Inventors: 尚小曳
Original assignee: Sony Group Corporation (ソニーグループ株式会社)
Priority date: 2020-12-08
Filing date: 2021-11-17
Publication date: 2022-06-16

Abstract

The trained model generation method according to the present disclosure is a method for generating a trained model having a virtual viewpoint image generation unit (2) and a rendering parameter estimation unit (3), wherein the method includes the virtual viewpoint image generation unit (2) generating a virtual viewpoint image having a higher resolution than an input image, and the rendering parameter estimation unit (3) estimating a rendering parameter using the virtual viewpoint image.

Description

Trained model generation method and information processing device

This disclosure relates to a method of generating a trained model and an information processing device.

In CG (Computer Graphics) rendering, it takes a lot of labor to create rendering parameters such as surface shape and camera parameters. Therefore, there is an increasing need for a machine-learned trained model that automatically estimates rendering parameters that can reproduce the surface shape of a high-definition object from a two-dimensional image of an actual object.

For example, Patent Document 1 discloses a three-dimensional shape restoration method in which the resolution of the shape restoration of an object is increased step by step by using input images of successively different resolutions, so that the shape of the object is restored in a short time overall.

In this three-dimensional shape restoration method, global shape restoration is first performed from two-dimensional shadow sampling image data based on three-dimensional image data. After that, with the global shape restored earlier as the initial value, more precise shape restoration is performed based on two-dimensional shadow sampling image data having a resolution higher than the previous resolution.

Subsequently, with the previous restored shape as the initial value, more precise shape restoration is repeated based on the two-dimensional shadow sampling image data having a resolution higher than the previous resolution until the shape restoration with the target accuracy is performed.

Patent Document 1: Japanese Unexamined Patent Publication No. 5-126546

However, although the above-mentioned three-dimensional shape restoration method uses inputs having different resolutions, these inputs are obtained by sparsely sampling the same input, so the method cannot estimate a surface shape with higher definition than the input image.

Therefore, the present disclosure proposes a trained model generation method and an information processing device that can estimate rendering parameters that can reproduce the surface shape of an object with higher definition than the input image.

According to the present disclosure, a method for generating a trained model is provided. The trained model generation method is a method of generating a trained model having a virtual viewpoint image generation unit and a rendering parameter estimation unit, in which the virtual viewpoint image generation unit generates a virtual viewpoint image having a higher resolution than the input image. The rendering parameter estimation unit estimates the rendering parameter using the virtual viewpoint image.

Further, according to the present disclosure, an information processing device is provided. The information processing device has a virtual viewpoint image generation unit and a rendering parameter estimation unit. The virtual viewpoint image generation unit generates a virtual viewpoint image having a higher resolution than the input image. The rendering parameter estimation unit estimates rendering parameters using the virtual viewpoint image.

FIG. 1 is a diagram showing a configuration example of the information processing device according to the first embodiment of the present disclosure.
FIG. 2 is a diagram showing a configuration example of the virtual viewpoint image generation unit according to the first embodiment of the present disclosure.
FIG. 3 is a flowchart showing an example of the learning process executed by the virtual viewpoint image generation unit according to the first embodiment of the present disclosure.
FIG. 4 is a flowchart showing an example of the rendering parameter estimation process executed by the information processing device according to the first embodiment of the present disclosure.
FIG. 5 is a diagram showing a configuration example of the information processing device according to the second embodiment of the present disclosure.
FIG. 6 is a flowchart showing an example of the learning process executed by the information processing device according to the second embodiment of the present disclosure.
FIG. 7 is a diagram showing a configuration example of the information processing device according to the third embodiment of the present disclosure.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the drawings. In each of the following embodiments, the same parts are designated by the same reference numerals, and overlapping description is omitted.

[1. First Embodiment]
[1.1. Information processing device]
FIG. 1 is a diagram showing a configuration example of an information processing device according to the first embodiment of the present disclosure. The information processing device 1 includes a microcomputer having a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and various circuits. The information processing device 1 includes a virtual viewpoint image generation unit 2 and a rendering parameter estimation unit 3, which function when the CPU executes a program stored in the ROM using the RAM as a work area.

The virtual viewpoint image generation unit 2 and the rendering parameter estimation unit 3 included in the information processing device 1 may be partially or wholly composed of hardware such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

The virtual viewpoint image generation unit 2 and the rendering parameter estimation unit 3 included in the information processing device 1 realize or execute the actions of information processing described below, respectively. The internal configuration of the information processing apparatus 1 is not limited to the configuration shown in FIG. 1, and may be any other configuration as long as it is configured to perform information processing described later.

The virtual viewpoint image generation unit 2 generates a virtual viewpoint image using the input image and outputs it to the rendering parameter estimation unit 3. The virtual viewpoint image is an image assuming that a subject (hereinafter, may be referred to as an object) reflected in a two-dimensional input image is captured from a virtual viewpoint different from the viewpoint of the input image.

Here, a case where the virtual viewpoint image generation unit 2 generates a predicted zoom image from the input image as the virtual viewpoint image will be described as an example. The predicted zoom image is an image as if the subject in the input image had been zoomed in on by a camera, and has a higher resolution than the input image. The virtual viewpoint image generation unit 2 can also generate virtual viewpoint images other than the predicted zoom image. Virtual viewpoint images other than the predicted zoom image will be described later.

Here, a method of generating a trained model with which the virtual viewpoint image generation unit 2 generates a predicted zoom image will be described with reference to FIG. 2. FIG. 2 is a diagram showing a configuration example of a virtual viewpoint image generation unit according to the first embodiment of the present disclosure.

Generally, the conversion from a low-resolution image to a high-resolution image is performed using a technique such as super-resolution. For example, one known method sets up a regression problem over a two-dimensional waveform that generates a predicted zoom image from an input image, and minimizes the squared error of the difference in gradation values between the predicted zoom image and the correct zoom image.

However, in the method that minimizes the squared error (mean squared error), learning is performed using a large number of pairs of low-resolution images and correct high-resolution images, so only a blurred predicted zoom image, which is an average solution, can be generated.

Therefore, when the predicted zoom image generated by the method that minimizes the squared error is used, the rendering parameter estimation unit 3 in the subsequent stage cannot estimate rendering parameters that can reproduce the surface shape of an object with sufficiently higher definition than the input image.
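The averaging effect described above can be illustrated with a small sketch. It is not part of the disclosure: the two target rows, their gradation values, and the helper name `mse` are hypothetical and chosen only to show that, when several sharp images are equally plausible for the same input, the prediction with the lowest mean squared error is their flat pixel-wise mean.

```python
def mse(pred, targets):
    """Mean squared error of one prediction against several equally likely targets."""
    per_target = [
        sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
        for target in targets
    ]
    return sum(per_target) / len(per_target)


# Two sharp, equally plausible high-resolution rows for the same low-resolution
# input (hypothetical gradation values in the range 0 to 1).
sharp_a = [0.0, 1.0, 0.0, 1.0]
sharp_b = [1.0, 0.0, 1.0, 0.0]
targets = [sharp_a, sharp_b]

# The pixel-wise mean of the targets has no edges left: it is the blurred
# "average solution".
blurred = [(a + b) / 2 for a, b in zip(sharp_a, sharp_b)]

loss_blurred = mse(blurred, targets)  # loss of the blurred prediction
loss_sharp = mse(sharp_a, targets)    # loss of committing to one sharp answer

print(loss_blurred, loss_sharp)
```

In this toy setting the blurred row scores a lower error than either sharp row, which is why a purely squared-error objective favors it.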

Therefore, as shown in FIG. 2, the virtual viewpoint image generation unit 2 according to the present embodiment includes a generation parameter 20, a generation unit 21, a binary classifier 22, and a similarity calculation unit 23. The generation unit 21 is a trained model machine-learned to generate, given the distribution of the input image, a predicted zoom image having a distribution similar to that of the correct zoom image.

The machine learning is realized by updating the generation parameter 20 of the generation unit 21 using the output of the binary classifier 22, which identifies whether an image input to it is sufficiently close to the correct zoom image.

Specifically, when the input image is input, the generation unit 21 generates a predicted zoom image, which is a virtual viewpoint image of the input image, and outputs it to the binary classifier 22. When the predicted zoom image is input from the generation unit 21 and the correct zoom image, which is the correct virtual viewpoint image, is input from the outside, the binary classifier 22 calculates the probability density distributions of the classification results for the predicted zoom image and the correct zoom image as probability density distributions pg and pr, respectively, and outputs them to the similarity calculation unit 23.

The similarity calculation unit 23 calculates the Kullback-Leibler Divergence and the Jensen-Shannon Divergence of the probability density distributions pg and pr as the similarity, and outputs them to the generation unit 21. The generation unit 21 updates the generation parameter 20 so as to minimize the similarity of the probability density distributions pg and pr input from the similarity calculation unit 23. The Kullback-Leibler Divergence and the Jensen-Shannon Divergence are measures for which a smaller value indicates that the characteristics of pg and pr are closer.
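As a minimal sketch of the two measures named above, the following computes the Kullback-Leibler and Jensen-Shannon divergences for discrete distributions. Representing pg and pr as normalized histograms, the function names, and the example values are all assumptions made for illustration; the disclosure does not specify how the distributions are represented.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) of two discrete distributions."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue          # a zero-probability bin contributes nothing
        if qi == 0.0:
            return math.inf   # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical histograms of classifier outputs for predicted (pg) and
# correct (pr) zoom images.
p_g = [0.5, 0.3, 0.2]
p_r = [0.2, 0.3, 0.5]

print(kl_divergence(p_g, p_r), js_divergence(p_g, p_r))
```

Both values are zero when the two distributions coincide and grow as they move apart, which is the property the update of the generation parameter 20 relies on.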

At this time, it is desirable to update the generation parameter 20 by using a general algorithm for obtaining the inverse function of the objective function (generation unit 21) with the similarity as the loss function. In the present embodiment, the generation unit 21 is realized by a convolutional neural network, and the generation unit 21 updates the generation parameter 20 by an error back propagation method using the similarity as an input.

In this way, the virtual viewpoint image generation unit 2 generates, from the input image, a predicted zoom image that is close to the correct answer in terms of the probability density distribution. Compared with, for example, a blurred predicted zoom image obtained by a method that minimizes the squared error, it can therefore generate a predicted zoom image that has a sufficiently higher resolution than the input image and is closer to the correct answer. As a result, the rendering parameter estimation unit 3 in the subsequent stage can estimate rendering parameters that can reproduce the surface shape of the object with higher definition than the input image.

Returning to FIG. 1, the rendering parameter estimation unit 3 estimates and outputs the rendering parameter of the subject using the predicted zoom image. Here, the rendering parameter may target all rendering parameters used for general CG editing. In this embodiment, among the rendering parameters, the surface shape of the object to be rendered, which is represented by a three-dimensional point cloud or a mesh, is targeted.

As an example, the rendering parameter estimation unit 3 according to the present embodiment estimates a three-dimensional shape using the gradation values or the chromaticity values of the predicted zoom image. For the estimation of the three-dimensional shape, it is desirable to apply a commonly used three-dimensional shape estimation algorithm such as shape from shading. When shape from shading is used, for example, a normal direction map over a minute grid range corresponding to the pixel size of the input image is calculated using the differences between the gradation values of adjacent pixels, and is output as the surface shape.
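The use of adjacent pixel differences can be sketched as follows. This is a deliberately simplified stand-in, not a full shape-from-shading solver and not the disclosed implementation: it treats the gradation value as a relative height and derives a unit normal per pixel from forward differences. The function name `normal_map` and the test images are hypothetical.

```python
import math


def normal_map(image):
    """Unit normal per pixel from forward differences of gradation values.

    Assumption for illustration: the gradation value is treated as a relative
    height, so a slope (gx, gy) gives the unnormalized normal (-gx, -gy, 1).
    """
    height, width = len(image), len(image[0])
    normals = []
    for y in range(height):
        row = []
        for x in range(width):
            # Forward difference; the border pixel reuses itself (difference 0).
            x_next = min(x + 1, width - 1)
            y_next = min(y + 1, height - 1)
            gx = image[y][x_next] - image[y][x]
            gy = image[y_next][x] - image[y][x]
            norm = math.sqrt(gx * gx + gy * gy + 1.0)
            row.append((-gx / norm, -gy / norm, 1.0 / norm))
        normals.append(row)
    return normals


flat = [[0.5, 0.5, 0.5] for _ in range(3)]   # constant gradation
ramp = [[0.0, 1.0, 2.0] for _ in range(3)]   # gradation rising along x

flat_normals = normal_map(flat)
ramp_normals = normal_map(ramp)
```

A constant image yields normals pointing straight at the viewer, while a ramp tilts the normals against the direction of increasing gradation.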

As described above, in the present embodiment, the virtual viewpoint image generation unit 2 in the previous stage generates a predicted zoom image having a resolution sufficiently higher than that of the input image and is closer to the correct answer, and outputs the predicted zoom image to the rendering parameter estimation unit 3. Then, the rendering parameter estimation unit 3 estimates the rendering parameter using the predicted zoom image input from the virtual viewpoint image generation unit 2. As a result, the rendering parameter estimation unit 3 can estimate rendering parameters that can reproduce the surface shape of the object with higher definition than the input image.

[1.2. Learning process executed by the virtual viewpoint image generation unit]
Next, the learning process executed by the virtual viewpoint image generation unit 2 will be described with reference to FIG. 3. FIG. 3 is a flowchart showing an example of the learning process executed by the virtual viewpoint image generation unit according to the first embodiment of the present disclosure.

As shown in FIG. 3, an input image is first input to the virtual viewpoint image generation unit 2 (step S101). The generation unit 21 generates a predicted zoom image which is a virtual viewpoint image by using the generation parameter 20 and the input image (step S102).

After that, the predicted zoom image and the correct zoom image, which is the correct virtual viewpoint image, are input to the binary classifier 22 (step S103). The binary classifier 22 calculates the probability density distributions of the binary classification results for the predicted zoom image, which is the virtual viewpoint image, and the correct zoom image, which is the correct virtual viewpoint image, as probability density distributions pg and pr (step S104).

The similarity calculation unit 23 calculates the Kullback-Leibler Divergence and the Jensen-Shannon Divergence between the probability density distribution pg and the probability density distribution pr as the similarity (step S105).

The generation unit 21 updates the generation parameter 20 of the generation unit 21 so as to minimize the similarity (step S106). After that, the virtual viewpoint image generation unit 2 determines whether or not the amount of change in the degree of similarity is equal to or less than the threshold value (step S107).

Then, when the virtual viewpoint image generation unit 2 determines that the amount of change in the similarity is not equal to or less than the threshold value (step S107, No), the virtual viewpoint image generation unit 2 changes to another input image (step S108), and shifts the process to step S101. Further, when the virtual viewpoint image generation unit 2 determines that the amount of change in the similarity is equal to or less than the threshold value (step S107, Yes), the process ends.
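The control flow of steps S102 to S107 can be sketched with a toy model. Everything here is an assumption for illustration: the generation parameter is a single scalar `theta`, the generated distribution is a two-bin distribution derived from it, the Jensen-Shannon Divergence alone serves as the similarity, and a finite-difference gradient stands in for error back propagation through a convolutional neural network. The change of input image in step S108 is omitted.

```python
import math


def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def js(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def generated_distribution(theta):
    """Toy stand-in for pg: a two-bin distribution controlled by theta."""
    s = 1.0 / (1.0 + math.exp(-theta))
    return [s, 1.0 - s]


def train(p_real, theta=0.0, lr=5.0, threshold=1e-10, max_steps=10_000):
    """Update theta until the change in similarity is at or below threshold."""
    h = 1e-6  # step for the finite-difference gradient (assumption)
    previous = js(generated_distribution(theta), p_real)           # S102 to S105
    for step in range(1, max_steps + 1):
        grad = (js(generated_distribution(theta + h), p_real)
                - js(generated_distribution(theta - h), p_real)) / (2 * h)
        theta -= lr * grad                                         # S106
        current = js(generated_distribution(theta), p_real)        # S102 to S105
        if abs(previous - current) <= threshold:                   # S107, Yes
            return theta, current, step
        previous = current                                         # S107, No
    return theta, previous, max_steps


theta, similarity, steps = train([0.7, 0.3])
print(generated_distribution(theta), similarity, steps)
```

The stopping rule mirrors step S107: training ends when the similarity stops changing, not when it reaches a fixed value.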

[1.3. Rendering parameter estimation process executed by the information processing device]
Next, with reference to FIG. 4, the rendering parameter estimation process executed by the information processing apparatus 1 will be described. FIG. 4 is a flowchart showing an example of rendering parameter estimation processing executed by the information processing apparatus according to the first embodiment of the present disclosure.

As shown in FIG. 4, an input image is first input to the information processing apparatus 1 (step S201). Subsequently, the trained virtual viewpoint image generation unit 2 generates a predicted zoom image which is a virtual viewpoint image by using the input image and the trained generation parameter 20 (step S202). After that, the rendering parameter estimation unit 3 estimates and outputs the rendering parameter using the predicted zoom image (step S203), and ends the process.

The virtual viewpoint image generation unit 2 and the rendering parameter estimation unit 3 according to the present embodiment may be implemented by any learning algorithm, but it is particularly preferable to realize them by a convolutional neural network.

[2. Second embodiment]
[2.1. Information processing device]
FIG. 5 is a diagram showing a configuration example of the information processing apparatus according to the second embodiment of the present disclosure. As shown in FIG. 5, the information processing apparatus 1a includes a virtual viewpoint image generation unit 2, a rendering parameter estimation unit 3a, a rendering unit 4, a rendering error calculation unit 5, a rendering parameter error estimation unit 6, and a rendering parameter estimation unit update unit 7.

The virtual viewpoint image generation unit 2 generates a predicted zoom image which is a virtual viewpoint image using the input image, and outputs it to the rendering parameter estimation unit 3a and the rendering error calculation unit 5. The rendering parameter estimation unit 3a estimates the rendering parameter using the predicted zoom image which is a virtual viewpoint image and the estimation parameter 30 used for estimating the rendering parameter, and outputs the rendering parameter to the rendering unit 4.

The rendering unit 4 renders using the estimated rendering parameters, and outputs the rendering result to the rendering error calculation unit 5. The rendering error calculation unit 5 calculates the rendering error using the predicted zoom image which is a virtual viewpoint image and the rendering result.

The rendering parameter error estimation unit 6 calculates the rendering parameter error using the rendering error. The rendering parameter estimation unit update unit 7 calculates the rendering parameter estimation unit update amount using the rendering parameter error, and updates the rendering parameter estimation unit 3a and the estimation parameter 30.

Hereinafter, each part included in the information processing apparatus 1a will be described in detail. First, the virtual viewpoint image generation unit 2 has the same configuration as the virtual viewpoint image generation unit 2 according to the first embodiment, and performs the same operation. Therefore, duplicate description is omitted here.

The rendering parameter estimation unit 3a estimates the rendering parameter using the predicted zoom image which is a virtual viewpoint image and the estimation parameter 30. Here, the rendering parameter may target all rendering parameters used for general CG editing.

The rendering unit 4 outputs the rendering result using the estimated rendering parameters. At this time, the rendering unit 4 outputs a rendering result with a two-dimensional gradation value having the same number of pixels as the predicted zoom image by using the estimated rendering parameter.

By doing so, there is an effect that the predicted zoom image and the rendering result can be compared in the processing unit in the subsequent stage, and the error can be directly fed back to the rendering parameter estimation unit 3a.

The rendering error calculation unit 5 calculates the rendering error using the predicted zoom image and the rendering result. The rendering error is calculated as the difference information of the two-dimensional gradation values in each pixel between the predicted zoom image and the rendering result.
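A minimal sketch of this per-pixel difference follows. The function names, the nested-list image representation, and the sample values are hypothetical; reducing the difference map to a mean squared value is one common choice of scalar loss and is an assumption, since the disclosure only specifies the difference information itself.

```python
def rendering_error(predicted, rendered):
    """Per-pixel gradation difference (rendered minus predicted)."""
    same_size = len(predicted) == len(rendered) and all(
        len(p_row) == len(r_row) for p_row, r_row in zip(predicted, rendered)
    )
    if not same_size:
        # The rendering result is produced with the same number of pixels as
        # the predicted zoom image, so a mismatch indicates a setup error.
        raise ValueError("images must have the same number of pixels")
    return [
        [r - p for p, r in zip(p_row, r_row)]
        for p_row, r_row in zip(predicted, rendered)
    ]


def mean_squared(error_map):
    """Collapse the difference map into one scalar (assumed loss form)."""
    values = [e for row in error_map for e in row]
    return sum(e * e for e in values) / len(values)


predicted_zoom = [[0.25, 0.5], [0.75, 1.0]]
rendering_result = [[0.25, 0.75], [0.75, 0.5]]

error_map = rendering_error(predicted_zoom, rendering_result)
scalar_loss = mean_squared(error_map)
```

Keeping the full difference map, instead of only the scalar, preserves where in the image the estimated rendering parameters fall short.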

At this time, a difference in the gradation value at each pixel means that the accuracy with which the rendering parameter estimation unit 3a and the estimation parameter 30 estimate the rendering parameter is not sufficient for reconstructing the predicted zoom image, and is information indicating that there is room for update.

Therefore, by performing the processing of the rendering parameter error estimation unit 6 and the rendering parameter estimation unit update unit 7 in the subsequent stage using the calculated rendering error, it becomes possible to estimate rendering parameters that can reproduce the high-definition surface shape of the object necessary for reconstructing the predicted zoom image.

The rendering parameter error estimation unit 6 calculates the rendering parameter error using the rendering error. Then, the rendering parameter estimation unit update unit 7 calculates the rendering parameter estimation unit update amount using the rendering parameter error, and updates the rendering parameter estimation unit 3a and the estimation parameter 30.

In the processing of the rendering parameter error estimation unit 6 and the rendering parameter estimation unit update unit 7, the difference information of the two-dimensional gradation values is reflected in the function for estimating a plurality of rendering parameters in three dimensions.

This process is realized by deriving the inverse function of the entire function, including the rendering parameter estimation unit 3a and the rendering unit 4, with the difference information of the two-dimensional gradation values as the loss function, so a general automatic differentiation algorithm can be applied.

In the present embodiment, the entire function is represented by a differentiable convolutional neural network, and the rendering parameter estimation unit 3a and the estimation parameter 30 are updated by an error back propagation method that takes the difference information of the two-dimensional gradation values as input.

According to the rendering parameter error estimation unit 6 and the rendering parameter estimation unit update unit 7 of the present embodiment, using the difference information of the two-dimensional gradation values at each pixel between the predicted zoom image and the rendering result, calculated in the previous stage as the rendering error, has the effect that the rendering parameter estimation unit 3a can be updated so as to estimate rendering parameters that can reproduce the high-definition surface shape of the object required for reconstructing the predicted zoom image.

[2.2. Learning process executed by the information processing device]
Next, with reference to FIG. 6, the process executed by the information processing apparatus 1a will be described. FIG. 6 is a flowchart showing an example of the learning process executed by the information processing apparatus according to the second embodiment of the present disclosure. As shown in FIG. 6, in the information processing apparatus 1a, first, the trained virtual viewpoint image generation unit 2 generates a predicted zoom image, which is a virtual viewpoint image, by using the input image and the trained generation parameter 20 (step S301).

Subsequently, the rendering parameter estimation unit 3a estimates the rendering parameter using the predicted zoom image and the estimation parameter 30, and the rendering unit 4 renders using the rendering parameter and outputs the rendering result (step S302).

After that, the rendering error calculation unit 5 calculates the rendering error using the predicted zoom image which is a virtual viewpoint image and the rendering result (step S303). Subsequently, the rendering parameter error estimation unit 6 calculates the rendering parameter error using the rendering error (step S304).

After that, the rendering parameter estimation unit update unit 7 calculates the rendering parameter estimation unit update amount using the rendering parameter error, and updates the rendering parameter estimation unit 3a and the estimation parameter 30 (step S305).

Then, the rendering parameter estimation unit 3a determines whether or not the rendering parameter estimation unit update amount is equal to or less than the threshold value (step S306). When the rendering parameter estimation unit 3a determines that the rendering parameter estimation unit update amount is not equal to or less than the threshold value (step S306, No), the rendering parameter estimation unit 3a changes to another input image (step S307), and shifts the process to step S301. Further, when the rendering parameter estimation unit 3a determines that the rendering parameter estimation unit update amount is equal to or less than the threshold value (step S306, Yes), the process ends.
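The loop of steps S302 to S307 can be sketched with a toy estimator and a toy differentiable renderer. All of it is assumed for illustration: the renderer draws a one-dimensional gradation ramp from a single slope, the estimator has a single estimation parameter `weight`, and the gradient is propagated by hand with the chain rule as a stand-in for error back propagation through a convolutional neural network and a differentiable CG renderer. Step S301 is replaced by precomputed images.

```python
def render(slope, width):
    """Toy differentiable renderer: pixel x receives gradation slope * x."""
    return [slope * x for x in range(width)]


def estimate_slope(weight, image):
    """Toy rendering parameter estimator with one estimation parameter."""
    feature = image[-1] - image[0]  # total gradation rise across the row
    return weight * feature


def train(images, weight=0.0, lr=0.005, threshold=1e-9, max_steps=10_000):
    for step in range(max_steps):
        image = images[step % len(images)]                  # S307: next image
        slope = estimate_slope(weight, image)               # S302: estimate
        rendered = render(slope, len(image))                # S302: render
        error = [r - p for r, p in zip(rendered, image)]    # S303: rendering error
        # S304: rendering parameter error, d(mean(error^2)) / d(slope),
        # using d(rendered[x]) / d(slope) = x.
        slope_error = sum(2.0 * e * x for x, e in enumerate(error)) / len(error)
        # S305: chain rule through the estimator, d(slope) / d(weight) = feature.
        feature = image[-1] - image[0]
        update = lr * slope_error * feature
        weight -= update
        if abs(update) <= threshold:                        # S306
            break
    return weight, step + 1


# Stand-ins for predicted zoom images: ramps with different true slopes.
images = [render(a, 5) for a in (0.5, 1.0, 0.25)]
weight, steps = train(images)
print(weight, steps)
```

For these five-pixel ramps the slope equals the total rise divided by four, so the loop should settle near a weight of 0.25 and stop once the update amount falls to the threshold.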

As described above, according to the present embodiment, the information processing apparatus 1a can optimize the rendering parameter estimation unit 3a and the estimation parameter 30 so that the error between the virtual viewpoint image generated by the virtual viewpoint image generation unit 2 and the rendering result of the rendering unit 4 is minimized. As a result, the information processing apparatus 1a can estimate rendering parameters that can reproduce the surface shape of the object with higher definition than the input image.

Although each part in the present embodiment may be realized by any model / learning algorithm, it is preferable to realize it by combining a convolutional neural network and a differentiable CG renderer.

[3. Third Embodiment]
In the present embodiment, among the rendering parameters, the surface shape of the object to be rendered represented by the three-dimensional point cloud or the mesh and the camera parameters of the rendering camera are targeted.

In the present embodiment, the virtual viewpoint image generation unit 2 in the previous stage calculates a predicted zoom image obtained by zooming the input image, and the surface shape of the object and the camera parameters of the rendering camera are calculated from the predicted zoom image. This makes it possible to reflect the virtual zoom amount in the predicted zoom image in the camera parameter estimation of the rendering camera, and as a result, the accuracy of the other estimate, the surface shape of the object, is improved.

FIG. 7 is a diagram showing a configuration example of the information processing apparatus according to the third embodiment of the present disclosure. As shown in FIG. 7, the rendering parameter estimation unit 3b of the information processing apparatus 1b includes a camera parameter estimation unit 31 and a surface shape estimation unit 32. The rendering parameter error estimation unit 6b includes a camera parameter error estimation unit 61 and a surface shape error estimation unit 62.

The rendering parameter estimation unit update unit 7b includes a camera parameter estimation unit update unit 71 and a surface shape estimation unit update unit 72. The virtual viewpoint image generation unit 2 has the same configuration as the virtual viewpoint image generation unit 2 according to the first embodiment, and performs the same operation. Therefore, duplicate description is omitted here.

The camera parameter estimation unit 31 estimates the camera parameters using the predicted zoom image and the estimation parameters used for estimating the camera parameters. The surface shape estimation unit 32 estimates the surface shape of the object using the predicted zoom image and the estimation parameters used for estimating the surface shape.

The rendering unit 41 uses the surface shape of the object and the camera parameters to output a rendering result with two-dimensional gradation values having the same number of pixels as the predicted zoom image. This has the effect that the predicted zoom image and the rendering result can be compared in the processing unit in the subsequent stage, and the error can be directly fed back to the rendering parameter estimation unit.

At this time, a difference in the gradation value at each pixel means that the accuracy with which the camera parameter estimation unit 31 and the surface shape estimation unit 32 in the previous stage, together with their estimation parameters, estimate the camera parameters and the surface shape is not sufficient for reconstructing the predicted zoom image, and is information indicating that there is room for update.

Therefore, by performing the processing of the rendering parameter error estimation unit 6b and the rendering parameter estimation unit update unit 7b in the subsequent stage using the rendering error, it becomes possible to estimate the high-definition camera parameters and surface shape necessary for reconstructing the predicted zoom image.

The camera parameter error estimation unit 61 calculates the camera parameter error using the rendering error. Then, the camera parameter estimation unit update unit 71 calculates the camera parameter estimation unit update amount using the camera parameter error, and updates the camera parameter estimation unit 31 and the estimation parameter used for camera parameter estimation.

The surface shape error estimation unit 62 calculates the surface shape error using the rendering error. Then, the surface shape estimation unit update unit 72 calculates the surface shape estimation unit update amount using the surface shape error, and updates the surface shape estimation unit 32 and the estimation parameters used for surface shape estimation.

In the processing of the rendering parameter error estimation unit 6b and the rendering parameter estimation unit update unit 7b, the difference information of the two-dimensional gradation values is reflected in the function for estimating a plurality of rendering parameters in three dimensions.

This process is realized by a procedure for deriving the inverse function of the entire function, including the rendering parameter estimation unit 3b and the rendering unit 41, with the difference information of the two-dimensional gradation values as the loss function, so a general automatic differentiation algorithm can be applied.

In the present embodiment, the entire function is represented by a differentiable convolutional neural network, and the rendering parameter estimation unit 3b, the estimation parameters used for camera parameter estimation, and the estimation parameters used for surface shape estimation are updated by an error back propagation method that takes the difference information of the two-dimensional gradation values as input.

According to the rendering parameter error estimation unit 6b and the rendering parameter estimation unit update unit 7b of the present embodiment, using the difference information of the two-dimensional gradation values at each pixel between the predicted zoom image and the rendering result, calculated in the previous stage as the rendering error, has the effect that the rendering parameter estimation unit can be updated so as to estimate rendering parameters that can reproduce the high-definition surface shape of the object required for reconstructing the predicted zoom image.

Up to this point, the case where the information processing devices 1, 1a, and 1b generate a predicted zoom image of the input image as the virtual viewpoint image has been described. However, the information processing devices 1, 1a, and 1b can also generate other virtual viewpoint images, including ones that involve the predicted zoom.

For example, in the information processing devices 1, 1a, and 1b, the generation parameter 20 is learned in advance so as to generate a predicted viewpoint image that assumes the virtual viewpoint has been rotated with respect to the input image in at least one of the yaw direction, the roll direction, and the pitch direction. The predicted viewpoint image is then generated using the learned generation parameter 20 and can be output to the rendering parameter estimation units 3, 3a, and 3b and to the rendering error calculation unit 5. Since the configuration and operation of the other parts are the same as those in the second embodiment or the third embodiment, their description is omitted.

By using such a virtual viewpoint image generation unit 2 to generate and use a predicted viewpoint image from another viewpoint that is not in the input image, the virtual viewpoint change in the predicted viewpoint image can be reflected in, for example, the estimation of the camera parameters of the rendering camera. This reduces the viewpoint dependence of the surface shape estimation of the object and makes it possible to estimate rendering parameters with higher definition than the input image more stably.
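
To make the yaw, pitch, and roll rotation of a virtual viewpoint concrete, the following is a minimal geometric sketch in Python. It only moves a viewpoint position around an object placed at the origin; it does not generate an image, which in the present disclosure is done with the learned generation parameter 20. The axis convention (yaw about z, pitch about y, roll about x, applied as Rz·Ry·Rx) and all function names are assumptions made for this example, since the disclosure does not specify a convention.

```python
import math


def rot_x(a):
    """Rotation about the x axis (taken here as roll)."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(a):
    """Rotation about the y axis (taken here as pitch)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(a):
    """Rotation about the z axis (taken here as yaw)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def rotate_viewpoint(position, yaw=0.0, pitch=0.0, roll=0.0):
    """Rotate a viewpoint position about an object at the origin.

    Angles are in radians; the composite rotation is Rz(yaw) Ry(pitch) Rx(roll).
    """
    r = matmul(rot_z(yaw), matmul(rot_y(pitch), rot_x(roll)))
    return apply(r, position)


# A viewpoint on the x axis, yawed by 90 degrees, ends up
# (up to floating-point rounding) on the y axis.
print(rotate_viewpoint([1.0, 0.0, 0.0], yaw=math.pi / 2))
```

Because the rotation matrices are orthonormal, the distance between the viewpoint and the object is unchanged; only the viewing direction differs.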

[4. Effects]
The trained model generation method is a method of generating a trained model having the virtual viewpoint image generation unit 2 and the rendering parameter estimation unit 3, and includes the virtual viewpoint image generation unit 2 generating a virtual viewpoint image having a higher resolution than the input image, and the rendering parameter estimation unit 3 estimating rendering parameters using the virtual viewpoint image. This makes it possible for the trained model generation method to estimate rendering parameters that can reproduce the surface shape of an object with higher definition than the input image.

The trained model generation method includes the virtual viewpoint image generation unit 2 generating a virtual viewpoint image using the input image and the generation parameter 20; the binary classifier 22 identifying, using the virtual viewpoint image and the correct virtual viewpoint image, whether or not the virtual viewpoint image is as genuine as the correct virtual viewpoint image; and updating the generation parameter 20 so that the similarity between the probability density distribution of the virtual viewpoint image output from the binary classifier 22 and the probability density distribution of the correct virtual viewpoint image is minimized. As a result, the trained model generation method can estimate rendering parameters that can reproduce the surface shape of a higher-definition object by using the generation parameters optimized by the update.
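
The quantity compared in this update can be illustrated with a small Python sketch. It assumes, purely for illustration, that the "similarity" between the two probability density distributions is a divergence-type measure (here the Jensen-Shannon divergence over discrete histograms), for which a smaller value means the distributions are closer. The present disclosure does not name a specific measure, and the function names below are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions.

    Terms with p_i == 0 contribute nothing by convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, 0 for identical distributions,
    and at most ln(2) for distributions with disjoint support."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical histograms standing in for the distribution of generated
# images and the distribution of correct virtual viewpoint images.
generated = [0.9, 0.1]
correct = [0.5, 0.5]
print(js_divergence(generated, correct))
```

Under this assumption, updating the generation parameter to drive the value toward zero corresponds to making generated images statistically indistinguishable from the correct ones.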

The trained model generation method includes the rendering unit 4 generating a rendered image using the estimated rendering parameters; the rendering error calculation unit 5 calculating the error between the virtual viewpoint image and the rendered image; the rendering parameter error estimation unit 6 estimating the error of the rendering parameters based on the calculated error between the virtual viewpoint image and the rendered image; and the rendering parameter estimation unit update unit 7 updating, based on the error of the rendering parameters, the estimation parameter 30 used for estimating the rendering parameters. As a result, the trained model generation method can estimate rendering parameters that can reproduce the surface shape of the object with even higher definition by using the estimation parameters optimized through the update by the rendering parameter estimation unit update unit 7.
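
The render, compare, and update cycle described above can be sketched in Python as follows. This is a toy under strong assumptions: the estimation unit is collapsed to a single scalar parameter `gain`, the renderer is a per-pixel multiplication, and the parameter error is the gradient of the mean squared pixel error. All names and the learning rate are hypothetical and are not taken from the present disclosure.

```python
def render(gain, albedo):
    """Stand-in for the rendering unit: one gradation value per pixel."""
    return [gain * a for a in albedo]


def rendering_error(target, rendered):
    """Stand-in for the rendering error calculation unit:
    per-pixel difference of gradation values."""
    return [r - t for r, t in zip(rendered, target)]


def parameter_error(pixel_errors, albedo):
    """Stand-in for the rendering parameter error estimation unit:
    gradient of the mean squared pixel error with respect to gain."""
    n = len(albedo)
    return sum(2.0 * e * a for e, a in zip(pixel_errors, albedo)) / n


def train(target, albedo, gain=0.0, lr=0.5, steps=100):
    """Stand-in for the update unit: repeated gradient steps on gain."""
    for _ in range(steps):
        rendered = render(gain, albedo)
        errors = rendering_error(target, rendered)
        gain -= lr * parameter_error(errors, albedo)
    return gain


# The target plays the role of the virtual viewpoint image; it was
# produced with gain 2.0, which the loop should approximately recover.
albedo = [0.2, 0.5, 1.0]
target = [0.4, 1.0, 2.0]
print(train(target, albedo))
```

In the embodiment the same cycle runs over the parameters of a neural network rather than a single scalar, but the data flow between the four units is the same.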

The trained model generation method includes the rendering parameter estimation unit 3b estimating the camera parameters used for rendering and estimating the surface shape of the object to be rendered; the rendering parameter error estimation unit 6b estimating the error of the camera parameters and estimating the error of the surface shape; and the rendering parameter estimation unit update unit 7b updating the camera parameters included in the estimation parameter 30 based on the error of the camera parameters and updating the surface shape included in the estimation parameter 30 based on the error of the surface shape. This makes it possible for the trained model generation method to reflect, for example, the virtual zoom amount in the virtual viewpoint image in the camera parameter estimation of the rendering camera, which in turn improves the accuracy of the surface shape estimation of the object.

The trained model generation method includes the virtual viewpoint image generation unit 2 generating a virtual viewpoint image that assumes the virtual viewpoint has been brought closer to the object to be rendered. As a result, the trained model generation method can generate, as the virtual viewpoint image, a predicted zoom image that has a higher resolution than the input image, as if the subject in the input image had been zoomed in on by the camera.
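
The relationship between the input image and a zoomed, higher-resolution output can be illustrated with a deliberately naive Python sketch: a center crop followed by nearest-neighbor upsampling. This is not the technique of the present disclosure, which generates the predicted zoom image with a learned generation parameter; the sketch is a non-learned stand-in that only shows the change in field of view and pixel count. The function names and the `zoom` and `scale` parameters are hypothetical.

```python
def center_crop(img, zoom):
    """Keep the central 1/zoom portion of the image in each dimension."""
    h, w = len(img), len(img[0])
    ch, cw = max(1, h // zoom), max(1, w // zoom)
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in img[top:top + ch]]


def upsample_nearest(img, out_h, out_w):
    """Resize by copying the nearest source pixel for each output pixel."""
    h, w = len(img), len(img[0])
    return [[img[y * h // out_h][x * w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def naive_zoom(img, zoom=2, scale=2):
    """Crop toward the subject, then output more pixels than the input had."""
    h, w = len(img), len(img[0])
    return upsample_nearest(center_crop(img, zoom), h * scale, w * scale)


# A 4x4 input becomes an 8x8 output showing only the central 2x2 region.
image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
zoomed = naive_zoom(image)
print(len(zoomed), len(zoomed[0]))  # prints 8 8
```

Nearest-neighbor upsampling adds pixels but no new detail, which is precisely the gap a learned generator is meant to fill.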

The trained model generation method includes the virtual viewpoint image generation unit 2 generating a virtual viewpoint image that assumes the virtual viewpoint has been rotated with respect to the object to be rendered in at least one of the yaw direction, the roll direction, and the pitch direction. As a result, by generating and using a predicted viewpoint image from another viewpoint that is not in the input image, the trained model generation method can reflect, for example, the virtual viewpoint change in the predicted viewpoint image in the estimation of the camera parameters of the rendering camera and the like. This reduces the viewpoint dependence of the surface shape estimation of the object and makes it possible to estimate rendering parameters with higher definition than the input image more stably.

The information processing device 1 has a virtual viewpoint image generation unit 2 and a rendering parameter estimation unit 3. The virtual viewpoint image generation unit 2 generates a virtual viewpoint image having a higher resolution than the input image. The rendering parameter estimation unit 3 estimates rendering parameters using a virtual viewpoint image. As a result, the information processing apparatus 1 can estimate rendering parameters that can reproduce the surface shape of the object with higher definition than the input image.

The virtual viewpoint image generation unit 2 has a generation unit 21 and a binary classifier 22. The generation unit 21 generates a virtual viewpoint image using the input image and the generation parameters. The binary classifier 22 uses the virtual viewpoint image and the correct virtual viewpoint image to identify whether or not the virtual viewpoint image is genuine like the correct virtual viewpoint image. The virtual viewpoint image generation unit 2 updates the generation parameter 20 so that the degree of similarity between the probability density distribution of the virtual viewpoint image output from the binary classifier 22 and the probability density distribution of the correct virtual viewpoint image is minimized. As a result, the information processing apparatus 1 can estimate rendering parameters that can reproduce the surface shape of a higher-definition object by using the generation parameters optimized by the update.

The information processing device 1a has a rendering unit 4, a rendering error calculation unit 5, a rendering parameter error estimation unit 6, and a rendering parameter estimation unit update unit 7. The rendering unit 4 generates a rendered image using the estimated rendering parameters. The rendering error calculation unit 5 calculates an error between the virtual viewpoint image and the rendered image. The rendering parameter error estimation unit 6 estimates the error of the rendering parameter based on the calculated error between the virtual viewpoint image and the rendered image. The rendering parameter estimation unit update unit 7 updates the estimation parameters used for estimating the rendering parameters based on the error of the rendering parameters. As a result, the information processing device 1a can estimate rendering parameters that can reproduce the surface shape of an even higher-definition object by using the estimation parameters optimized through the update by the rendering parameter estimation unit update unit 7.

The rendering parameter estimation unit 3b has a camera parameter estimation unit 31 and a surface shape estimation unit 32. The camera parameter estimation unit 31 estimates the camera parameters used for rendering. The surface shape estimation unit 32 estimates the surface shape of the object to be rendered. The rendering parameter error estimation unit 6b has a camera parameter error estimation unit 61 and a surface shape error estimation unit 62. The camera parameter error estimation unit 61 estimates the camera parameter error. The surface shape error estimation unit 62 estimates the surface shape error. The rendering parameter estimation unit update unit 7b has a camera parameter estimation unit update unit 71 and a surface shape estimation unit update unit 72. The camera parameter estimation unit update unit 71 updates the camera parameters included in the estimation parameters based on the error of the camera parameters. The surface shape estimation unit update unit 72 updates the surface shape included in the estimation parameters based on the surface shape error. As a result, the information processing device 1b can reflect, for example, the virtual zoom amount in the virtual viewpoint image in the camera parameter estimation of the rendering camera, which in turn improves the accuracy of the surface shape estimation of the object.

The virtual viewpoint image generation unit 2 generates a virtual viewpoint image that assumes the virtual viewpoint has been brought closer to the object to be rendered. As a result, the information processing devices 1, 1a, and 1b can generate, as the virtual viewpoint image, a predicted zoom image that has a higher resolution than the input image, as if the subject in the input image had been zoomed in on by the camera.

The virtual viewpoint image generation unit 2 generates a virtual viewpoint image that assumes the virtual viewpoint has been rotated in at least one of the yaw direction, the roll direction, and the pitch direction with respect to the object to be rendered. As a result, by generating and using a predicted viewpoint image from another viewpoint that is not included in the input image, the information processing devices 1, 1a, and 1b can reflect, for example, the virtual viewpoint change in the predicted viewpoint image in the estimation of the camera parameters of the rendering camera and the like. This reduces the viewpoint dependence of the surface shape estimation of the object and makes it possible to estimate rendering parameters with higher definition than the input image more stably.

It should be noted that the effects described in the present specification are merely examples and are not limited, and other effects may be obtained.

The present technology can also have the following configurations.
(1)
A method of generating a trained model having a virtual viewpoint image generation unit and a rendering parameter estimation unit, the method including:
the virtual viewpoint image generation unit generating a virtual viewpoint image having a higher resolution than an input image; and
the rendering parameter estimation unit estimating rendering parameters using the virtual viewpoint image.
(2)
The method of generating a trained model according to (1) above, including, by the virtual viewpoint image generation unit:
generating the virtual viewpoint image using the input image and a generation parameter;
identifying, with a binary classifier and using the virtual viewpoint image and a correct virtual viewpoint image, whether or not the virtual viewpoint image is as genuine as the correct virtual viewpoint image; and
updating the generation parameter so that the similarity between the probability density distribution of the virtual viewpoint image output from the binary classifier and the probability density distribution of the correct virtual viewpoint image is minimized.
(3)
The method of generating a trained model according to (1) or (2) above, including:
a rendering unit generating a rendered image using the estimated rendering parameters;
a rendering error calculation unit calculating an error between the virtual viewpoint image and the rendered image;
a rendering parameter error estimation unit estimating an error of the rendering parameters based on the calculated error between the virtual viewpoint image and the rendered image; and
a rendering parameter estimation unit update unit updating, based on the error of the rendering parameters, the estimation parameters used for estimating the rendering parameters.
(4)
The method of generating a trained model according to (3) above, including:
the rendering parameter estimation unit estimating the camera parameters used for rendering and estimating the surface shape of the object to be rendered;
the rendering parameter error estimation unit estimating the error of the camera parameters and estimating the error of the surface shape; and
the rendering parameter estimation unit update unit updating the camera parameters included in the estimation parameters based on the error of the camera parameters and updating the surface shape included in the estimation parameters based on the error of the surface shape.
(5)
The method of generating a trained model according to any one of (1) to (4) above, including the virtual viewpoint image generation unit generating the virtual viewpoint image assuming that the virtual viewpoint is brought close to the object to be rendered.
(6)
The method of generating a trained model according to any one of (1) to (5) above, including the virtual viewpoint image generation unit generating the virtual viewpoint image assuming that the virtual viewpoint is rotated in at least one of the yaw direction, the roll direction, and the pitch direction with respect to the object to be rendered.
(7)
An information processing device having:
a virtual viewpoint image generation unit that generates a virtual viewpoint image having a higher resolution than an input image; and
a rendering parameter estimation unit that estimates rendering parameters using the virtual viewpoint image.
(8)
The information processing device according to (7) above, in which the virtual viewpoint image generation unit has:
a generation unit that generates the virtual viewpoint image using the input image and a generation parameter; and
a binary classifier that identifies, using the virtual viewpoint image and a correct virtual viewpoint image, whether or not the virtual viewpoint image is as genuine as the correct virtual viewpoint image,
and updates the generation parameter so that the similarity between the probability density distribution of the virtual viewpoint image output from the binary classifier and the probability density distribution of the correct virtual viewpoint image is minimized.
(9)
The information processing device according to (7) or (8) above, having:
a rendering unit that generates a rendered image using the estimated rendering parameters;
a rendering error calculation unit that calculates an error between the virtual viewpoint image and the rendered image;
a rendering parameter error estimation unit that estimates an error of the rendering parameters based on the calculated error between the virtual viewpoint image and the rendered image; and
a rendering parameter estimation unit update unit that updates, based on the error of the rendering parameters, the estimation parameters used for estimating the rendering parameters.
(10)
The information processing device according to (9) above, in which
the rendering parameter estimation unit has a camera parameter estimation unit that estimates the camera parameters used for rendering and a surface shape estimation unit that estimates the surface shape of the object to be rendered,
the rendering parameter error estimation unit has a camera parameter error estimation unit that estimates the error of the camera parameters and a surface shape error estimation unit that estimates the error of the surface shape, and
the rendering parameter estimation unit update unit has a camera parameter estimation unit update unit that updates the camera parameters included in the estimation parameters based on the error of the camera parameters and a surface shape estimation unit update unit that updates the surface shape included in the estimation parameters based on the error of the surface shape.
(11)
The information processing device according to any one of (7) to (10) above, in which the virtual viewpoint image generation unit generates the virtual viewpoint image assuming that the virtual viewpoint is brought close to the object to be rendered.
(12)
The information processing device according to any one of (7) to (11) above, in which the virtual viewpoint image generation unit generates the virtual viewpoint image assuming that the virtual viewpoint is rotated in at least one of the yaw direction, the roll direction, and the pitch direction with respect to the object to be rendered.

1, 1a, 1b Information processing device
2 Virtual viewpoint image generation unit
20 Generation parameter
21 Generation unit
22 Binary classifier
23 Similarity calculation unit
3, 3a, 3b Rendering parameter estimation unit
30 Estimation parameter
31 Camera parameter estimation unit
32 Surface shape estimation unit
4, 41 Rendering unit
5 Rendering error calculation unit
6, 6b Rendering parameter error estimation unit
61 Camera parameter error estimation unit
62 Surface shape error estimation unit
7, 7b Rendering parameter estimation unit update unit
71 Camera parameter estimation unit update unit
72 Surface shape estimation unit update unit

Claims

1. A method of generating a trained model having a virtual viewpoint image generation unit and a rendering parameter estimation unit, the method including:
the virtual viewpoint image generation unit generating a virtual viewpoint image having a higher resolution than an input image; and
the rendering parameter estimation unit estimating rendering parameters using the virtual viewpoint image.
2. The method of generating a trained model according to claim 1, including, by the virtual viewpoint image generation unit:
generating the virtual viewpoint image using the input image and a generation parameter;
identifying, with a binary classifier and using the virtual viewpoint image and a correct virtual viewpoint image, whether or not the virtual viewpoint image is as genuine as the correct virtual viewpoint image; and
updating the generation parameter so that the similarity between the probability density distribution of the virtual viewpoint image output from the binary classifier and the probability density distribution of the correct virtual viewpoint image is minimized.
3. The method of generating a trained model according to claim 1, further including:
a rendering unit generating a rendered image using the estimated rendering parameters;
a rendering error calculation unit calculating an error between the virtual viewpoint image and the rendered image;
a rendering parameter error estimation unit estimating an error of the rendering parameters based on the calculated error between the virtual viewpoint image and the rendered image; and
a rendering parameter estimation unit update unit updating the estimation parameters used for estimating the rendering parameters based on the error between the virtual viewpoint image and the rendered image.
4. The method of generating a trained model according to claim 3, further including:
the rendering parameter estimation unit estimating the camera parameters used for rendering and estimating the surface shape of the object to be rendered;
the rendering parameter error estimation unit estimating the error of the camera parameters and estimating the error of the surface shape; and
the rendering parameter estimation unit update unit updating the camera parameters included in the estimation parameters based on the error of the camera parameters and updating the surface shape included in the estimation parameters based on the error of the surface shape.
5. The method of generating a trained model according to claim 1, including the virtual viewpoint image generation unit generating the virtual viewpoint image assuming that the virtual viewpoint is brought close to the object to be rendered.
6. The method of generating a trained model according to claim 1, including the virtual viewpoint image generation unit generating the virtual viewpoint image assuming that the virtual viewpoint is rotated in at least one of the yaw direction, the roll direction, and the pitch direction with respect to the object to be rendered.
7. An information processing device having:
a virtual viewpoint image generation unit that generates a virtual viewpoint image having a higher resolution than an input image; and
a rendering parameter estimation unit that estimates rendering parameters using the virtual viewpoint image.
8. The information processing device according to claim 7, wherein the virtual viewpoint image generation unit has:
a generation unit that generates the virtual viewpoint image using the input image and a generation parameter; and
a binary classifier that identifies, using the virtual viewpoint image and a correct virtual viewpoint image, whether or not the virtual viewpoint image is as genuine as the correct virtual viewpoint image,
and updates the generation parameter so that the similarity between the probability density distribution of the virtual viewpoint image output from the binary classifier and the probability density distribution of the correct virtual viewpoint image is minimized.
9. The information processing device according to claim 7, further having:
a rendering unit that generates a rendered image using the estimated rendering parameters;
a rendering error calculation unit that calculates an error between the virtual viewpoint image and the rendered image;
a rendering parameter error estimation unit that estimates an error of the rendering parameters based on the calculated error between the virtual viewpoint image and the rendered image; and
a rendering parameter estimation unit update unit that updates, based on the error of the rendering parameters, the estimation parameters used for estimating the rendering parameters.
10. The information processing device according to claim 9, wherein
the rendering parameter estimation unit has a camera parameter estimation unit that estimates the camera parameters used for rendering and a surface shape estimation unit that estimates the surface shape of the object to be rendered,
the rendering parameter error estimation unit has a camera parameter error estimation unit that estimates the error of the camera parameters and a surface shape error estimation unit that estimates the error of the surface shape, and
the rendering parameter estimation unit update unit has a camera parameter estimation unit update unit that updates the camera parameters included in the estimation parameters based on the error of the camera parameters and a surface shape estimation unit update unit that updates the surface shape included in the estimation parameters based on the error of the surface shape.
11. The information processing device according to claim 7, wherein the virtual viewpoint image generation unit generates the virtual viewpoint image assuming that the virtual viewpoint is brought close to the object to be rendered.
12. The information processing device according to claim 7, wherein the virtual viewpoint image generation unit generates the virtual viewpoint image assuming that the virtual viewpoint is rotated in at least one of the yaw direction, the roll direction, and the pitch direction with respect to the object to be rendered.