CN114863007A - Image rendering method and device for three-dimensional object and electronic equipment


Info

Publication number
CN114863007A
CN114863007A (application number CN202210556554.4A)
Authority
CN
China
Prior art keywords
image
utilized
sample
information
sampling point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210556554.4A
Other languages
Chinese (zh)
Inventor
张琦
刘巧俏
邹航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd filed Critical China Telecom Corp Ltd
Priority to CN202210556554.4A priority Critical patent/CN114863007A/en
Publication of CN114863007A publication Critical patent/CN114863007A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/10Geometric effects
    • G06T15/20Perspective computation
    • G06T15/205Image-based rendering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/005General purpose rendering architectures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/06Ray-tracing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Image Generation (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the disclosure provides an image rendering method and device for a three-dimensional object and an electronic device. The method includes: acquiring target view angle information for a target object; determining, based on the target view angle information, position information of each spatial sampling point utilized when rendering the target object; generating color information of each spatial sampling point by using a pre-trained neural rendering network based on the position information of each spatial sampling point and the feature map of each image to be utilized of the target object; and generating a rendered image of the target object based on the color information of each spatial sampling point. The scheme can solve the problem in the related art that image rendering with a neural rendering network lacks universality.

Description

Image rendering method and device for three-dimensional object and electronic equipment
Technical Field
The present disclosure relates to the field of image rendering technologies, and in particular, to an image rendering method and apparatus for a three-dimensional object, and an electronic device.
Background
New-view image rendering of three-dimensional objects has been an important research direction in the field of computer graphics. The new view image rendering refers to rendering an image of the three-dimensional object at a new view angle according to images captured at a plurality of view angles of the three-dimensional object. In the related art, a neural rendering network trained in advance is used to render a new view image of a three-dimensional object.
However, in the related art, the method for rendering a new-perspective image of a three-dimensional object with a pre-trained neural rendering network uses only the position information and color information of images taken from multiple perspectives of that object. As a result, the neural rendering network can only render new-perspective images of a single object and has no universality: a separate neural rendering network must be trained for each different three-dimensional object, and each training involves complicated operations such as data collection. This adds a significant amount of work for technicians and is highly unfriendly to people outside the professional field.
Disclosure of Invention
An object of the embodiments of the present disclosure is to provide an image rendering method and apparatus for a three-dimensional object, and an electronic device, so as to solve the problem in the related art that image rendering with a neural rendering network lacks universality. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present disclosure provides an image rendering method for a three-dimensional object, where the method includes:
acquiring target visual angle information aiming at a target object; the target visual angle information is visual angle information of a visual angle at which image rendering is to be performed;
determining position information of each spatial sampling point utilized when the target object is rendered based on the target visual angle information;
generating color information of each spatial sampling point by using a pre-trained neural rendering network based on the position information of each spatial sampling point and the feature map of each image to be utilized of the target object; wherein each image to be utilized is an image obtained by shooting from a different view angle of the target object; the neural rendering network is an artificial intelligence model obtained by training based on the position information of sample sampling points, the feature map of each sample image and the true value of an image to be predicted; each sample image and the image to be predicted are images obtained by shooting from different view angles of a sample object; and the sample sampling points are spatial sampling points utilized when the image to be predicted is rendered;
and generating a rendering image of the target object based on the color information of each spatial sampling point.
Optionally, the target view information includes: viewing coordinates, and viewing angle;
the determining, based on the target perspective information, position information of each spatial sampling point utilized when rendering the target object includes:
generating a virtual ray corresponding to each pixel point in the image to be rendered according to the observation angle by taking the observation coordinate as an endpoint;
sampling each virtual ray to obtain the position information of each spatial sampling point utilized when the target object is rendered.
Optionally, a generation manner of the color information of any of the spatial sampling points includes:
determining, for each image to be utilized, the mapping position of the spatial sampling point in the image to be utilized, and determining the feature information at the mapping position from the feature map of the image to be utilized as the sampling feature corresponding to the spatial sampling point;
performing feature fusion processing on the sampling features to obtain fusion features corresponding to the spatial sampling points;
and inputting the position information of the spatial sampling points and the fusion characteristics corresponding to the spatial sampling points into a neural rendering network trained in advance to generate the color information of the spatial sampling points.
Optionally, the determining, for each image to be utilized, a mapping position of the spatial sampling point in the image to be utilized includes:
and determining the mapping position of the spatial sampling point in each image to be utilized based on the camera calibration information of the shooting equipment for shooting the image to be utilized and the position information of the spatial sampling point.
Optionally, the performing feature fusion processing on each sampling feature to obtain a fusion feature corresponding to the spatial sampling point includes:
and performing feature fusion processing on each sampling feature by using a pre-trained self-attention Transformer layer to obtain the fusion feature corresponding to the spatial sampling point.
Optionally, the neural rendering network and the self-attention Transformer layer are obtained by joint training based on the position information of the sample sampling points, the feature map of each sample image, and the true value of the image to be predicted.
Optionally, the joint training of the neural rendering network and the self-attention Transformer layer includes:
obtaining each sample image and the image to be predicted;
determining the position information of the sample sampling points utilized when the sample object is rendered based on the view angle information corresponding to the image to be predicted;
determining the mapping position of the sample sampling point in the sample image for each sample image, and determining the characteristic information at the mapping position from the image characteristic diagram of the sample image as the sampling characteristic corresponding to the sample sampling point;
performing feature fusion processing on each sampling feature of the sample sampling points by using a self-attention mechanism Transformer layer in training to obtain fusion features corresponding to the sample sampling points;
inputting the position information of the sample sampling point and the fusion characteristics corresponding to the sample sampling point into a neural rendering network in training, and outputting the predicted color information of the sample sampling point;
generating an image of the sample object when viewed from a specified viewing angle based on the predicted color information of the sample sampling points as an output image; the appointed view angle comprises an observation position and an observation angle corresponding to the image to be predicted;
calculating model loss by using the output image and the difference of the image to be predicted;
when the model loss indicates that the neural rendering network has not converged, adjusting the model parameters of the neural rendering network and the self-attention Transformer layer until the neural rendering network and the self-attention Transformer layer converge.
Optionally, the determining, for each image to be utilized, of the mapping position of the spatial sampling point in the image to be utilized, and determining the feature information at the mapping position from the feature map of the image to be utilized as the sampling feature corresponding to the spatial sampling point, includes:
if the feature map of the image to be utilized is different from the size of the image to be utilized, performing linear interpolation processing on the feature map of the image to be utilized based on the size of the image to be utilized to obtain a target feature map with the same size as the image to be utilized;
and extracting the characteristic information at the mapping position in the target characteristic diagram to obtain the sampling characteristic corresponding to the spatial sampling point.
Optionally, the color information includes a color value, and a density value; the density value is used for representing the weight of the spatial sampling point when the spatial sampling point is used for generating the pixel value;
generating a rendered image of the target object based on the color information of each of the spatial sampling points, comprising:
weighting and adding the color values of the spatial sampling points on each virtual ray according to the density values to obtain pixel values of pixel points in the image to be rendered corresponding to the virtual ray;
and generating the rendered image based on the pixel value of each pixel point in the image to be rendered.
In a second aspect, an embodiment of the present disclosure provides an apparatus for rendering an image of a three-dimensional object, the apparatus including:
the acquisition module is used for acquiring target visual angle information aiming at a target object; the target visual angle information is visual angle information of a visual angle at which image rendering is to be performed;
the determining module is used for determining the position information of each spatial sampling point utilized when the target object is rendered based on the target visual angle information;
the first generation module is used for generating color information of each spatial sampling point by using a pre-trained neural rendering network based on the position information of each spatial sampling point and the feature map of each image to be utilized of the target object; wherein each image to be utilized is an image obtained by shooting from a different view angle of the target object; the neural rendering network is an artificial intelligence model obtained by training based on the position information of sample sampling points, the feature map of each sample image and the true value of an image to be predicted; each sample image and the image to be predicted are images obtained by shooting from different view angles of a sample object; and the sample sampling points are spatial sampling points utilized when the image to be predicted is rendered;
and the second generation module is used for generating a rendering image of the target object based on the color information of each spatial sampling point.
Optionally, the target view information includes: viewing coordinates, and viewing angle;
the determining module includes:
the first generation submodule is used for generating a virtual ray corresponding to each pixel point in the image to be rendered according to the observation angle by taking the observation coordinate as an endpoint;
and the sampling module is used for sampling each virtual ray to obtain the position information of each space sampling point utilized when the target object is rendered.
Optionally, the first generating module includes:
the first determining submodule is used for determining, for each image to be utilized, the mapping position of the spatial sampling point in the image to be utilized, and determining the feature information at the mapping position from the feature map of the image to be utilized as the sampling feature corresponding to the spatial sampling point;
the fusion submodule is used for carrying out feature fusion processing on the sampling features to obtain fusion features corresponding to the spatial sampling points;
and the input sub-module is used for inputting the position information of the spatial sampling points and the fusion characteristics corresponding to the spatial sampling points into a pre-trained neural rendering network to generate the color information of the spatial sampling points.
Optionally, the first determining submodule is specifically configured to:
and determining the mapping position of the spatial sampling point in each image to be utilized based on the camera calibration information of the shooting equipment for shooting the image to be utilized and the position information of the spatial sampling point.
Optionally, the fusion submodule is specifically configured to:
and performing feature fusion processing on each sampling feature by using a pre-trained self-attention Transformer layer to obtain the fusion feature corresponding to the spatial sampling point.
Optionally, the neural rendering network and the self-attention Transformer layer are obtained by joint training based on the position information of the sample sampling points, the feature map of each sample image, and the true value of the image to be predicted.
Optionally, the joint training of the neural rendering network and the self-attention Transformer layer includes:
obtaining each sample image and the image to be predicted;
determining the position information of the sample sampling points utilized when the sample object is rendered based on the view angle information corresponding to the image to be predicted;
determining the mapping position of the sample sampling point in the sample image for each sample image, and determining the characteristic information at the mapping position from the image characteristic diagram of the sample image as the sampling characteristic corresponding to the sample sampling point;
performing feature fusion processing on each sampling feature of the sample sampling points by using a self-attention mechanism Transformer layer in training to obtain fusion features corresponding to the sample sampling points;
inputting the position information of the sample sampling point and the fusion characteristics corresponding to the sample sampling point into a neural rendering network in training, and outputting the predicted color information of the sample sampling point;
generating an image of the sample object when viewed from a specified viewing angle based on the predicted color information of the sample sampling points as an output image; the appointed view angle comprises an observation position and an observation angle corresponding to the image to be predicted;
calculating model loss by using the output image and the difference of the image to be predicted;
when the model loss indicates that the neural rendering network has not converged, adjusting the model parameters of the neural rendering network and the self-attention Transformer layer until the neural rendering network and the self-attention Transformer layer converge.
Optionally, the first determining sub-module includes:
the linear interpolation unit is used for carrying out linear interpolation processing on the feature map of the image to be utilized based on the size of the image to be utilized if the feature map of the image to be utilized is different from the size of the image to be utilized, so as to obtain a target feature map with the same size as the image to be utilized;
and the extraction unit is used for extracting the feature information at the mapping position in the target feature map to obtain the sampling feature corresponding to the spatial sampling point.
Optionally, the color information includes a color value, and a density value; the density value is used for representing the weight of the spatial sampling point when the spatial sampling point is used for generating the pixel value;
the second generation module includes:
the adding submodule is used for weighting and adding the color value of the space sampling point on each virtual ray according to the density value to obtain the pixel value of the pixel point in the image to be rendered corresponding to the virtual ray;
and the second generation submodule is used for generating the rendered image based on the pixel value of each pixel point in the image to be rendered.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete mutual communication through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing the steps of the image rendering method of the three-dimensional object when executing the program stored in the memory.
In a fourth aspect, the present disclosure provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the image rendering method for a three-dimensional object.
Embodiments of the present disclosure also provide a computer program product containing instructions, which when run on a computer, cause the computer to execute the above-mentioned method for rendering an image of a three-dimensional object.
The embodiment of the disclosure has the following beneficial effects:
the image rendering method of the three-dimensional object provided by the embodiment of the disclosure acquires target view angle information for a target object; determining position information of each spatial sampling point utilized when the target object is rendered based on the target visual angle information; generating color information of each space sampling point by using a pre-trained neural rendering network based on the position information of each space sampling point and the characteristic diagram of each image to be utilized of the target object; and generating a rendering image of the target object based on the color information of each spatial sampling point. Therefore, in the scheme, the characteristic graphs of the sample images are introduced in the training process of the neural rendering network, so that the neural rendering network can learn the characteristic information of different sample objects, and the neural rendering network obtained through training can be suitable for different objects. Thus, for different target objects, the color information of each spatial sampling point can be determined by using the feature map of the image to be utilized of the target object and the position information of each spatial sampling point through the neural rendering network obtained by pre-training, so that the rendered image of the target object is further generated, and the universality is achieved. Therefore, the problem that the generality is not achieved when the neural rendering network is used for image rendering in the related technology can be solved through the scheme.
Of course, not all advantages described above need to be achieved at the same time to practice any one product or method of the present disclosure.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other embodiments can be obtained by those skilled in the art according to the drawings.
Fig. 1 is a flowchart of an image rendering method for a three-dimensional object according to an embodiment of the present disclosure;
fig. 2 is another flowchart of an image rendering method for a three-dimensional object according to an embodiment of the present disclosure;
fig. 3 is a flowchart of an image rendering method for a three-dimensional object at a feature processing stage according to an embodiment of the present disclosure;
FIG. 4 is a flowchart of a method for rendering an image of a three-dimensional object at an input preprocessing stage according to an embodiment of the present disclosure;
fig. 5 is a flowchart of an image rendering method for a three-dimensional object at a feature matting stage according to an embodiment of the present disclosure;
fig. 6 is a flowchart of an image rendering method for a three-dimensional object at a rendering stage according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of an image rendering apparatus for a three-dimensional object according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments that can be derived from the disclosure by one of ordinary skill in the art based on the embodiments in the disclosure are intended to be within the scope of the disclosure.
Three-dimensional reconstruction and new-view-angle image rendering have always been at the core of the computer graphics field. Three-dimensional reconstruction can be realized through new-view-angle image rendering: when an image of a three-dimensional object under an arbitrary view angle can be rendered, the three-dimensional reconstruction of that object is completed.
At present, with the emergence of new concepts such as the digital twin and the metaverse, the industry's demand for new-view-angle rendering is gradually increasing. Although training of neural rendering networks has become ever faster, the neural rendering networks in the related art have no universality: a neural rendering network must be trained separately for each different three-dimensional object, and training a network for each object individually involves network design, data collection and other operations that are very complex and suitable only for people engaged in professional research. These operations are clearly very unfriendly to people in non-professional fields and form an intangible barrier. In order to enable everyone to easily master and use neural rendering networks, a universal new-view-angle image rendering method is needed, so that three-dimensional reconstruction can be completed easily with only several pictures of a three-dimensional object taken at several angles, without extra complex operations.
In order to solve the problem that the image rendering by using a neural rendering network in the related art does not have universality, the embodiment of the disclosure provides an image rendering method and device for a three-dimensional object and an electronic device.
First, a method for rendering an image of a three-dimensional object according to an embodiment of the present disclosure is described below.
The image rendering method of the three-dimensional object provided by the embodiment of the disclosure can be applied to electronic equipment. In practical applications, the electronic device may be a server or a terminal device, such as a computer, a smart phone, etc., which is reasonable.
The image rendering method of the three-dimensional object provided by the embodiment of the disclosure may include the following steps:
acquiring target visual angle information aiming at a target object; the target visual angle information is visual angle information of a visual angle at which image rendering is to be performed;
determining position information of each spatial sampling point utilized when the target object is rendered based on the target visual angle information;
generating color information of each spatial sampling point by using a pre-trained neural rendering network based on the position information of each spatial sampling point and the feature map of each image to be utilized of the target object; wherein each image to be utilized is an image obtained by shooting from a different view angle of the target object; the neural rendering network is an artificial intelligence model obtained by training based on the position information of sample sampling points, the feature map of each sample image and the true value of an image to be predicted; each sample image and the image to be predicted are images obtained by shooting from different view angles of a sample object; and the sample sampling points are spatial sampling points utilized when the image to be predicted is rendered;
and generating a rendering image of the target object based on the color information of each spatial sampling point.
The image rendering method of the three-dimensional object provided by the embodiment of the disclosure acquires target view angle information for a target object; determining position information of each spatial sampling point utilized when the target object is rendered based on the target visual angle information; generating color information of each space sampling point by using a pre-trained neural rendering network based on the position information of each space sampling point and the characteristic diagram of each image to be utilized of the target object; and generating a rendering image of the target object based on the color information of each spatial sampling point. Therefore, in the scheme, the characteristic graphs of the sample images are introduced in the training process of the neural rendering network, so that the neural rendering network can learn the characteristic information of different sample objects, and the neural rendering network obtained through training can be suitable for different objects. Thus, for different target objects, the color information of each spatial sampling point can be determined by using the feature map of the image to be utilized of the target object and the position information of each spatial sampling point through the neural rendering network obtained by pre-training, so that the rendered image of the target object is further generated, and the universality is achieved. Therefore, the problem that the generality is not achieved when the neural rendering network is used for image rendering in the related technology can be solved through the scheme.
The following describes an image rendering method for a three-dimensional object according to an embodiment of the present disclosure with reference to the drawings.
As shown in fig. 1, the method for rendering an image of a three-dimensional object according to an embodiment of the present disclosure may include the following steps:
s101, acquiring target visual angle information aiming at a target object; the target visual angle information is visual angle information of a visual angle at which image rendering is to be performed;
the target object may be any three-dimensional object in reality, and the target view information may be used to represent a position and an angle when the target object is observed. The image rendering method of the three-dimensional object in the embodiment of the disclosure is to render an image that appears when the target object is observed at the position and at the angle represented by the target view angle information.
The target view angle information may be acquired in the following manner: the target view angle information is input through a human-computer interaction interface, or an observation position, an observation angle and the like are selected there; rotation and translation operations may also be performed on the target object displayed in the human-computer interaction interface to generate the target view angle information.
S102, determining the position information of each spatial sampling point utilized when the target object is rendered based on the target visual angle information;
in order to render an image at a certain viewing angle in a three-dimensional scene, spatial sampling points may be determined first: each pixel point in the image to be rendered corresponds to a plurality of spatial sampling points, and each pixel point is generated based on the spatial sampling points corresponding to that pixel point. Furthermore, it can be understood that the plurality of spatial sampling points corresponding to each pixel point need to be located on a straight line. Therefore, when the target view angle information includes an observation coordinate and an observation angle, the determining, based on the target view angle information, of the position information of each spatial sampling point utilized when rendering the target object may include steps A1-A2:
step A1, generating a virtual ray corresponding to each pixel point in the image to be rendered according to the observation angle by taking the observation coordinate as an endpoint;
wherein the observation angle can be represented by a direction vector. When the target object is observed from the observation coordinate, the observed region is a cone whose vertex is the observation coordinate and whose center line is parallel to the direction vector representing the observation angle. When the image to be rendered is a two-dimensional image of H × W pixels, H × W virtual rays can be generated uniformly within the cone by taking the observation coordinate as an endpoint, and each virtual ray corresponds to one pixel point in the image to be rendered.
Step A2, sampling each virtual ray to obtain the position information of each spatial sampling point utilized when rendering the target object.
In one implementation, for each virtual ray, a predetermined number of spatial sampling points may be sampled at intervals of a predetermined length starting from the endpoint, so as to obtain the position information of each spatial sampling point. Alternatively, in another implementation, a sampling spatial range may be set first, and then a predetermined number of spatial sampling points may be uniformly sampled within that range along each virtual ray starting from the endpoint, so as to obtain the position information of each spatial sampling point. Of course, the manner of sampling each virtual ray is not limited thereto.
Therefore, the spatial sampling points are generated by sampling the virtual rays, which conforms to the physical law that light propagates along straight lines, so that the image rendered from the spatial sampling points is more realistic.
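As an illustration of steps A1-A2, a minimal sketch of ray generation and uniform sampling is given below. It assumes a pinhole camera model with a known intrinsic matrix and a rotation matrix derived from the observation angle; the function and parameter names are hypothetical and are not taken from the disclosed embodiment.

```python
import numpy as np

def generate_rays(obs_coord, obs_rotation, intrinsics, H, W):
    """Generate one virtual ray per pixel of the H x W image to be rendered.

    obs_coord:    (3,)  observation coordinate (endpoint of every ray)
    obs_rotation: (3,3) camera-to-world rotation derived from the observation angle
    intrinsics:   (3,3) assumed pinhole camera matrix
    """
    i, j = np.meshgrid(np.arange(W), np.arange(H), indexing="xy")
    # Pixel coordinates -> per-pixel ray directions in camera space (pinhole model).
    dirs = np.stack([(i - intrinsics[0, 2]) / intrinsics[0, 0],
                     (j - intrinsics[1, 2]) / intrinsics[1, 1],
                     np.ones_like(i, dtype=np.float64)], axis=-1)
    rays_d = dirs @ obs_rotation.T                       # rotate directions into world space
    rays_o = np.broadcast_to(obs_coord, rays_d.shape)    # every ray starts at the observation coordinate
    return rays_o, rays_d

def sample_points(rays_o, rays_d, near, far, n_samples):
    """Uniformly sample n_samples spatial sampling points on each virtual ray within [near, far]."""
    t = np.linspace(near, far, n_samples)                            # (n_samples,)
    pts = rays_o[..., None, :] + rays_d[..., None, :] * t[:, None]   # (H, W, n_samples, 3)
    return pts, t
```

For an H × W image to be rendered, this yields H × W virtual rays, each carrying n_samples spatial sampling points whose position information is passed to the subsequent steps.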
S103, generating color information of each spatial sampling point by using a pre-trained neural rendering network based on the position information of each spatial sampling point and the feature map of each image to be utilized of the target object; wherein each image to be utilized is an image obtained by shooting from a different view angle of the target object; the neural rendering network is an artificial intelligence model obtained by training based on the position information of sample sampling points, the feature map of each sample image and the true value of an image to be predicted; each sample image and the image to be predicted are images obtained by shooting from different view angles of a sample object; and the sample sampling points are spatial sampling points utilized when the image to be predicted is rendered;
any machine-learned or deep-learned feature extractor, such as resnet (residual neural network), moblnet (a lightweight neural network), etc., may be used to extract the two-dimensional visual features of the image to be utilized when extracting the feature map.
In the related art, the inputs of the neural rendering network are a spatial coordinate and a viewing direction. After training is completed, such a network outputs a fixed value for a fixed spatial position and viewing direction. In the embodiments of the present disclosure, however, the two-dimensional visual features of the images to be utilized are taken as a prior condition. Different three-dimensional objects have different two-dimensional visual features, so the input visual features differ and the output values naturally differ as well. The trained neural rendering network therefore has the capability of distinguishing different three-dimensional objects: for three-dimensional object 1 it outputs the values corresponding to object 1, for three-dimensional object 2 it outputs the values corresponding to object 2, and so on. In this way, the color information of each spatial sampling point is determined by utilizing the feature map of the image to be utilized, a rendered image of the target object is further generated, and universality is achieved.
And the neural rendering network is used for outputting the color information of the spatial sampling points based on the position information of the spatial sampling points and the characteristic diagram of each image to be utilized. In order to train the neural rendering network, a part of images shot from different perspectives of a sample object can be used as a sample image, and the other part can be used as a to-be-predicted image. Therefore, sample sampling points can be generated according to the visual angle information of any image to be predicted when the image is shot, and the true value of the color information of the spatial sampling points can be calibrated by using the image to be predicted; outputting a predicted value of the color information of the sample sampling point by utilizing a neural rendering network in training based on the position information of the sample sampling point and a characteristic diagram of the sample image; and finally, calculating a loss value of the neural rendering network by using the predicted value of the color information and the true value of the color information, so as to adjust the parameters of the neural rendering network based on the loss value until the neural rendering network is converged, and obtaining the trained neural rendering network. The number of the sample objects may be multiple, that is, in the neural network training, the sample images of different sample objects and the image to be predicted may be utilized.
In the embodiment of the present disclosure, two-dimensional visual features are introduced to distinguish different three-dimensional objects, so that the neural rendering network learns prior knowledge for understanding three-dimensional objects, and a neural rendering network with good generalization performance can be obtained.
For clarity of the scheme and layout, a specific implementation manner of generating the color information of each spatial sampling point is described below with reference to other embodiments.
And S104, generating a rendering image of the target object based on the color information of each spatial sampling point.
Each spatial sampling point corresponds to a pixel value in the image to be rendered; for the plurality of spatial sampling points corresponding to each pixel value, the pixel value can be determined by weighting and adding the color values represented by their color information, by taking an average value, or in a similar manner.
In one implementation, the color information includes a color value and a density value; illustratively, the color value may be the three primary RGB values. The density value is used to characterize the weight of the spatial sampling point when the spatial sampling point is used to generate a pixel value.
Generating a rendered image of the target object based on the color information of each of the spatial sampling points, comprising:
weighting and adding the color values of the spatial sampling points on each virtual ray according to the density values to obtain pixel values of pixel points in the image to be rendered corresponding to the virtual ray;
and generating the rendered image based on the pixel value of each pixel point in the image to be rendered.
In the image rendering process, for each virtual ray, a pixel value corresponding to the virtual ray is obtained, and in an implementation manner, an integration operation may be performed along the ray, and a specific formula is as follows:
$$C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t),\mathbf{d})\,dt, \qquad T(t) = \exp\!\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\right)$$
wherein C(r) denotes the pixel value corresponding to the virtual ray, c denotes a color value (for example the three primary RGB values), r denotes the virtual ray, d denotes the direction of the virtual ray, σ denotes the density value, T(t) denotes the accumulated transmittance along the ray, and t_n, t_f indicate the range of integration.
In practical application, the integration operation can be converted into a weighted addition based on the density values of the spatial sampling points, so as to obtain the pixel value of the pixel point in the image to be rendered corresponding to the virtual ray, which makes the image rendering realizable with a computer. If each virtual ray has n spatial sampling points, the neural rendering network needs to be queried n times, and the values of the n points are combined to obtain the pixel value of one pixel point in the image to be rendered, i.e., the rendering of one point. If the resolution of the image to be rendered is (H, W), the step of obtaining the pixel value of one pixel point is repeated H × W times to generate the image to be rendered.
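For illustration, the conversion of the integral into a weighted addition over the n spatial sampling points of a virtual ray might use the common alpha-compositing discretization sketched below; the exact weighting used by the disclosed embodiment is not spelled out, so this particular form is an assumption.

```python
import numpy as np

def composite_ray(colors, densities, t_vals):
    """Weighted addition of sampling-point colors along one virtual ray.

    colors:    (n, 3) color values output by the neural rendering network
    densities: (n,)   density values of the n spatial sampling points
    t_vals:    (n,)   distances of the sampling points along the ray
    Returns the RGB pixel value of the pixel point corresponding to the ray.
    """
    deltas = np.append(np.diff(t_vals), 1e10)            # spacing between consecutive samples
    alpha = 1.0 - np.exp(-densities * deltas)            # per-sample opacity derived from density
    trans = np.cumprod(np.append(1.0, 1.0 - alpha[:-1] + 1e-10))  # accumulated transmittance
    weights = alpha * trans                              # weight of each sampling point
    return (weights[:, None] * colors).sum(axis=0)       # weighted addition -> pixel value
```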
The image rendering method of the three-dimensional object provided by the embodiment of the disclosure acquires target view angle information for a target object; determining position information of each spatial sampling point utilized when the target object is rendered based on the target visual angle information; generating color information of each space sampling point by using a pre-trained neural rendering network based on the position information of each space sampling point and the characteristic diagram of each image to be utilized of the target object; and generating a rendering image of the target object based on the color information of each spatial sampling point. Therefore, in the scheme, the characteristic graphs of the sample images are introduced in the training process of the neural rendering network, so that the neural rendering network can learn the characteristic information of different sample objects, and the neural rendering network obtained through training can be suitable for different objects. Thus, for different target objects, the color information of each spatial sampling point can be determined by using the feature map of the image to be utilized of the target object and the position information of each spatial sampling point through the neural rendering network obtained by pre-training, so that the rendered image of the target object is further generated, and the universality is achieved. Therefore, the problem that the generality is not achieved when the neural rendering network is used for image rendering in the related technology can be solved through the scheme.
Optionally, in another embodiment, regarding step S103, the generation manner of the color information of any one of the spatial sampling points may include steps B1-B3:
step B1, determining the mapping position of the spatial sampling point in the image to be utilized aiming at each image to be utilized, and determining the characteristic information at the mapping position from the characteristic diagram of the image to be utilized as the sampling characteristic corresponding to the spatial position point;
in this case, the extracted feature map also needs to maintain consistency with the original image position so that the spatial sampling points can be mapped to corresponding positions of the feature map, so that features corresponding to the spatial sampling points can be obtained from the feature map.
For each image to be utilized, determining the mapping position of the spatial sampling point in the image to be utilized includes:
and determining the mapping position of the spatial sampling point in each image to be utilized based on the camera calibration information of the shooting equipment for shooting the image to be utilized and the position information of the spatial sampling point.
For example, a camera calibration method, such as COLMAP (a three-dimensional reconstruction method), may be used to acquire the camera calibration information of the shooting device when shooting an image to be utilized of the target object. The camera calibration information includes camera intrinsic parameters and camera extrinsic parameters: the extrinsic parameters convert the world coordinate system into the camera coordinate system, and the intrinsic parameters convert the camera coordinate system into the pixel coordinate system. Using the intrinsic and extrinsic parameters, any point in space, such as a spatial sampling point, can be mapped to a pixel point of an image shot by the shooting device, i.e., a mapping position. Camera intrinsic and extrinsic parameters are both prior art and will not be further described here.
In practical application, for each image to be utilized, the mapping position of the spatial sampling point in the image to be utilized can be obtained from the coordinates represented by the position information of the spatial sampling point together with the camera intrinsic and extrinsic parameters of the camera calibration information, expressed by the following formula:
$$Z_c\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K_1 K_2 \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$$
wherein K_1 is the camera intrinsic matrix and K_2 is the camera extrinsic matrix, both of which are known inputs; u, v are the two-dimensional coordinates in the pixel coordinate system; X_w, Y_w, Z_w are the coordinates of the spatial sampling point; and Z_c is the depth of the point in the camera coordinate system, which acts as a scale factor.
By calculating the values of u and v, the mapping position of the spatial sampling point in the image to be utilized is obtained, and the sampling feature corresponding to the spatial sampling point can then be extracted.
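A compact sketch of this projection is given below, assuming K_1 is a 3×3 intrinsic matrix and K_2 a 3×4 extrinsic matrix [R | t] obtained from camera calibration (for example COLMAP); the helper name is hypothetical.

```python
import numpy as np

def project_point(point_w, K1, K2):
    """Map a spatial sampling point to its mapping position (u, v) in an image to be utilized.

    point_w: (3,)   world coordinates (X_w, Y_w, Z_w) of the spatial sampling point
    K1:      (3, 3) camera intrinsic matrix
    K2:      (3, 4) camera extrinsic matrix [R | t], world -> camera
    """
    p_h = np.append(point_w, 1.0)    # homogeneous world coordinates
    p_cam = K2 @ p_h                 # camera coordinate system
    p_pix = K1 @ p_cam               # pixel coordinate system, up to the scale factor Z_c
    u, v = p_pix[:2] / p_pix[2]      # divide by Z_c to obtain the mapping position
    return u, v
```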
Step B2, carrying out feature fusion processing on the sampling features to obtain fusion features corresponding to the spatial sampling points;
each sampling point may correspond to a mapping position in each image to be utilized to obtain a plurality of sampling features, and therefore, feature fusion processing, for example, splicing processing, linear addition processing, and the like, needs to be performed on each sampling feature corresponding to each sampling point to obtain the sampling point fusion feature.
For example, in another implementation, a pre-trained self-attention Transformer layer may be used to perform feature fusion processing on each sampling feature to obtain the fusion feature corresponding to the spatial sampling point.
The Transformer layer can learn the relation between every pair of features and can effectively fuse the plurality of extracted features, so that the relevance between different view angles is retained and fusion features with guiding significance are preserved. More complete three-dimensional information can thus be extracted, which guides the rendering of the three-dimensional image more efficiently and enhances the rendering effect.
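As one possible realization of this fusion, the per-view sampling features of the spatial sampling points could be passed through a standard self-attention encoder layer, as in the PyTorch sketch below; the mean pooling over views at the end is an assumed design choice rather than something specified by the disclosure.

```python
import torch
import torch.nn as nn

class ViewFeatureFusion(nn.Module):
    """Fuse the per-view sampling features of spatial sampling points with self-attention."""

    def __init__(self, feat_dim: int, n_heads: int = 4):
        super().__init__()
        # feat_dim must be divisible by n_heads.
        self.attn = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads,
                                               batch_first=True)

    def forward(self, sampled_feats: torch.Tensor) -> torch.Tensor:
        # sampled_feats: (num_points, num_views, feat_dim),
        # one sampling feature per image to be utilized.
        fused = self.attn(sampled_feats)   # attention across views keeps inter-view relations
        return fused.mean(dim=1)           # (num_points, feat_dim) fusion feature
```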
And step B3, inputting the position information of the spatial sampling points and the fusion characteristics corresponding to the spatial sampling points into a pre-trained neural rendering network to generate the color information of the spatial sampling points.
In the embodiment of the present disclosure, for each image to be utilized, the mapping position of a spatial sampling point in the image to be utilized is determined, and the feature information at the mapping position is determined from the feature map of the image to be utilized as the sampling feature corresponding to the spatial sampling point; feature fusion processing is performed on the sampling features to obtain the fusion feature corresponding to the spatial sampling point; and the position information of the spatial sampling point and the fusion feature corresponding to the spatial sampling point are input into the pre-trained neural rendering network to generate the color information of the spatial sampling point. In this scheme, feature fusion processing is performed on the sampling features of each spatial sampling point at its mapping positions in the images to be utilized, so that the fusion feature corresponding to each spatial sampling point is obtained, and therefore the color information of the spatial sampling point can be generated more accurately based on the fusion feature.
Optionally, in another embodiment, the determining, for any one of the images to be utilized, of the mapping position of the spatial sampling point in the image to be utilized, and determining the feature information at the mapping position from the feature map of the image to be utilized as the sampling feature corresponding to the spatial sampling point, may include steps C1-C2:
step C1, if the feature map of the image to be utilized is different from the size of the image to be utilized, linear interpolation processing is carried out on the feature map of the image to be utilized based on the size of the image to be utilized, and a target feature map with the same size as the image to be utilized is obtained;
considering that the extracted feature map is often smaller in size than the original image, the feature map of the image to be utilized may be processed by a linear interpolation method, for example, the feature map of the image to be utilized may be processed by a nearest neighbor interpolation algorithm, a bilinear interpolation algorithm, or the like, so that the processed feature map and the image to be utilized have the same size.
And step C2, extracting the feature information at the mapping position in the target feature map to obtain the sampling feature corresponding to the spatial sampling point.
Therefore, the mapping position in the target characteristic diagram can be directly found according to the position of the space sampling point mapped to the image to be utilized, and the sampling characteristic of the space sampling point is extracted.
In this embodiment, when the feature map of the image to be utilized is different from the size of the image to be utilized, linear interpolation processing is performed on the feature map of the image to be utilized based on the size of the image to be utilized to obtain a target feature map having the same size as the image to be utilized, and then feature information at the mapping position in the target feature map is extracted to obtain sampling features corresponding to spatial sampling points. In this embodiment, by performing linear interpolation processing on the feature map of the image to be utilized, the problem that the feature map of the image to be utilized is different from the size of the image to be utilized is solved, and thus the sampling features corresponding to the spatial sampling points can be directly obtained.
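Assuming PyTorch feature maps, steps C1 and C2 can be combined into a single bilinear-sampling call at the normalized mapping positions, as sketched below; this is an implementation convenience, not the only possible realization.

```python
import torch
import torch.nn.functional as F

def sample_features(feat_map: torch.Tensor, uv: torch.Tensor, img_h: int, img_w: int):
    """Extract sampling features at mapping positions via bilinear interpolation.

    feat_map: (1, C, Hf, Wf) feature map of the image to be utilized (may be smaller than the image)
    uv:       (N, 2) mapping positions (u, v) in pixel coordinates of the image to be utilized
    """
    # Normalize pixel coordinates to [-1, 1], the range expected by grid_sample.
    grid = torch.stack([uv[:, 0] / (img_w - 1) * 2 - 1,
                        uv[:, 1] / (img_h - 1) * 2 - 1], dim=-1)
    grid = grid.view(1, 1, -1, 2)                                   # (1, 1, N, 2)
    feats = F.grid_sample(feat_map, grid, mode="bilinear", align_corners=True)
    return feats.reshape(feat_map.shape[1], -1).t()                 # (N, C) sampling features
```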
Optionally, in another embodiment, the neural rendering network and the attention mechanism Transformer layer are obtained by jointly training based on the position information of the sample sampling points, the feature maps of the sample images, and the truth values of the images to be predicted.
Specifically, the joint training of the neural rendering network and the self-attention Transformer layer may include steps D1-D8:
step D1, obtaining each sample image and the image to be predicted;
in one implementation, multiple images may be captured simultaneously from different perspectives of a sample object, with one portion being a sample image and another portion being a to-be-predicted image. Also, the number of sample objects may be plural.
Step D2, determining the position information of the sample sampling points used when rendering the sample object based on the view angle information corresponding to the image to be predicted;
the corresponding view information of the image to be predicted may include a position and an angle representing when the target object is observed, and may also be represented by a spatial coordinate and a direction vector. Determining the position information of the sample sampling points according to the corresponding view angle information of the image to be predicted is similar to the manner of determining the position information of each spatial sampling point used when the target object is rendered based on the target view angle information in step S102.
Step D3, determining the mapping position of the sample sampling point in the sample image for each sample image, and determining the characteristic information at the mapping position from the image characteristic map of the sample image as the sampling characteristic corresponding to the sample sampling point;
and B1, obtaining sampling features corresponding to the sample sampling points, determining the mapping positions of the spatial sampling points in the images to be utilized according to each image to be utilized, and determining feature information at the mapping positions from the feature map of the images to be utilized, wherein the feature information is similar to the sampling features corresponding to the spatial position points, and details are not repeated here.
D4, performing feature fusion processing on each sampling feature of the sample sampling points by using a self-attention mechanism Transformer layer in training to obtain fusion features corresponding to the sample sampling points;
step D5, inputting the position information of the sample sampling point and the fusion characteristic corresponding to the sample sampling point into a neural rendering network in training, and outputting the predicted color information of the sample sampling point;
the predicted color information may also include a color value and a density value.
Step D6, generating an image of the sample object as viewed from a specified viewing angle, based on the predicted color information of the sample sampling points, as an output image; the specified viewing angle comprises an observation position and an observation angle corresponding to the image to be predicted;
the manner of generating the output image is similar to that of the step S104, and the rendered image of the target object is generated based on the color information of each spatial sampling point, which is not described herein again.
Step D7, calculating a model loss by using the difference between the output image and the image to be predicted;
Step D8, when the model loss indicates that the neural rendering network has not converged, adjusting the model parameters of the neural rendering network and the self-attention mechanism Transformer layer until the neural rendering network and the self-attention mechanism Transformer layer converge.
In the above process, a plurality of images captured from different viewing angles of a sample object are used as the sample images and the images to be predicted, respectively. The image to be predicted is used as the ground truth, a loss value is calculated against the output image obtained with the neural rendering model, and the model parameters of the neural rendering network and the self-attention mechanism Transformer layer are adjusted until the neural rendering network and the self-attention mechanism Transformer layer converge. Using really captured images as the ground truth of the output image improves the accuracy of the trained neural rendering network and self-attention mechanism Transformer layer in handling the novel-view image rendering problem.
In this embodiment, jointly training the neural rendering network and the self-attention mechanism Transformer layer can further improve the effect of the image rendering method for a three-dimensional object provided by the embodiment of the present disclosure.
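For illustration only, the joint training of steps D1-D8 can be summarized by the following sketch, assuming PyTorch; the callables sample_rays, extract_features and volume_render are hypothetical placeholders standing in for the stages described above, and the tensor shapes are assumptions rather than the implementation fixed by the present disclosure.

```python
import torch.nn.functional as F

def train_step(render_net, fusion_layer, optimizer,
               sample_rays, extract_features, volume_render,
               sample_feature_maps, gt_image, gt_view):
    """One hypothetical joint-training step (D2-D8); the three callables are placeholders."""
    # D2: sample points (and ray directions) for the view of the image to be predicted.
    points, dirs = sample_rays(gt_view)                        # (R, S, 3), (R, S, 3)
    # D3: per-view sampling features projected from the feature map of each sample image.
    feats = extract_features(points, sample_feature_maps)      # (R*S, V, C)
    # D4: fuse the per-view features with the self-attention Transformer layer in training.
    fused = fusion_layer(feats).mean(dim=1)                    # (R*S, C)
    # D5: predict color and density from position, direction and fusion feature.
    rgb_sigma = render_net(points.view(-1, 3), dirs.view(-1, 3), fused)
    # D6: volume-render the output image seen from the specified viewing angle.
    output = volume_render(rgb_sigma, points)                  # (H, W, 3)
    # D7: the model loss is the difference between the output image and the image to be predicted.
    loss = F.mse_loss(output, gt_image)
    # D8: adjust the parameters of both modules; the loop is repeated until convergence.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```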
In order to facilitate understanding of the image rendering method of a three-dimensional object provided by the present disclosure, an exemplary description is given below with reference to fig. 2 to 6.
As shown in fig. 2, two-dimensional images of the target object from known view angles are first obtained, and feature maps of these two-dimensional images are obtained by a feature extraction module, where the feature extraction module is configured to extract the visual features of the two-dimensional images of the target object from the known view angles so as to obtain the feature maps of the two-dimensional images. Meanwhile, the target view angle information is input and a preprocessing operation is carried out, that is, three-dimensional virtual rays are generated by utilizing the target view angle information, and three-dimensional spatial sampling points are further generated;
projecting the coordinates x of the spatial sampling points into each two-dimensional image, and performing linear interpolation processing on the feature map of the two-dimensional image to obtain the mapping positions corresponding to the spatial sampling points, so as to extract the sampling features of the spatial sampling points; and performing feature fusion processing on the plurality of sampling features corresponding to each spatial sampling point by using a Transformer feature fusion module to obtain the fusion feature W(x) of the spatial sampling point;
inputting the position information of the spatial sampling point and the target view angle information (x, d), where x is the coordinate of the spatial sampling point and d is the direction represented by the target view angle information, into a neural rendering network module, simultaneously inputting the fusion feature W(x) into the neural rendering network module, and outputting the color information (r, g, b, σ) of the spatial sampling point, where r, g and b represent tristimulus values and σ represents a density value; the neural rendering network module is configured to output the color information of the spatial sampling point by utilizing the position information of the spatial sampling point and the corresponding fusion feature;
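For illustration only, a minimal sketch of such a neural rendering network module is given below, assuming PyTorch; the layer sizes, activations and the way (x, d) and W(x) are concatenated are assumptions and not the network structure fixed by the present disclosure.

```python
import torch
import torch.nn as nn

class RenderNet(nn.Module):
    """Hypothetical rendering network: (x, d, W(x)) in, (r, g, b, sigma) out."""
    def __init__(self, feat_dim=256, hidden=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(3 + 3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.rgb_head = nn.Linear(hidden, 3)     # tristimulus values r, g, b
        self.sigma_head = nn.Linear(hidden, 1)   # density value sigma

    def forward(self, x, d, w):
        h = self.body(torch.cat([x, d, w], dim=-1))
        rgb = torch.sigmoid(self.rgb_head(h))    # colors constrained to [0, 1]
        sigma = torch.relu(self.sigma_head(h))   # non-negative density
        return torch.cat([rgb, sigma], dim=-1)   # (..., 4) = (r, g, b, sigma)
```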
inputting the color information of the spatial sampling points into a three-dimensional rendering module, and outputting the rendered image of the target object; the three-dimensional rendering module is configured to obtain the pixel value corresponding to each virtual ray by utilizing the color information of the spatial sampling points on that virtual ray, thereby obtaining the pixel value of each pixel point of the image to be rendered, and to output the rendered image.
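For illustration only, the following sketch shows one common way in which such a three-dimensional rendering module can weight the colors along a virtual ray by their density values (NeRF-style alpha compositing); the present disclosure does not fix this particular formula, so it is given here only as an assumption.

```python
import torch

def composite_ray(rgb, sigma, deltas):
    """Combine the samples of one virtual ray into a single pixel value.

    rgb: (S, 3) colors; sigma: (S,) densities; deltas: (S,) distances between adjacent samples.
    """
    alpha = 1.0 - torch.exp(-sigma * deltas)                                   # per-sample opacity
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = alpha * trans                                                    # density-based weights
    return (weights.unsqueeze(-1) * rgb).sum(dim=0)                            # pixel value, shape (3,)
```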
As shown in fig. 3, the feature extraction stage in the image rendering method for a three-dimensional object according to the embodiment of the present disclosure may include:
s1-1, acquiring a two-dimensional image of the three-dimensional object from a known view angle;
and S1-2, extracting the features of the two-dimensional images.
As shown in fig. 4, the input preprocessing stage of the image rendering method for a three-dimensional object provided in the embodiment of the present disclosure may include:
s2-1, acquiring a target visual angle;
s2-2, generating a virtual ray in the three-dimensional space;
and S2-3, sampling the virtual ray.
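For illustration only, a minimal sketch of the preprocessing steps S2-2 and S2-3 above is given below, assuming PyTorch and assuming that the unit ray directions have already been derived from the observation coordinate and observation angle; the near/far bounds and sample count are arbitrary assumptions.

```python
import torch

def generate_samples(origin, directions, near=0.5, far=5.0, n_samples=64):
    """Cast one virtual ray per pixel and sample it along its length.

    origin: (3,) observation coordinate; directions: (H, W, 3) unit ray directions.
    Returns spatial sampling points of shape (H, W, n_samples, 3).
    """
    t = torch.linspace(near, far, n_samples)                   # depths along every ray
    points = origin + directions[..., None, :] * t[:, None]    # sampling on each virtual ray
    return points
```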
As shown in fig. 5, the feature matting stage of the image rendering method of a three-dimensional object provided in the embodiment of the present disclosure may include:
s3-1, obtaining sampled spatial sampling points;
s3-2, projecting the spatial sampling points to the two-dimensional image;
s3-3, matting the feature of the corresponding position of the feature map after linear interpolation;
s3-4, performing feature fusion processing on the features of the spatial sampling points;
and obtaining the characteristic information of the spatial sampling points.
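For illustration only, the feature fusion of step S3-4 could look like the following sketch, assuming PyTorch; the dimensions, head count and mean pooling are assumptions, not the fusion scheme fixed by the present disclosure.

```python
import torch
import torch.nn as nn

# Self-attention across the per-view sampling features of each spatial sampling point,
# pooled into a single fusion feature W(x) per point (sizes below are arbitrary).
feat_dim, n_views, n_points = 64, 4, 1024
fusion = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=4, batch_first=True)

per_view_feats = torch.randn(n_points, n_views, feat_dim)   # one row of view features per point
fused = fusion(per_view_feats).mean(dim=1)                   # (n_points, feat_dim), i.e. W(x)
```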
As shown in fig. 6, the rendering stage of the image rendering method for a three-dimensional object provided in the embodiment of the present disclosure may include:
s4-1, inputting the position information and the characteristic information into a neural rendering network;
that is, generating the color information of each spatial sampling point by using the neural rendering network;
S4-2, performing volume rendering operation along the spatial sampling point on each virtual ray;
namely, the pixel value of the pixel point corresponding to the virtual ray is calculated according to the color information of each spatial sampling point on the virtual ray.
And S4-3, generating a target image.
That is, the rendered image of the target three-dimensional object is generated.
The image rendering method of a three-dimensional object provided by the embodiment of the present disclosure acquires target view angle information for a target object; determines, based on the target view angle information, the position information of each spatial sampling point utilized when the target object is rendered; generates the color information of each spatial sampling point by using a pre-trained neural rendering network based on the position information of each spatial sampling point and the feature map of each image to be utilized of the target object; and generates a rendered image of the target object based on the color information of each spatial sampling point. In this scheme, the feature maps of the sample images are introduced during the training of the neural rendering network, so that the neural rendering network can learn the feature information of different sample objects, and the trained neural rendering network is therefore applicable to different objects. Thus, for different target objects, the color information of each spatial sampling point can be determined by the pre-trained neural rendering network using the feature map of the image to be utilized of the target object and the position information of each spatial sampling point, and the rendered image of the target object is then generated, which gives the scheme universality. Therefore, this scheme can solve the problem in the related art that image rendering with a neural rendering network lacks generality.
An embodiment of the present disclosure further provides an image rendering apparatus for a three-dimensional object, as shown in fig. 7, the apparatus includes:
an obtaining module 710, configured to obtain target view information for a target object; the target visual angle information is visual angle information of a visual angle at which image rendering is to be performed;
a determining module 720, configured to determine, based on the target perspective information, position information of each spatial sampling point utilized when rendering the target object;
the first generating module 730 is configured to generate, by using a neural rendering network trained in advance, color information of each spatial sampling point based on the position information of each spatial sampling point and a feature map of each to-be-utilized image of the target object; wherein each image to be utilized is an image obtained by shooting from different visual angles of the target object; the neural rendering network is an artificial intelligence model obtained by training based on the position information of sample sampling points, the feature map of each sample image and the true value of the image to be predicted; each sample image and the image to be predicted are images obtained by shooting from different visual angles of a sample object; the sample sampling points are space sampling points utilized when the image to be predicted is rendered;
a second generating module 740, configured to generate a rendered image of the target object based on the color information of each of the spatial sampling points.
Optionally, the target view information includes: viewing coordinates, and viewing angle;
the determining module includes:
the first generation submodule is used for generating a virtual ray corresponding to each pixel point in the image to be rendered according to the observation angle by taking the observation coordinate as an endpoint;
and the sampling module is used for sampling each virtual ray to obtain the position information of each space sampling point utilized when the target object is rendered.
Optionally, the first generating module includes:
the first determining submodule is used for determining the mapping position of the spatial sampling point in each image to be utilized, and determining the characteristic information at the mapping position from the characteristic diagram of the image to be utilized as the sampling characteristic corresponding to the spatial sampling point;
the fusion submodule is used for carrying out feature fusion processing on the sampling features to obtain fusion features corresponding to the spatial sampling points;
and the input sub-module is used for inputting the position information of the spatial sampling points and the fusion characteristics corresponding to the spatial sampling points into a pre-trained neural rendering network to generate the color information of the spatial sampling points.
Optionally, the first determining submodule is specifically configured to:
and determining the mapping position of the spatial sampling point in each image to be utilized based on the camera calibration information of the shooting equipment for shooting the image to be utilized and the position information of the spatial sampling point.
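For illustration only, such a mapping could be computed with a pinhole-camera model as sketched below, assuming the camera calibration information takes the form of an intrinsic matrix K and world-to-camera extrinsics R and t; these symbols are assumptions, and the present disclosure does not fix a particular calibration representation.

```python
import torch

def project_point(x_world, K, R, t):
    """Project a spatial sampling point into an image to be utilized.

    x_world: (3,) point in world coordinates; K: (3, 3) intrinsics; R: (3, 3), t: (3,) extrinsics.
    Returns the mapping position (u, v) in pixel coordinates.
    """
    x_cam = R @ x_world + t          # world coordinates -> camera coordinates
    uvw = K @ x_cam                  # camera coordinates -> homogeneous pixel coordinates
    return uvw[:2] / uvw[2]          # perspective division gives the mapping position
```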
Optionally, the fusion submodule is specifically configured to:
and performing feature fusion processing on each sampling feature by using a pre-trained self-attention mechanism Transformer layer to obtain fusion features corresponding to the spatial sampling points.
Optionally, the neural rendering network and the self-attention mechanism Transformer layer are obtained by joint training based on position information of the sample sampling points, a feature map of each sample image, and a true value of the image to be predicted.
Optionally, the joint training mode of the neural rendering network and the self-attention mechanism Transformer layer includes:
obtaining each sample image and the image to be predicted;
determining the position information of sample sampling points utilized when the sample object is rendered based on the visual angle information corresponding to the image to be predicted;
determining the mapping position of the sample sampling point in the sample image for each sample image, and determining the characteristic information at the mapping position from the image characteristic diagram of the sample image as the sampling characteristic corresponding to the sample sampling point;
performing feature fusion processing on each sampling feature of the sample sampling points by using a self-attention mechanism Transformer layer in training to obtain fusion features corresponding to the sample sampling points;
inputting the position information of the sample sampling point and the fusion characteristics corresponding to the sample sampling point into a neural rendering network in training, and outputting the predicted color information of the sample sampling point;
generating an image of the sample object as viewed from a specified viewing angle, based on the predicted color information of the sample sampling points, as an output image; the specified viewing angle comprises an observation position and an observation angle corresponding to the image to be predicted;
calculating model loss by using the output image and the difference of the image to be predicted;
when the model loss indicates that the neural rendering network has not converged, adjusting model parameters of the neural rendering network and the self-attention mechanism Transformer layer until the neural rendering network and the self-attention mechanism Transformer layer converge.
Optionally, the first determining sub-module includes:
the linear interpolation unit is used for carrying out linear interpolation processing on the feature map of the image to be utilized based on the size of the image to be utilized if the feature map of the image to be utilized is different from the size of the image to be utilized, so as to obtain a target feature map with the same size as the image to be utilized;
and the extraction unit is used for extracting the feature information at the mapping position in the target feature map to obtain the sampling feature corresponding to the spatial sampling point.
Optionally, the color information includes a color value and a density value; the density value is used for representing the weight of the spatial sampling point when the spatial sampling point is used for generating the pixel value;
the second generation module includes:
the adding submodule is used for weighting and adding the color value of the space sampling point on each virtual ray according to the density value to obtain the pixel value of the pixel point in the image to be rendered corresponding to the virtual ray;
and the second generation submodule is used for generating the rendered image based on the pixel value of each pixel point in the image to be rendered.
The disclosed embodiment also provides an electronic device, as shown in fig. 8, including a processor 801, a communication interface 802, a memory 803 and a communication bus 804, where the processor 801, the communication interface 802 and the memory 803 communicate with each other through the communication bus 804;
a memory 803 for storing a computer program;
the processor 801 is configured to implement the steps of the image rendering method for a three-dimensional object described above when executing the program stored in the memory 803.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but also Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components.
In yet another embodiment provided by the present disclosure, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the above-mentioned image rendering method for a three-dimensional object.
In yet another embodiment provided by the present disclosure, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the above-described method of image rendering of a three-dimensional object.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions described in accordance with the embodiments of the disclosure are, in whole or in part, generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments.
The above description is only a preferred embodiment of the present disclosure, and is not intended to limit the scope of the present disclosure. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present disclosure are included in the scope of protection of the present disclosure.

Claims (20)

1. A method of image rendering of a three-dimensional object, the method comprising:
acquiring target visual angle information aiming at a target object; the target visual angle information is visual angle information of a visual angle at which image rendering is to be performed;
determining position information of each spatial sampling point utilized when the target object is rendered based on the target visual angle information;
generating color information of each space sampling point by using a pre-trained neural rendering network based on the position information of each space sampling point and the characteristic diagram of each image to be utilized of the target object; wherein each image to be utilized is an image obtained by shooting from different visual angles of the target object; the neural rendering network is an artificial intelligence model obtained by training based on the position information of sample sampling points, the characteristic diagram of each sample image and the true value of the image to be predicted; each sample image and the image to be predicted are images obtained by shooting from different visual angles of a sample object; the sample sampling points are space sampling points utilized when the image to be predicted is rendered;
and generating a rendering image of the target object based on the color information of each spatial sampling point.
2. The method of claim 1, wherein the target perspective information comprises: viewing coordinates, and viewing angle;
the determining, based on the target perspective information, position information of each spatial sampling point utilized when rendering the target object includes:
generating a virtual ray corresponding to each pixel point in the image to be rendered according to the observation angle by taking the observation coordinate as an endpoint;
sampling each virtual ray to obtain the position information of each spatial sampling point utilized when the target object is rendered.
3. The method according to claim 1 or 2, wherein the generation of the color information of any one of the spatial sampling points comprises:
determining the mapping position of the spatial sampling point in each image to be utilized, and determining the characteristic information at the mapping position from the characteristic diagram of the image to be utilized as the sampling characteristic corresponding to the spatial sampling point;
performing feature fusion processing on the sampling features to obtain fusion features corresponding to the spatial sampling points;
and inputting the position information of the spatial sampling points and the fusion characteristics corresponding to the spatial sampling points into a neural rendering network trained in advance to generate the color information of the spatial sampling points.
4. The method of claim 3, wherein determining the mapping position of the spatial sampling point in each image to be utilized comprises:
and determining the mapping position of the spatial sampling point in each image to be utilized based on the camera calibration information of the shooting equipment for shooting the image to be utilized and the position information of the spatial sampling point.
5. The method according to claim 3, wherein the performing feature fusion processing on each sampling feature to obtain a fusion feature corresponding to the spatial sampling point comprises:
and performing feature fusion processing on each sampling feature by using a pre-trained self-attention mechanism Transformer layer to obtain fusion features corresponding to the spatial sampling points.
6. The method of claim 5, wherein the neural rendering network and the self-attention mechanism Transformer layer are obtained by joint training based on position information of the sample sampling points, a feature map of each sample image, and a true value of the image to be predicted.
7. The method of claim 6, wherein the joint training mode of the neural rendering network and the self-attention mechanism Transformer layer comprises:
obtaining each sample image and the image to be predicted;
determining the position information of the sample sampling points utilized when the sample object is rendered based on the view angle information corresponding to the image to be predicted;
determining the mapping position of the sample sampling point in the sample image for each sample image, and determining the characteristic information at the mapping position from the image characteristic diagram of the sample image as the sampling characteristic corresponding to the sample sampling point;
performing feature fusion processing on each sampling feature of the sample sampling points by using a self-attention mechanism Transformer layer in training to obtain fusion features corresponding to the sample sampling points;
inputting the position information of the sample sampling point and the fusion characteristics corresponding to the sample sampling point into a neural rendering network in training, and outputting the predicted color information of the sample sampling point;
generating an image of the sample object as viewed from a specified viewing angle, based on the predicted color information of the sample sampling points, as an output image; the specified viewing angle comprises an observation position and an observation angle corresponding to the image to be predicted;
calculating model loss by using the output image and the difference of the image to be predicted;
when the model loss indicates that the neural rendering network has not converged, adjusting model parameters of the neural rendering network and the self-attention mechanism Transformer layer until the neural rendering network and the self-attention mechanism Transformer layer converge.
8. The method according to claim 3, wherein the determining, for each image to be utilized, a mapping position of the spatial sampling point in the image to be utilized, and determining feature information at the mapping position from a feature map of the image to be utilized as a sampling feature corresponding to the spatial sampling point comprises:
if the feature map of the image to be utilized is different from the size of the image to be utilized, performing linear interpolation processing on the feature map of the image to be utilized based on the size of the image to be utilized to obtain a target feature map with the same size as the image to be utilized;
and extracting the characteristic information at the mapping position in the target characteristic diagram to obtain the sampling characteristic corresponding to the spatial sampling point.
9. The method of claim 3, wherein the color information includes a color value and a density value; the density value is used for representing the weight of the spatial sampling point when the spatial sampling point is used for generating the pixel value;
generating a rendered image of the target object based on the color information of each of the spatial sampling points, comprising:
weighting and adding the color values of the spatial sampling points on each virtual ray according to the density values to obtain pixel values of pixel points in the image to be rendered corresponding to the virtual ray;
and generating the rendered image based on the pixel value of each pixel point in the image to be rendered.
10. An apparatus for rendering an image of a three-dimensional object, the apparatus comprising:
the acquisition module is used for acquiring target visual angle information aiming at a target object; the target visual angle information is visual angle information of a visual angle at which image rendering is to be performed;
the determining module is used for determining the position information of each spatial sampling point utilized when the target object is rendered based on the target visual angle information;
the first generation module is used for generating color information of each spatial sampling point based on the position information of each spatial sampling point and the characteristic diagram of each image to be utilized of the target object by utilizing a pre-trained neural rendering network; wherein each image to be utilized is an image obtained by shooting from different visual angles of the target object; the neural rendering network is an artificial intelligence model obtained by training based on the position information of sample sampling points, the characteristic diagram of each sample image and the true value of the image to be predicted; each sample image and the image to be predicted are images obtained by shooting from different visual angles of a sample object; the sample sampling points are space sampling points utilized when the image to be predicted is rendered;
and the second generation module is used for generating a rendering image of the target object based on the color information of each spatial sampling point.
11. The apparatus of claim 10, wherein the target perspective information comprises: viewing coordinates, and viewing angle;
the determining module includes:
the first generation submodule is used for generating a virtual ray corresponding to each pixel point in the image to be rendered according to the observation angle by taking the observation coordinate as an endpoint;
and the sampling module is used for sampling each virtual ray to obtain the position information of each space sampling point utilized when the target object is rendered.
12. The apparatus of claim 10 or 11, wherein the first generating module comprises:
the first determining submodule is used for determining the mapping position of the spatial sampling point in each image to be utilized, and determining the characteristic information at the mapping position from the characteristic diagram of the image to be utilized as the sampling characteristic corresponding to the spatial sampling point;
the fusion submodule is used for carrying out feature fusion processing on the sampling features to obtain fusion features corresponding to the spatial sampling points;
and the input sub-module is used for inputting the position information of the spatial sampling points and the fusion characteristics corresponding to the spatial sampling points into a pre-trained neural rendering network to generate the color information of the spatial sampling points.
13. The apparatus according to claim 12, wherein the first determining submodule is specifically configured to:
and determining the mapping position of the spatial sampling point in each image to be utilized based on the camera calibration information of the shooting equipment for shooting the image to be utilized and the position information of the spatial sampling point.
14. The apparatus of claim 12, wherein the fusion submodule is specifically configured to:
and performing feature fusion processing on each sampling feature by using a pre-trained self-attention mechanism Transformer layer to obtain fusion features corresponding to the spatial sampling points.
15. The apparatus of claim 14, wherein the neural rendering network and the self-attention mechanism Transformer layer are jointly trained based on position information of the sample sampling points, a feature map of each sample image, and a true value of the image to be predicted.
16. The apparatus of claim 15, wherein the joint training mode of the neural rendering network and the self-attention mechanism Transformer layer comprises:
obtaining each sample image and the image to be predicted;
determining the position information of the sample sampling points utilized when the sample object is rendered based on the view angle information corresponding to the image to be predicted;
determining the mapping position of the sample sampling point in the sample image for each sample image, and determining the characteristic information at the mapping position from the image characteristic diagram of the sample image as the sampling characteristic corresponding to the sample sampling point;
performing feature fusion processing on each sampling feature of the sample sampling points by using a self-attention mechanism Transformer layer in training to obtain fusion features corresponding to the sample sampling points;
inputting the position information of the sample sampling point and the fusion characteristics corresponding to the sample sampling point into a neural rendering network in training, and outputting the predicted color information of the sample sampling point;
generating an image of the sample object as viewed from a specified viewing angle, based on the predicted color information of the sample sampling points, as an output image; the specified viewing angle comprises an observation position and an observation angle corresponding to the image to be predicted;
calculating model loss by using the output image and the difference of the image to be predicted;
when the model loss indicates that the neural rendering network has not converged, adjusting model parameters of the neural rendering network and the self-attention mechanism Transformer layer until the neural rendering network and the self-attention mechanism Transformer layer converge.
17. The apparatus of claim 12, wherein the first determining submodule comprises:
the linear interpolation unit is used for carrying out linear interpolation processing on the feature map of the image to be utilized based on the size of the image to be utilized if the feature map of the image to be utilized is different from the size of the image to be utilized, so as to obtain a target feature map with the same size as the image to be utilized;
and the extraction unit is used for extracting the feature information at the mapping position in the target feature map to obtain the sampling feature corresponding to the spatial sampling point.
18. The apparatus of claim 12, wherein the color information comprises a color value and a density value; the density value is used for representing the weight of the spatial sampling point when the spatial sampling point is used for generating the pixel value;
the second generation module includes:
the adding submodule is used for weighting and adding the color value of the space sampling point on each virtual ray according to the density value to obtain the pixel value of the pixel point in the image to be rendered corresponding to the virtual ray;
and the second generation submodule is used for generating the rendered image based on the pixel value of each pixel point in the image to be rendered.
19. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1-9 when executing a program stored in the memory.
20. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of the claims 1-9.
CN202210556554.4A 2022-05-20 2022-05-20 Image rendering method and device for three-dimensional object and electronic equipment Pending CN114863007A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210556554.4A CN114863007A (en) 2022-05-20 2022-05-20 Image rendering method and device for three-dimensional object and electronic equipment

Publications (1)

Publication Number Publication Date
CN114863007A true CN114863007A (en) 2022-08-05


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115631418A (en) * 2022-11-18 2023-01-20 北京百度网讯科技有限公司 Image processing method, training method of nerve radiation field and neural network
CN115631418B (en) * 2022-11-18 2023-05-16 北京百度网讯科技有限公司 Image processing method and device and training method of nerve radiation field
CN116434146A (en) * 2023-04-21 2023-07-14 河北信服科技有限公司 Three-dimensional visual integrated management platform
CN116434146B (en) * 2023-04-21 2023-11-03 河北信服科技有限公司 Three-dimensional visual integrated management platform
CN117456097A (en) * 2023-10-30 2024-01-26 南通海赛未来数字科技有限公司 Three-dimensional model construction method and device
CN117456097B (en) * 2023-10-30 2024-05-14 南通海赛未来数字科技有限公司 Three-dimensional model construction method and device
CN117853678A (en) * 2024-03-08 2024-04-09 陕西天润科技股份有限公司 Method for carrying out three-dimensional materialization transformation on geospatial data based on multi-source remote sensing
CN117853678B (en) * 2024-03-08 2024-05-17 陕西天润科技股份有限公司 Method for carrying out three-dimensional materialization transformation on geospatial data based on multi-source remote sensing


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination