CN113902848A - Object reconstruction method and device, electronic equipment and storage medium

Info

Publication number: CN113902848A
Application number: CN202111199719.9A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 李默然, 黄海斌, 郑屹, 马重阳
Applicant/Assignee: Beijing Dajia Internet Information Technology Co Ltd
Legal status: Pending

Classifications

    • G06T17/00: Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06N3/045: Neural network architectures; combinations of networks
    • G06N3/088: Non-supervised learning, e.g. competitive learning
    • G06T15/00: 3D [Three Dimensional] image rendering
    • G06T15/506: Lighting effects; illumination models


Abstract

The present disclosure relates to an object reconstruction method, an apparatus, an electronic device, and a storage medium. The method includes: acquiring a plurality of target object images of a target object collected under multiple viewing angles, together with illumination information of each pixel point in the target object images; generating object features of the target object, color features of the target object, and per-view object difference attribute features; inputting the pixel coordinates of each pixel point, the object features, and the object difference attribute features into a shape representation network for object shape representation, obtaining object shape representation information that represents the distance from each pixel point to the geometric shape boundary of the target object; inputting the pixel coordinates, the object shape representation information, the color features, and the illumination information into an image rendering network for image rendering processing, obtaining rendered images under the multiple viewing angles; and performing object reconstruction updating on the object shape representation information based on the rendered images and the target object images, obtaining three-dimensional reconstruction information of the target object. With the embodiments of the present disclosure, the storage overhead of object representation can be reduced and the resolution of object reconstruction information can be improved.

Description

Object reconstruction method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to an object reconstruction method and apparatus, an electronic device, and a storage medium.
Background
With the development of computer vision technology, image processing techniques are applied in more and more avatar application scenarios, such as gaming, social networking, and photography. To improve the fidelity of an avatar, three-dimensional reconstruction of the objects needed to construct it, such as a human face or a hand, is often required.
In the related art, reconstruction of an object such as a human face generally uses a traditional triangular mesh to represent the three-dimensional shape of the object. Specifically, based on three-dimensional morphable face models (3D Morphable Models, 3DMMs), an encoder-decoder network for deep learning extracts features from the input multi-view images to learn the coefficients of each group of basis shapes of the 3DMMs; the three-dimensional face shape is then reconstructed as the weighted combination of those basis shapes. However, a shape representation based on triangular patches is discontinuous and cannot represent the shape of the object comprehensively, so the resolution of the three-dimensional reconstruction information in the related art is low. Moreover, the storage cost of a triangular-patch representation is positively correlated with resolution: the higher the resolution, the more storage space is required. The related art therefore cannot balance the resolution of object reconstruction information against its storage cost.
Disclosure of Invention
The present disclosure provides an object reconstruction method, an object reconstruction device, an electronic apparatus, and a storage medium, which at least solve the problems in the related art that the resolution of three-dimensional reconstruction information is low and that the resolution of object reconstruction information cannot be balanced against storage overhead. The technical solutions of the present disclosure are as follows:
according to a first aspect of embodiments of the present disclosure, there is provided an object reconstruction method, including:
acquiring a plurality of target object images of a target object collected under multiple visual angles and illumination information of each pixel point in the target object images;
generating object features of the target object, color features of the target object and a plurality of corresponding object difference attribute features of the target object under the multi-view angles;
inputting the pixel coordinates of each pixel point, the object characteristics and the object difference attribute characteristics into a shape representation network for object shape representation to obtain object shape representation information, wherein the object shape representation information represents the distance information from each pixel point to the geometric shape boundary of the target object;
inputting the pixel coordinates of each pixel point, the object shape representation information, the color characteristics and the illumination information into an image rendering network for image rendering processing to obtain a plurality of rendered images under the multiple viewing angles;
and performing object reconstruction updating on the object shape representation information based on the plurality of rendering images and the plurality of target object images to obtain three-dimensional reconstruction information of the target object.
Optionally, the performing, based on the plurality of rendered images and the plurality of target object images, object reconstruction and update on the object shape representation information to obtain three-dimensional reconstruction information of the target object includes:
determining a first target penalty from the plurality of rendered images and the plurality of target object images;
updating the object feature, the plurality of object difference attribute features, and the color feature if the first target loss does not satisfy a first preset condition;
based on the updated object characteristics, the updated object difference attribute characteristics and the updated color characteristics, repeating the step of inputting the pixel coordinates of each pixel point, the object characteristics and the object difference attribute characteristics into a shape representation network for object shape representation to obtain object shape representation information, and determining a first target loss according to the plurality of rendered images and the plurality of target object images until the first target loss meets the first preset condition;
and taking the object shape representation information output by the current shape representation network as the three-dimensional reconstruction information under the condition that the first target loss meets the first preset condition.
Optionally, the method further includes:
determining the pixel coordinates of each pixel point in the target object images;
coding the pixel coordinates to obtain pixel coordinate characteristics;
inputting the pixel coordinates of each pixel point, the object characteristics and the object difference attribute characteristics into a shape representation network for object shape representation, and obtaining object shape representation information includes:
inputting the pixel coordinate characteristics, the object characteristics and the object difference attribute characteristics into the shape representation network for object shape representation to obtain object shape representation information;
inputting the pixel coordinates of each pixel point, the object shape representation information, the color features and the illumination information into an image rendering network for image rendering processing, and obtaining a plurality of rendered images under the multiple viewing angles comprises:
and inputting the pixel coordinate characteristics, the object shape representation information, the color characteristics and the illumination information into the image rendering network for image rendering processing to obtain a plurality of rendered images.
Optionally, the generating an object feature of the target object, a color feature of the target object, and a plurality of object difference attribute features corresponding to the target object under the multiple viewing angles includes:
randomly generating the object features based on first preset dimension information;
randomly generating the color features based on second preset dimension information;
and randomly generating the plurality of object difference attribute features based on third preset dimension information.
Optionally, the method further includes:
acquiring a plurality of sample object images of a sample object collected under the multiple viewing angles and sample illumination information of each sample pixel point in the plurality of sample object images;
generating a sample object feature of the sample object, a sample color feature of the sample object, and a plurality of sample object difference attribute features corresponding to the sample object at the multiple viewing angles;
inputting the pixel coordinates of each sample pixel point, the sample object feature, and the plurality of sample object difference attribute features into a first preset neural network for object shape characterization to obtain sample object shape characterization information, wherein the sample object shape characterization information is distance information from each sample pixel point to a geometric shape boundary of the sample object;
inputting the pixel coordinates of each sample pixel point, the sample object shape representation information, the sample color characteristics and the sample illumination information into a second preset neural network for image rendering processing to obtain a plurality of sample rendering images under the multi-view angle;
training the first preset neural network and the second preset neural network based on the plurality of sample rendering images and the plurality of sample object images to obtain the shape representation network and the image rendering network.
Optionally, the training the first preset neural network and the second preset neural network based on the plurality of sample rendering images and the plurality of sample object images to obtain the shape characterization network and the image rendering network includes:
determining a second target penalty from the plurality of sample rendered images and the plurality of sample object images;
updating the sample object feature, the plurality of sample object difference attribute features, the sample color feature, the first network parameter of the first preset neural network, and the second network parameter of the second preset neural network when the second target loss does not satisfy a second preset condition;
based on the updated sample object characteristics, the updated sample object difference attribute characteristics, the updated sample color characteristics, the updated first preset neural network and the updated second preset neural network, repeating the steps of inputting the pixel coordinates of each sample pixel point, the sample object characteristics and the sample object difference attribute characteristics into the first preset neural network for object shape characterization to obtain sample object shape characterization information, and determining a second target loss according to the plurality of sample rendering images and the plurality of sample object images until the second target loss meets a second preset condition;
and under the condition that the second target loss meets the second preset condition, taking a current first preset neural network as the shape characterization network, and taking a current second preset neural network as the image rendering network.
Optionally, the method further includes:
acquiring a three-dimensional coordinate corresponding to each point on a three-dimensional model of a preset object, labeled object shape representation information of the preset object, and a labeled normal vector corresponding to each point;
inputting the three-dimensional coordinates into an initial neural network for object shape characterization to obtain a predicted normal vector and predicted object shape characterization information corresponding to the preset object;
determining a third target loss according to the predicted normal vector, the predicted object shape characterization information, the labeled object shape characterization information, and the labeled normal vector;
and training the initial neural network based on the third target loss to obtain the first preset neural network.
According to a second aspect of embodiments of the present disclosure, there is provided an object reconstruction apparatus, comprising:
the information acquisition module is configured to acquire a plurality of target object images of a target object acquired under multiple visual angles and illumination information of each pixel point in the target object images;
a feature generation module configured to perform generating an object feature of the target object, a color feature of the target object, and a plurality of object difference attribute features corresponding to the target object at the multiple viewing angles;
a first object shape representation module, configured to perform object shape representation by inputting the pixel coordinates of each pixel, the object features, and the object difference attribute features into a shape representation network, so as to obtain object shape representation information, where the object shape representation information represents distance information between each pixel and a geometric shape boundary of the target object;
a first image rendering processing module configured to input the pixel coordinates of each pixel point, the object shape representation information, the color features, and the illumination information into an image rendering network for image rendering processing, so as to obtain a plurality of rendered images under the multiple viewing angles;
and the object reconstruction updating module is configured to perform object reconstruction updating on the object shape representation information based on the plurality of rendering images and the plurality of target object images to obtain three-dimensional reconstruction information of the target object.
Optionally, the object reconstruction updating module includes:
a first target loss determination unit configured to perform determining a first target loss from the plurality of rendered images and the plurality of target object images;
a feature updating unit configured to perform updating the object feature, the plurality of object difference attribute features, and the color feature in a case where the first target loss does not satisfy a first preset condition;
a first iteration unit, configured to execute, based on the updated object feature, the updated plurality of object difference attribute features, and the updated color feature, repeating the step of inputting the pixel coordinates of each pixel point, the object feature, and the plurality of object difference attribute features into a shape representation network to perform object shape representation, so as to obtain object shape representation information, and determining a first target loss according to the plurality of rendered images and the plurality of target object images until the first target loss satisfies the first preset condition;
a three-dimensional reconstruction information determination unit configured to perform, as the three-dimensional reconstruction information, object shape representation information output by a current shape representation network in a case where the first target loss satisfies the first preset condition.
Optionally, the apparatus further comprises:
a pixel coordinate determination module configured to perform determining a pixel coordinate of each pixel point in the plurality of target object images;
the encoding processing module is configured to perform encoding processing on the pixel coordinates to obtain pixel coordinate characteristics;
the first object shape representation module is further configured to perform object shape representation by inputting the pixel coordinate features, the object features and the plurality of object difference attribute features into the shape representation network, so as to obtain object shape representation information;
the first image rendering processing module is further configured to input the pixel coordinate features, the object shape representation information, the color features, and the illumination information into the image rendering network for image rendering processing, so as to obtain the plurality of rendered images.
Optionally, the feature generation module includes:
an object feature generation unit configured to perform random generation of the object feature based on first preset dimension information;
a color feature generation unit configured to perform random generation of the color feature based on second preset dimension information;
an object difference attribute feature generation unit configured to perform random generation of the plurality of object difference attribute features based on third preset dimension information.
Optionally, the apparatus further comprises:
a first sample information acquiring module configured to perform acquiring a plurality of sample object images of a sample object acquired under the multi-view and sample illumination information of each sample pixel point in the plurality of sample object images;
a sample feature generation module configured to perform generating a sample object feature of the sample object, a sample color feature of the sample object, and a plurality of sample object difference attribute features to which the sample object corresponds at the multiple viewing angles;
a second object shape representation module, configured to perform object shape representation by inputting the pixel coordinates of each sample pixel point, the sample object features, and the sample object difference attribute features into a first preset neural network, so as to obtain sample object shape representation information, where the sample object shape representation information is distance information from each sample pixel point to a geometric shape boundary of the sample object;
a second image rendering processing module configured to input the pixel coordinates of each sample pixel point, the sample object shape representation information, the sample color features, and the sample illumination information into a second preset neural network for image rendering processing, so as to obtain a plurality of sample rendered images under the multiple viewing angles;
a first network training module configured to perform training the first preset neural network and the second preset neural network based on the plurality of sample rendering images and the plurality of sample object images, resulting in the shape characterization network and the image rendering network.
Optionally, the first network training module includes:
a second target loss determination unit configured to perform determining a second target loss from the plurality of sample rendered images and the plurality of sample object images;
a training data updating unit configured to perform updating of the sample object feature, the plurality of sample object difference attribute features, the sample color feature, a first network parameter of the first preset neural network, and a second network parameter of the second preset neural network, in a case where the second target loss does not satisfy a second preset condition;
a second iteration unit configured to execute, based on the updated sample object feature, the updated plurality of sample object difference attribute features, the updated sample color feature, the updated first preset neural network, and the updated second preset neural network, repeating the step of inputting the pixel coordinate of each sample pixel point, the sample object feature, and the plurality of sample object difference attribute features into the first preset neural network for object shape characterization to obtain sample object shape characterization information, and determining a second target loss according to the plurality of sample rendering images and the plurality of sample object images until the second target loss satisfies the second preset condition;
a network determining unit configured to perform, in a case where the second target loss satisfies the second preset condition, regarding a current first preset neural network as the shape characterizing network, and regarding a current second preset neural network as the image rendering network.
Optionally, the apparatus further comprises:
the second sample information acquisition module is configured to acquire a three-dimensional coordinate corresponding to each point on a three-dimensional model of a preset object, labeled object shape representation information of the preset object and a labeled normal vector corresponding to each point;
a third object shape representation module configured to input the three-dimensional coordinates into an initial neural network for object shape representation, so as to obtain a predicted normal vector and predicted object shape representation information corresponding to the preset object;
a third target loss determination module configured to determine a third target loss according to the predicted normal vector, the predicted object shape representation information, the labeled object shape representation information, and the labeled normal vector;
a second network training module configured to perform training the initial neural network based on the third target loss, resulting in the first preset neural network.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement the method of any of the first aspects above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform the method of any one of the first aspects of the embodiments of the present disclosure.
According to a fifth aspect of the embodiments of the present disclosure, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the method of any one of the first aspects of the embodiments of the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
the object shape representation learning is carried out by combining the pixel coordinates and the object characteristics of the pixel points in the multiple target object images under multiple visual angles and the multiple object difference attribute characteristics of the multiple visual angles, the learned object shape representation information is rendered by combining an image rendering network, the object shape representation information can be reconstructed and updated by combining the rendered image and the target object images, the accuracy of the shape representation of the target object by the shape representation network is improved, the object reconstruction is carried out by combining the object shape representation representing the geometric shape boundary distance information between each pixel point and the target object, the continuity of the three-dimensional reconstruction information obtained by reconstruction can be improved while the storage overhead of the object representation is effectively reduced, the more comprehensive and accurate object shape representation is realized, the resolution of the object reconstruction information is improved, and the application requirements of the multiple resolutions can be met, and the vividness and the resolution of the virtual image generated based on the three-dimensional reconstruction information are improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a schematic diagram illustrating an application environment in accordance with an illustrative embodiment;
FIG. 2 is a flow chart illustrating a method of object reconstruction in accordance with an exemplary embodiment;
FIG. 3 is a flow diagram illustrating training of a shape representation network and an image rendering network in accordance with an exemplary embodiment;
FIG. 4 is a flowchart illustrating training a first predetermined neural network and a second predetermined neural network based on a plurality of sample rendered images and a plurality of sample object images to obtain a shape characterization network and an image rendering network according to an exemplary embodiment;
FIG. 5 is a flow diagram illustrating a generation of a first neural network, according to an exemplary embodiment;
FIG. 6 is a flowchart illustrating object reconstruction updating of object shape characterization information based on a plurality of rendered images and a plurality of target object images to obtain three-dimensional reconstruction information of a target object according to an exemplary embodiment;
FIG. 7 is a flow chart illustrating another method of object reconstruction in accordance with an exemplary embodiment;
FIG. 8 is a block diagram of an object reconstruction apparatus according to an exemplary embodiment;
FIG. 9 is a block diagram illustrating an electronic device for object reconstruction in accordance with an exemplary embodiment;
FIG. 10 is a block diagram illustrating an electronic device for object reconstruction in accordance with an exemplary embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) referred to in the present disclosure are information and data authorized by the user or sufficiently authorized by each party.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating an application environment according to an exemplary embodiment, which may include a server 100 and a terminal 200, as shown in fig. 1.
In an alternative embodiment, the server 100 may be used to train a shape representation network and an image rendering network. Specifically, the server 100 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a CDN (Content Delivery Network), and big data and artificial intelligence platforms.
In an optional embodiment, the terminal 200 may be configured to perform shape reconstruction of the target object based on the shape representation network and the image rendering network trained by the server 100, so as to obtain three-dimensional reconstruction information of the target object. Specifically, the terminal 200 may include, but is not limited to, a smart phone, a desktop computer, a tablet computer, a notebook computer, a smart speaker, a digital assistant, an Augmented Reality (AR)/Virtual Reality (VR) device, a smart wearable device, and other types of electronic devices, and may also be software running on such electronic devices, such as an application program. Optionally, the operating system running on the electronic device may include, but is not limited to, Android, iOS, Linux, Windows, and the like.
In addition, it should be noted that fig. 1 shows only one application environment provided by the present disclosure, and in practical applications, other application environments may also be included, for example, shape reconstruction of a target object based on a trained shape characterization network and an image rendering network may also be implemented on a server side.
In the embodiment of the present specification, the server 100 and the terminal 200 may be directly or indirectly connected through wired or wireless communication, and the disclosure is not limited herein.
Fig. 2 is a flowchart illustrating an object reconstruction method according to an exemplary embodiment, and the object reconstruction method is used in an electronic device such as a terminal or a server, as shown in fig. 2, and includes the following steps.
In step S201, a plurality of target object images of a target object collected under multiple viewing angles and illumination information of each pixel point in the plurality of target object images are acquired.
In an optional embodiment, the target object may be an object that needs to be three-dimensionally reconstructed, and may be specifically set in combination with requirements of an actual application scenario. Optionally, in an application scenario of emoticons, the target object may include a human face. Optionally, in an application scenario of character animation, the target object may include elements required in a character animation process, such as hands and feet.
In an optional embodiment, the target object images may be object images at multiple viewing angles (i.e., multiple viewing angles) obtained by rendering in combination with a three-dimensional model of the target object, or may be object images captured by the camera device from multiple viewing angles. Specifically, the selection of the plurality of viewing angles may be set in conjunction with actual applications, such as the front, back, and both sides of the target object.
In a specific embodiment, the illumination information may include an incident vector and a normal vector corresponding to each pixel point. Specifically, the incident vector corresponding to each pixel point may be direction information of incident light received by a three-dimensional point (a point on the target object or a point on the three-dimensional model) corresponding to the pixel point when acquiring the target object image at the corresponding viewing angle; the normal vector corresponding to each pixel point can be a normal vector of the surface where the three-dimensional point corresponding to the pixel point is located when the target object image under the corresponding view angle is collected.
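For illustration only, the per-pixel illumination information described above might be held in a structure like the following Python (PyTorch) sketch; the container layout and the resolution are assumptions, not part of the disclosure:

    import torch

    num_pixels = 256 * 256  # assumed image resolution
    illumination = {
        "incident": torch.zeros(num_pixels, 3),  # direction of incident light at each pixel's 3D point
        "normal": torch.zeros(num_pixels, 3),    # surface normal at each pixel's 3D point
    }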
In step S203, an object feature of the target object, a color feature of the target object, and a plurality of object difference attribute features corresponding to the target object in multiple viewing angles are generated.
In a specific embodiment, the object feature may be feature information common to the target object in multiple viewing angles, and the color feature of the target object may be feature information representing a color of the target object; the plurality of target object images include the same target object, and correspondingly, the plurality of target object images correspond to the same object feature. Specifically, the multiple object difference attribute features may represent target attributes of the target object under multiple viewing angles, and specifically, the target attributes may be attributes that may have differences in attribute information of the target object due to different viewing angles. In a specific embodiment, taking a target object as a face as an example, the target attribute may be an expression; taking the target object as a hand as an example, the target attribute may be a gesture.
In a specific embodiment, the dimension information of the object feature, the color feature, and the multiple object difference attribute features may be predetermined in combination with an actual application, and optionally, the generating the object feature of the target object, the color feature of the target object, and the multiple object difference attribute features corresponding to the target object under multiple viewing angles may include:
randomly generating object features based on the first preset dimension information;
randomly generating color features based on second preset dimension information;
and randomly generating a plurality of object difference attribute characteristics based on the third preset dimension information.
In a specific embodiment, the first preset dimension information may be dimension information of a preset object feature; the second preset dimension information may be dimension information of preset color features; the third preset dimension information may be dimension information in which a plurality of object difference attribute features are preset.
In the above embodiment, the preset dimension information is combined, the object features and the color features of the target object and the multiple object difference attribute features under multiple viewing angles are randomly generated, so that more comprehensive object shape information can be conveniently learned subsequently, and the accuracy of object reconstruction is improved.
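As a rough sketch of this step, the three kinds of features might be created as randomly initialized, optimizable tensors; all dimension values below are illustrative assumptions rather than values given by the disclosure:

    import torch

    num_views = 8                                # number of viewing angles (assumed)
    obj_dim, color_dim, diff_dim = 128, 128, 64  # the three kinds of preset dimension information (assumed)

    object_feature = torch.randn(obj_dim, requires_grad=True)             # shared across all views
    color_feature = torch.randn(color_dim, requires_grad=True)            # color of the target object
    diff_features = torch.randn(num_views, diff_dim, requires_grad=True)  # one per viewing angle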
In step S205, the pixel coordinates, the object characteristics, and the plurality of object difference attribute characteristics of each pixel point are input into a shape representation network to perform object shape representation, so as to obtain object shape representation information.
In a specific embodiment, the object shape characterizing information may characterize distance information from each pixel point to a geometric shape boundary of the target object. In a specific embodiment, the distance information may include a distance value and a sign, wherein the sign includes a positive sign and a negative sign; the positive sign represents that the corresponding pixel point is positioned inside the geometric shape boundary of the target object; the negative sign represents that the corresponding pixel point is outside the geometric shape boundary of the target object.
In a specific embodiment, the object shape characterizing information may be an SDF (Signed Distance Function) value. Specifically, the SDF value may be

s = f([p̂, z1, z2])

where f(·) represents the signed distance function, [·] represents the operation of concatenating p̂, z1 and z2, p̂ represents the pixel coordinates of the pixel points in the plurality of target object images, z1 represents the object feature, and z2 represents the plurality of object difference attribute features.
In a particular embodiment, the shape representation network may be pre-trained and may be used to learn shape representations of different objects. Optionally, to improve the accuracy with which the shape representation network represents different object shapes, the image rendering network may be trained together with it, using the object shape representation information produced by the shape representation network. Specifically, the image rendering network may be configured to render the object shape representation information corresponding to the image, and the rendered image it outputs then enables self-supervised training of the shape representation network.
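As a minimal sketch of a shape representation network consistent with the SDF formulation above (an assumption, not the patented architecture), a fully connected network can map the concatenation [p̂, z1, z2] to a single signed distance value; all layer sizes are illustrative:

    import torch
    import torch.nn as nn

    class ShapeRepresentationNet(nn.Module):
        """Maps [pixel coordinates, object feature z1, difference feature z2] to an SDF value."""

        def __init__(self, coord_dim=2, obj_dim=128, diff_dim=64, hidden=256):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(coord_dim + obj_dim + diff_dim, hidden), nn.Softplus(),
                nn.Linear(hidden, hidden), nn.Softplus(),
                nn.Linear(hidden, 1),  # signed distance to the geometric shape boundary
            )

        def forward(self, coords, z1, z2):
            n = coords.shape[0]
            # s = f([p, z1, z2]): concatenate the coordinates with the latent codes
            x = torch.cat([coords, z1.expand(n, -1), z2.expand(n, -1)], dim=-1)
            return self.mlp(x)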
Correspondingly, the method may further include: the step of training the shape representation network and the image rendering network, specifically, as shown in fig. 3, may include the following steps:
in step S301, a plurality of sample object images of a sample object acquired under multiple viewing angles and sample illumination information of each sample pixel point in the plurality of sample object images are acquired;
in step S303, a sample object feature of the sample object, a sample color feature of the sample object, and a plurality of sample object difference attribute features corresponding to the sample object under multiple viewing angles are generated;
in step S305, inputting the pixel coordinates of each sample pixel, the sample object characteristics, and the multiple sample object difference attribute characteristics into a first preset neural network for object shape characterization, to obtain sample object shape characterization information;
in step S307, the pixel coordinates of each sample pixel point, the sample object shape characterization information, the sample color characteristics, and the sample illumination information are input to a second preset neural network to perform image rendering processing, so as to obtain a plurality of sample rendered images under multiple viewing angles;
in step S309, a first preset neural network and a second preset neural network are trained based on the plurality of sample rendering images and the plurality of sample object images, so as to obtain a shape representation network and an image rendering network.
In a specific embodiment, the sample object may be an object of the same type as the target object, and specifically, specific details of the step S301 and the step S303 may refer to specific details of the step S201 and the step S203, which are not described herein again.
In a specific embodiment, the first predetermined neural network may be a shape characterization network to be trained, and the second predetermined neural network may be an image rendering network to be trained. Specifically, the network structures of the first preset neural network and the second preset neural network can be set in combination with practical application; optionally, the first preset neural network may be a network including a plurality of fully-connected layers, or may also be a network including convolutional layers, or the like; optionally, the second predetermined neural network may be a network including a plurality of fully-connected layers, or may be a network including convolutional layers, or the like. The trained shape representation network has the same network structure as the first preset neural network, but has different network parameters; the network structure of the trained image rendering network is the same as that of the second preset neural network, but the network parameters are different.
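Consistent with the fully connected structure mentioned above, the second (rendering) network might look like the following sketch; the architecture and dimensions are assumptions for illustration, with the illumination input taken as a 3-D incident vector plus a 3-D normal vector per pixel:

    import torch
    import torch.nn as nn

    class ImageRenderingNet(nn.Module):
        """Maps [pixel coordinates, SDF value, color feature, illumination] to per-pixel RGB."""

        def __init__(self, coord_dim=2, color_dim=128, illum_dim=6, hidden=256):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(coord_dim + 1 + color_dim + illum_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 3), nn.Sigmoid(),  # RGB in [0, 1]
            )

        def forward(self, coords, sdf, color_feature, illum):
            n = coords.shape[0]
            x = torch.cat([coords, sdf, color_feature.expand(n, -1), illum], dim=-1)
            return self.mlp(x)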
In a specific embodiment, the sample object shape characterizing information may be distance information between each sample pixel point and a geometric shape boundary of the sample object.
In an alternative embodiment, as shown in fig. 4, the training the first preset neural network and the second preset neural network based on the plurality of sample rendering images and the plurality of sample object images to obtain the shape characterization network and the image rendering network may include the following steps:
in step S401, determining a second target loss from the plurality of sample rendered images and the plurality of sample object images;
in step S403, in a case that the second target loss does not satisfy the second preset condition, updating the sample object feature, the plurality of sample object difference attribute features, the sample color feature, the first network parameter of the first preset neural network, and the second network parameter of the second preset neural network;
in step S405, based on the updated sample object feature, the updated plurality of sample object difference attribute features, the updated sample color feature, the updated first preset neural network, and the updated second preset neural network, repeatedly inputting the pixel coordinate of each sample pixel point, the sample object feature, and the plurality of sample object difference attribute features into the first preset neural network for object shape characterization to obtain sample object shape characterization information, and determining a second target loss according to the plurality of sample rendered images and the plurality of sample object images until the second target loss satisfies a second preset condition;
in step S407, in a case where the second target loss satisfies the second preset condition, the current first preset neural network is taken as the shape characterizing network, and the current second preset neural network is taken as the image rendering network.
In a specific embodiment, determining the second target loss according to the plurality of sample rendering images and the plurality of sample object images may include determining loss information between each sample object image and the corresponding sample rendering image based on a first preset loss function, and then adding the loss information of each sample object image pair to obtain the second target loss. In particular, the second target loss may characterize a difference between the plurality of sample rendered images and the respective corresponding sample object images.
In a particular embodiment, the first preset loss function may include, but is not limited to, an L1 norm loss function, a cross-entropy loss function, a mean square error loss function, a logistic loss function, an exponential loss function, and the like.
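Assuming the L1 norm loss as the first preset loss function, the second target loss described above reduces to a sum of per-view losses, as in this sketch:

    import torch.nn.functional as F

    def second_target_loss(sample_rendered_images, sample_object_images):
        # Sum the loss between each sample object image and its corresponding rendering.
        return sum(F.l1_loss(r, s)
                   for r, s in zip(sample_rendered_images, sample_object_images))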
In a specific embodiment, the sample object characteristic, the plurality of sample object difference attribute characteristics, the sample color characteristic, the first network parameter of the first preset neural network and the second network parameter of the second preset neural network may be updated in combination with a gradient descent method.
In a specific embodiment, the second target loss meeting the second preset condition may mean that the second target loss is less than or equal to a specified threshold, or that the difference between the second target losses of two consecutive training iterations is less than a certain threshold. In the embodiments of the present specification, both thresholds may be set in combination with actual training requirements.
In the above embodiment, during network training, object shape representation learning is performed by combining the pixel coordinates of the pixel points in the multiple sample object images under the multiple viewing angles, the sample object features, and the per-view sample object difference attribute features. The learned sample object shape representation information is rendered by the image rendering network, and the sample rendered images together with the sample object images allow the sample object shape representation information to be reconstructed and updated, improving the accuracy with which the shape representation network represents the shape of the sample object. During training, the second target loss determined from the plurality of sample rendered images and the plurality of sample object images is used to update not only the first and second network parameters but also the sample object feature, the plurality of sample object difference attribute features, and the sample color feature. This optimizes the shape representation capability of the first preset neural network while improving how accurately the sample object feature and sample color feature characterize the sample object itself and how accurately the difference attribute features characterize the sample object's attributes under the multiple viewing angles, which in turn improves the object shape representation accuracy of the trained shape representation network.
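The joint optimization described above can be condensed into a sketch like the following, where both networks are assumed to follow the interfaces of the earlier sketches; the optimizer, learning rate, and stopping threshold are illustrative assumptions:

    import torch
    import torch.nn.functional as F

    def train_networks(shape_net, render_net, coords, illum, sample_images,
                       object_feature, color_feature, diff_features,
                       max_steps=10000, loss_threshold=1e-3):
        # Both network parameter sets and the per-sample latent features are optimized.
        params = (list(shape_net.parameters()) + list(render_net.parameters())
                  + [object_feature, color_feature, diff_features])
        opt = torch.optim.Adam(params, lr=1e-4)
        for _ in range(max_steps):
            loss = 0.0
            for v, image in enumerate(sample_images):  # one loss term per viewing angle
                sdf = shape_net(coords, object_feature, diff_features[v])
                rendered = render_net(coords, sdf, color_feature, illum[v])
                loss = loss + F.l1_loss(rendered, image)
            if loss.item() <= loss_threshold:          # the "second preset condition"
                break
            opt.zero_grad()
            loss.backward()
            opt.step()
        return shape_net, render_net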
In an optional embodiment, in order to reduce the difficulty of learning the object shape representation from the sample object image of the sample object, a shape representation network to be trained may be initialized in advance, and accordingly, the method further includes: the first preset neural network generating step, specifically, as shown in fig. 5, may include:
in step S501, a three-dimensional coordinate corresponding to each point on the three-dimensional model of the preset object, labeled object shape representation information of the preset object, and a labeled normal vector corresponding to each point are obtained.
In step S503, the three-dimensional coordinates are input into the initial neural network for object shape representation, so as to obtain a predicted normal vector and predicted object shape representation information corresponding to the preset object.
In step S505, a third target loss is determined based on the predicted normal vector, the predicted object shape representation information, the labeled object shape representation information, and the labeled normal vector.
In step S507, the initial neural network is trained based on the third target loss to obtain a first preset neural network.
In a specific embodiment, the three-dimensional model may be generated by associated hardware (e.g., a camera device) on the terminal; specifically, the camera device often generates the three-dimensional model of the preset object under a preset lighting environment. Optionally, the three-dimensional model of the preset object may be a three-dimensional average object model. Taking a face as an example, a plurality of three-dimensional face models may be acquired and averaged to obtain a three-dimensional average face model.
In a specific embodiment, the shape representation information of the labeled object and the labeled normal vector corresponding to each point on the three-dimensional model may be determined in advance by combining the three-dimensional model.
In an optional embodiment, the initial neural network may be a shape characterization network to be trained, and correspondingly, the first preset neural network may be a neural network after the initial neural network is initialized.
In an optional embodiment, the determining the third target loss according to the predicted normal vector, the predicted object shape representation information, the labeled object shape representation information, and the labeled normal vector may include: determining first loss information between the predicted normal vector and the labeled normal vector based on a second preset loss function, and determining second loss information between the predicted object shape representation information and the labeled object shape representation information based on a third preset loss function; then, adding the first loss information and the second loss information, or adding them with weights, to obtain the third target loss. Specifically, the third target loss may represent the difference between the predicted and labeled normal vectors and the difference between the predicted and labeled object shape representation information.
In a particular embodiment, the second and third preset loss functions may include, but are not limited to, an L1 norm loss function, a cross-entropy loss function, a mean square error loss function, a logistic loss function, an exponential loss function, and the like. The first, second, and third preset loss functions may be the same loss function or different loss functions.
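Under the assumption that L1 losses are used for both terms, the third target loss might be computed as in this sketch; the weight value is illustrative:

    import torch.nn.functional as F

    def third_target_loss(pred_normal, pred_sdf, gt_normal, gt_sdf, w_normal=1.0):
        first_loss = F.l1_loss(pred_normal, gt_normal)  # predicted vs. labeled normal vectors
        second_loss = F.l1_loss(pred_sdf, gt_sdf)       # predicted vs. labeled shape representation
        return w_normal * first_loss + second_loss      # weighted addition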
In a specific embodiment, training the initial neural network based on the third target loss to obtain the first preset neural network may include: updating the network parameters of the initial neural network based on a gradient descent method when the third target loss does not satisfy a third preset condition; repeating the above steps S503 and S505 based on the updated initial neural network until the third target loss satisfies the third preset condition; and taking the current initial neural network as the first preset neural network when the third target loss satisfies the third preset condition.
In a specific embodiment, the third target loss meeting the third preset condition may mean that the third target loss is less than or equal to a specified threshold, or that the difference between the third target losses of two consecutive training iterations is less than a certain threshold. In the embodiments of the present specification, both thresholds may be set in combination with actual training requirements.
In the above embodiment, the shape representation network to be trained is initialized by combining the three-dimensional model of the preset object, so that the accuracy of the first preset neural network in object shape representation can be improved, the difficulty in learning object shape representation from the sample object image of the sample object is further reduced, and the training speed of the shape representation network is improved.
In step S207, the pixel coordinates, the object shape representation information, the color features, and the illumination information of each pixel point are input to an image rendering network for image rendering processing, so as to obtain a plurality of rendered images under multiple viewing angles.
In a specific embodiment, after the object shape representation information of the target object has been learned by the shape representation network, the object shape representation information corresponding to the target object images is rendered by the image rendering network; the rendered images output by the image rendering network then enable self-supervised training of the shape representation network, improving the accuracy of its shape representation for different objects.
In step S209, the object shape representation information is subjected to object reconstruction update based on the plurality of rendering images and the plurality of target object images, and three-dimensional reconstruction information of the target object is obtained.
In an alternative embodiment, as shown in fig. 6, the performing object reconstruction updating on the object shape representation information based on the plurality of rendering images and the plurality of target object images to obtain three-dimensional reconstruction information of the target object may include the following steps:
in step S601, a first target loss is determined from the plurality of rendering images and the plurality of target object images;
in step S603, in a case where the first target loss does not satisfy the first preset condition, updating the object feature, the plurality of object difference attribute features, and the color feature;
in step S605, based on the updated object feature, the updated plurality of object difference attribute features, and the updated color feature, repeatedly inputting the pixel coordinate, the object feature, and the plurality of object difference attribute features of each pixel point into a shape representation network to perform object shape representation, so as to obtain object shape representation information, and determining a first target loss according to the plurality of rendered images and the plurality of target object images until the first target loss satisfies a first preset condition;
in step S607, when the first target loss satisfies the first preset condition, the object shape representing information output by the current shape representing network is used as the three-dimensional reconstruction information.
In a specific embodiment, for the details of determining the first target loss, reference may be made to the related description of determining the second target loss, which is not repeated here.
In a specific embodiment, the first target loss satisfying the first preset condition may be that the first target loss is less than or equal to a specified threshold, or that the difference between the first target losses of two successive iterations is less than a certain threshold. In the embodiments of the present specification, both thresholds may be set according to actual training requirements.
In the above embodiment, during the object reconstruction update, the object feature, the plurality of object difference attribute features, and the color feature are updated according to the first target loss determined from the plurality of rendered images and the plurality of target object images. This improves the accuracy of the object feature and the color feature in characterizing the target object itself, and of the plurality of object difference attribute features in characterizing the attributes of the target object at the multiple viewing angles, which in turn further improves the accuracy of the shape representation network in representing object shapes.
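By way of illustration and not limitation, the following sketch summarizes the reconstruction update of steps S601 to S607 under assumed interfaces: both networks stay fixed while the object feature, the plurality of object difference attribute features, and the color feature are optimized against the first target loss, assumed here to be a mean squared error between rendered and target pixel values.

```python
import torch
import torch.nn.functional as F

def reconstruct(shape_net, render_net, coords, light, target_pixels,
                obj_feat, view_feats, color_feat, steps=2000, lr=1e-3):
    """Hypothetical reconstruction update loop (steps S601 to S607)."""
    params = [obj_feat, view_feats, color_feat]
    for p in params:
        p.requires_grad_(True)
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(steps):
        shape_info = shape_net(coords, obj_feat, view_feats)   # object shape representation
        rendered = render_net(coords, shape_info, color_feat, light)
        loss = F.mse_loss(rendered, target_pixels)             # first target loss (assumed MSE)
        if loss.item() <= 1e-4:                                # first preset condition (assumed)
            break
        opt.zero_grad()
        loss.backward()
        opt.step()
    # The representation from the last forward pass serves as the
    # three-dimensional reconstruction information in this sketch.
    return shape_info.detach()
```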
In an optional embodiment, a map, a special-effect animation, and the like of the target object may be generated based on the three-dimensional reconstruction information of the target object; the specific choice may be made according to actual application requirements.
As can be seen from the technical solutions provided by the embodiments of the present specification, object shape representation learning is performed by combining the pixel coordinates and object features of the pixel points in a plurality of target object images under multiple viewing angles, and the learned object shape representation information is rendered through an image rendering network. The object shape representation information can then be reconstructed and updated by comparing the rendered images with the target object images, which improves the accuracy with which the shape representation network represents the shape of the target object. Because object reconstruction uses a shape representation that encodes the distance from each pixel point to the geometric shape boundary of the target object, the storage overhead of the object representation is effectively reduced and the continuity of the reconstructed three-dimensional reconstruction information is improved. A more comprehensive and accurate object shape representation is thereby realized, the resolution of the object reconstruction information is improved, application requirements at multiple resolutions can be met, and the vividness and resolution of a virtual image generated based on the three-dimensional reconstruction information are improved.
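By way of illustration and not limitation, the toy example below shows the distance semantics of such a shape representation for an analytic sphere: the value is negative inside the object, zero on the geometric shape boundary, and positive outside, so the surface is the zero level set. The function stands in for the learned network and is illustrative only.

```python
import torch

def sphere_sdf(points, center, radius):
    # Signed distance to a sphere: negative inside, zero on the boundary,
    # positive outside.
    return torch.linalg.norm(points - center, dim=-1) - radius

pts = torch.tensor([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
print(sphere_sdf(pts, torch.zeros(3), 1.0))  # tensor([-1., 0., 1.])
```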
In an alternative embodiment, as shown in fig. 7, the method may further include:
in step S211, the pixel coordinates of each pixel point in the target object images are determined;
in step S213, the pixel coordinates are encoded to obtain pixel coordinate features;
in an alternative embodiment, a Fourier sine-cosine coordinate encoding algorithm may be used to encode the pixel coordinates to obtain the pixel coordinate features.
Correspondingly, the step of inputting the pixel coordinates of each pixel point, the object feature, and the plurality of object difference attribute features into the shape representation network for object shape representation to obtain the object shape representation information may include:
inputting the pixel coordinate features, the object feature, and the plurality of object difference attribute features into the shape representation network for object shape representation to obtain the object shape representation information;
and the step of inputting the pixel coordinates of each pixel point, the object shape representation information, the color features, and the illumination information into the image rendering network for image rendering processing to obtain a plurality of rendered images under the multiple viewing angles may include:
inputting the pixel coordinate features, the object shape representation information, the color features, and the illumination information into the image rendering network for image rendering processing to obtain the plurality of rendered images.
Optionally, in the training of the shape representation network and the image rendering network, the sample pixel coordinates may likewise be replaced with encoded sample pixel coordinate features.
In this embodiment, the pixel coordinate features obtained by encoding the pixel coordinates are input into the shape representation network for learning, which reduces the difficulty of learning high-frequency information, effectively reduces the overall learning difficulty of the network, and thus speeds up the learning of the shape representation network.
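By way of illustration and not limitation, a minimal sketch of such a Fourier sine-cosine coordinate encoding is given below; the number of frequency bands and the frequency schedule are assumptions, not disclosed values.

```python
import math
import torch

def fourier_encode(coords, num_freqs=10):
    """Sine-cosine positional encoding of pixel coordinates (assumed variant).

    coords: (..., d) tensor of normalized coordinates.
    Returns a (..., 2 * num_freqs * d) tensor of pixel coordinate features.
    """
    freqs = (2.0 ** torch.arange(num_freqs, dtype=torch.float32)) * math.pi
    angles = coords[..., None] * freqs                    # (..., d, num_freqs)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return enc.flatten(start_dim=-2)                      # high-frequency detail becomes easier to fit

features = fourier_encode(torch.rand(1024, 2))            # e.g. 40-dimensional per pixel
```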
Fig. 8 is a block diagram of an object reconstruction apparatus according to an exemplary embodiment. Referring to fig. 8, the apparatus includes:
the information acquiring module 810 is configured to perform acquiring a plurality of target object images of a target object acquired under multiple viewing angles and illumination information of each pixel point in the plurality of target object images;
a feature generation module 820 configured to perform generating an object feature of the target object, a color feature of the target object, and a plurality of object difference attribute features corresponding to the target object under multiple viewing angles;
a first object shape representation module 830 configured to perform object shape representation by inputting the pixel coordinates of each pixel point, the object feature, and the plurality of object difference attribute features into a shape representation network to obtain object shape representation information, where the object shape representation information represents distance information from each pixel point to the geometric shape boundary of the target object;
the first image rendering processing module 840 is configured to input the pixel coordinates, the object shape representation information, the color features and the illumination information of each pixel point into an image rendering network for image rendering processing, so as to obtain a plurality of rendered images under multiple viewing angles;
and an object reconstruction updating module 850 configured to perform object reconstruction updating on the object shape representation information based on the plurality of rendering images and the plurality of target object images, resulting in three-dimensional reconstruction information of the target object.
Optionally, the object reconstruction update module 850 includes:
a first target loss determination unit configured to perform determining a first target loss from the plurality of rendering images and the plurality of target object images;
a feature updating unit configured to perform updating of the object feature, the plurality of object difference attribute features, and the color feature in a case where the first target loss does not satisfy a first preset condition;
a first iteration unit configured to repeatedly perform, based on the updated object feature, the updated plurality of object difference attribute features, and the updated color feature, the step of inputting the pixel coordinates of each pixel point, the object feature, and the plurality of object difference attribute features into the shape representation network for object shape representation to obtain object shape representation information, and the step of determining a first target loss according to the plurality of rendered images and the plurality of target object images, until the first target loss satisfies the first preset condition;
and a three-dimensional reconstruction information determining unit configured to take the object shape representation information output by the current shape representation network as the three-dimensional reconstruction information when the first target loss satisfies the first preset condition.
Optionally, the apparatus further comprises:
a pixel coordinate determination module configured to perform determining a pixel coordinate of each pixel point in a plurality of target object images;
the encoding processing module is configured to perform encoding processing on the pixel coordinates to obtain pixel coordinate characteristics;
the first object shape representation module is also configured to input the pixel coordinate characteristics, the object characteristics and the plurality of object difference attribute characteristics into a shape representation network for object shape representation, so as to obtain object shape representation information;
the first image rendering processing module is further configured to input the pixel coordinate characteristics, the object shape representation information, the color characteristics and the illumination information into an image rendering network for image rendering processing, so as to obtain a plurality of rendered images.
Optionally, the feature generating module 820 includes:
an object feature generation unit configured to perform random generation of object features based on first preset dimension information;
a color feature generation unit configured to perform random generation of color features based on the second preset dimension information;
and the object difference attribute feature generation unit is configured to randomly generate a plurality of object difference attribute features based on the third preset dimension information.
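By way of illustration and not limitation, the feature generation performed by this module could be sketched as follows; every dimensionality shown is an assumption standing in for the first, second, and third preset dimension information.

```python
import torch

num_views = 8                                     # number of viewing angles (assumed)
object_feature = torch.randn(256)                 # first preset dimension information (assumed: 256)
color_feature = torch.randn(64)                   # second preset dimension information (assumed: 64)
view_diff_features = torch.randn(num_views, 32)   # third preset dimension information (assumed: 32),
                                                  # one object difference attribute feature per view
```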
Optionally, the apparatus further comprises:
the system comprises a first sample information acquisition module, a second sample information acquisition module and a third sample information acquisition module, wherein the first sample information acquisition module is configured to acquire a plurality of sample object images of a sample object acquired under multiple visual angles and sample illumination information of each sample pixel point in the plurality of sample object images;
a sample feature generation module configured to perform generating a sample object feature of the sample object, a sample color feature of the sample object, and a plurality of sample object difference attribute features corresponding to the sample object under multiple viewing angles;
the second object shape characterization module is configured to input the pixel coordinates of each sample pixel point, the sample object characteristics and the multiple sample object difference attribute characteristics into a first preset neural network for object shape characterization to obtain sample object shape characterization information, wherein the sample object shape characterization information is distance information from each sample pixel point to a geometric shape boundary of a sample object;
the second image rendering processing module is configured to input the pixel coordinates of each sample pixel point, the sample object shape representation information, the sample color characteristics and the sample illumination information into a second preset neural network for image rendering processing to obtain a plurality of sample rendering images under multiple viewing angles;
and the first network training module is configured to train a first preset neural network and a second preset neural network based on the plurality of sample rendering images and the plurality of sample object images to obtain a shape representation network and an image rendering network.
Optionally, the first network training module includes:
a second target loss determination unit configured to perform determining a second target loss from the plurality of sample rendered images and the plurality of sample object images;
a training data updating unit configured to perform updating of the sample object feature, the plurality of sample object difference attribute features, the sample color feature, the first network parameter of the first preset neural network, and the second network parameter of the second preset neural network, in a case where the second target loss does not satisfy the second preset condition;
a second iteration unit configured to repeatedly perform, based on the updated sample object feature, the updated plurality of sample object difference attribute features, the updated sample color feature, the updated first preset neural network, and the updated second preset neural network, the step of inputting the pixel coordinates of each sample pixel point, the sample object feature, and the plurality of sample object difference attribute features into the first preset neural network for object shape characterization to obtain sample object shape characterization information, and the step of determining a second target loss according to the plurality of sample rendering images and the plurality of sample object images, until the second target loss satisfies the second preset condition;
and the network determining unit is configured to execute taking the current first preset neural network as the shape characterization network and taking the current second preset neural network as the image rendering network under the condition that the second target loss meets a second preset condition.
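By way of illustration and not limitation, one joint training update of this kind could be sketched as follows; unlike the later reconstruction stage, the second target loss here drives both the network parameters and the sample features. All names, and the use of a mean squared error, are assumptions.

```python
import torch
import torch.nn.functional as F

def train_step(shape_net, render_net, sample_feats, optimizer,
               coords, light, target_pixels):
    """Hypothetical joint update of networks and sample features."""
    obj_feat, view_feats, color_feat = sample_feats
    shape_info = shape_net(coords, obj_feat, view_feats)
    rendered = render_net(coords, shape_info, color_feat, light)
    loss = F.mse_loss(rendered, target_pixels)   # second target loss (assumed MSE)
    optimizer.zero_grad()
    loss.backward()                              # gradients flow to networks AND features
    optimizer.step()
    return loss.item()

# The optimizer would cover both parameter groups, e.g.:
# optimizer = torch.optim.Adam([*shape_net.parameters(), *render_net.parameters(),
#                               obj_feat, view_feats, color_feat], lr=1e-4)
```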
Optionally, the apparatus further comprises:
the second sample information acquisition module is configured to acquire a three-dimensional coordinate corresponding to each point on the three-dimensional model of the preset object, the labeled object shape representation information of the preset object and a labeled normal vector corresponding to each point;
the third object shape representation module is configured to input the three-dimensional coordinates into the initial neural network to carry out object shape representation, and a prediction method vector and prediction object shape representation information corresponding to a preset object are obtained;
a third target loss determination module configured to perform a determination of a third target loss from the prediction normal vector, the prediction object shape characterization information, the annotation object shape characterization information, and the annotation normal vector;
and the second network training module is configured to execute training of the initial neural network based on the third target loss to obtain a first preset neural network.
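By way of illustration and not limitation, the third target loss could be sketched as follows, with the predicted normal vector obtained as the gradient of the predicted shape value with respect to the input three-dimensional coordinates; the L1 form of the loss terms is an assumption, not a disclosed choice.

```python
import torch
import torch.nn.functional as F

def third_target_loss(initial_net, points, gt_sdf, gt_normals):
    """Hypothetical third target loss for initializing the shape network."""
    points = points.clone().requires_grad_(True)
    pred_sdf = initial_net(points)               # predicted object shape representation information
    pred_normals = torch.autograd.grad(pred_sdf.sum(), points,
                                       create_graph=True)[0]
    return (F.l1_loss(pred_sdf, gt_sdf) +        # shape supervision
            F.l1_loss(pred_normals, gt_normals)) # normal vector supervision
```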
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 9 is a block diagram illustrating an electronic device for object reconstruction according to an exemplary embodiment. The electronic device may be a terminal, and its internal structure may be as shown in fig. 9. The electronic device comprises a processor, a memory, a network interface, a display screen, and an input device connected through a system bus. The processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program; the internal memory provides an environment for their operation. The network interface of the electronic device is used to connect and communicate with an external terminal through a network. The computer program is executed by the processor to implement an object reconstruction method. The display screen of the electronic device may be a liquid crystal display screen or an electronic ink display screen, and the input device may be a touch layer covering the display screen, a key, a trackball, or a touch pad arranged on the housing of the electronic device, or an external keyboard, touch pad, or mouse.
Fig. 10 is a block diagram illustrating an electronic device for object reconstruction according to an exemplary embodiment. The electronic device may be a server, and its internal structure may be as shown in fig. 10. The electronic device comprises a processor, a memory, and a network interface connected through a system bus. The processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program; the internal memory provides an environment for their operation. The network interface of the electronic device is used to connect and communicate with an external terminal through a network. The computer program is executed by the processor to implement an object reconstruction method.
It will be understood by those skilled in the art that the configurations shown in fig. 9 or fig. 10 are only block diagrams of some configurations relevant to the present disclosure, and do not constitute a limitation on the electronic device to which the present disclosure is applied, and a particular electronic device may include more or less components than those shown in the drawings, or combine some components, or have a different arrangement of components.
In an exemplary embodiment, there is also provided an electronic device including: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement an object reconstruction method as in embodiments of the present disclosure.
In an exemplary embodiment, there is also provided a computer-readable storage medium, in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform an object reconstruction method in embodiments of the present disclosure.
In an exemplary embodiment, a computer program product containing instructions is also provided, which when run on a computer, causes the computer to perform the object reconstruction method in embodiments of the present disclosure.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), and direct Rambus dynamic RAM (DRDRAM).
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method of object reconstruction, comprising:
acquiring a plurality of target object images of a target object collected under multiple viewing angles and illumination information of each pixel point in the plurality of target object images;
generating object features of the target object, color features of the target object, and a plurality of object difference attribute features corresponding to the target object under the multiple viewing angles;
inputting the pixel coordinates of each pixel point, the object features, and the plurality of object difference attribute features into a shape representation network for object shape representation to obtain object shape representation information, wherein the object shape representation information represents distance information from each pixel point to a geometric shape boundary of the target object;
inputting the pixel coordinates of each pixel point, the object shape representation information, the color features, and the illumination information into an image rendering network for image rendering processing to obtain a plurality of rendered images under the multiple viewing angles;
and performing object reconstruction updating on the object shape representation information based on the plurality of rendered images and the plurality of target object images to obtain three-dimensional reconstruction information of the target object.
2. The object reconstruction method according to claim 1, wherein the performing object reconstruction updating on the object shape representation information based on the plurality of rendered images and the plurality of target object images to obtain the three-dimensional reconstruction information of the target object comprises:
determining a first target loss from the plurality of rendered images and the plurality of target object images;
updating the object features, the plurality of object difference attribute features, and the color features if the first target loss does not satisfy a first preset condition;
based on the updated object features, the updated plurality of object difference attribute features, and the updated color features, repeating the step of inputting the pixel coordinates of each pixel point, the object features, and the plurality of object difference attribute features into the shape representation network for object shape representation to obtain object shape representation information, and the step of determining a first target loss according to the plurality of rendered images and the plurality of target object images, until the first target loss satisfies the first preset condition;
and taking the object shape representation information output by the current shape representation network as the three-dimensional reconstruction information under the condition that the first target loss meets the first preset condition.
3. The object reconstruction method according to claim 1, further comprising:
determining the pixel coordinates of each pixel point in the target object images;
encoding the pixel coordinates to obtain pixel coordinate features;
wherein the inputting the pixel coordinates of each pixel point, the object features, and the plurality of object difference attribute features into a shape representation network for object shape representation to obtain object shape representation information comprises:
inputting the pixel coordinate features, the object features, and the plurality of object difference attribute features into the shape representation network for object shape representation to obtain the object shape representation information;
wherein the inputting the pixel coordinates of each pixel point, the object shape representation information, the color features, and the illumination information into an image rendering network for image rendering processing to obtain a plurality of rendered images under the multiple viewing angles comprises:
inputting the pixel coordinate features, the object shape representation information, the color features, and the illumination information into the image rendering network for image rendering processing to obtain the plurality of rendered images.
4. The object reconstruction method according to claim 1, wherein the generating of the object features of the target object, the color features of the target object, and the plurality of object difference attribute features corresponding to the target object under the multiple viewing angles comprises:
randomly generating the object features based on first preset dimension information;
randomly generating the color features based on second preset dimension information;
and randomly generating the plurality of object difference attribute features based on third preset dimension information.
5. The object reconstruction method according to any one of claims 1 to 4, wherein the method further comprises:
acquiring a plurality of sample object images of a sample object collected under the multiple viewing angles and sample illumination information of each sample pixel point in the plurality of sample object images;
generating a sample object feature of the sample object, a sample color feature of the sample object, and a plurality of sample object difference attribute features corresponding to the sample object at the multiple viewing angles;
inputting the pixel coordinates of each sample pixel point, the sample object feature, and the plurality of sample object difference attribute features into a first preset neural network for object shape representation to obtain sample object shape representation information, wherein the sample object shape representation information represents distance information from each sample pixel point to a geometric shape boundary of the sample object;
inputting the pixel coordinates of each sample pixel point, the sample object shape representation information, the sample color feature, and the sample illumination information into a second preset neural network for image rendering processing to obtain a plurality of sample rendered images under the multiple viewing angles;
training the first preset neural network and the second preset neural network based on the plurality of sample rendered images and the plurality of sample object images to obtain the shape representation network and the image rendering network.
6. The object reconstruction method according to claim 5, wherein the training the first preset neural network and the second preset neural network based on the plurality of sample rendered images and the plurality of sample object images to obtain the shape representation network and the image rendering network comprises:
determining a second target loss from the plurality of sample rendered images and the plurality of sample object images;
updating the sample object feature, the plurality of sample object difference attribute features, the sample color feature, the first network parameter of the first preset neural network, and the second network parameter of the second preset neural network when the second target loss does not satisfy a second preset condition;
based on the updated sample object feature, the updated plurality of sample object difference attribute features, the updated sample color feature, the updated first preset neural network, and the updated second preset neural network, repeating the step of inputting the pixel coordinates of each sample pixel point, the sample object feature, and the plurality of sample object difference attribute features into the first preset neural network for object shape representation to obtain sample object shape representation information, and the step of determining a second target loss according to the plurality of sample rendered images and the plurality of sample object images, until the second target loss satisfies the second preset condition;
and taking the current first preset neural network as the shape representation network and the current second preset neural network as the image rendering network under the condition that the second target loss satisfies the second preset condition.
7. An object reconstruction apparatus, comprising:
the information acquisition module is configured to acquire a plurality of target object images of a target object acquired under multiple visual angles and illumination information of each pixel point in the target object images;
a feature generation module configured to perform generating an object feature of the target object, a color feature of the target object, and a plurality of object difference attribute features corresponding to the target object at the multiple viewing angles;
a first object shape representation module, configured to perform object shape representation by inputting the pixel coordinates of each pixel, the object features, and the object difference attribute features into a shape representation network, so as to obtain object shape representation information, where the object shape representation information represents distance information between each pixel and a geometric shape boundary of the target object;
a first image rendering processing module configured to input the pixel coordinates of each pixel point, the object shape representation information, the color features, and the illumination information into an image rendering network for image rendering processing, so as to obtain a plurality of rendered images under the multiple viewing angles;
and the object reconstruction updating module is configured to perform object reconstruction updating on the object shape representation information based on the plurality of rendering images and the plurality of target object images to obtain three-dimensional reconstruction information of the target object.
8. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the object reconstruction method of any one of claims 1 to 6.
9. A computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the object reconstruction method of any one of claims 1 to 6.
10. A computer program product comprising computer instructions, characterized in that the computer instructions, when executed by a processor, implement the object reconstruction method of any one of claims 1 to 6.
CN202111199719.9A 2021-10-14 2021-10-14 Object reconstruction method and device, electronic equipment and storage medium Pending CN113902848A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111199719.9A CN113902848A (en) 2021-10-14 2021-10-14 Object reconstruction method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111199719.9A CN113902848A (en) 2021-10-14 2021-10-14 Object reconstruction method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113902848A true CN113902848A (en) 2022-01-07

Family

ID=79192088

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111199719.9A Pending CN113902848A (en) 2021-10-14 2021-10-14 Object reconstruction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113902848A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115147558A (en) * 2022-08-31 2022-10-04 北京百度网讯科技有限公司 Training method of three-dimensional reconstruction model, three-dimensional reconstruction method and device
CN115147558B (en) * 2022-08-31 2022-12-02 北京百度网讯科技有限公司 Training method of three-dimensional reconstruction model, three-dimensional reconstruction method and device
CN117541703A (en) * 2024-01-09 2024-02-09 腾讯科技(深圳)有限公司 Data rendering method, device, equipment and computer readable storage medium
CN117541703B (en) * 2024-01-09 2024-04-30 腾讯科技(深圳)有限公司 Data rendering method, device, equipment and computer readable storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination