CN116012666B - Image generation, model training and information reconstruction methods and devices and electronic equipment

Info

Publication number
CN116012666B
Authority
CN
China
Prior art keywords
image
scrambling
sample
processed
map
Prior art date
Legal status
Active
Application number
CN202211644067.XA
Other languages
Chinese (zh)
Other versions
CN116012666A (en)
Inventor
李�杰
Current Assignee
Baidu com Times Technology Beijing Co Ltd
Original Assignee
Baidu com Times Technology Beijing Co Ltd
Priority date
Filing date
Publication date
Application filed by Baidu com Times Technology Beijing Co Ltd
Priority to CN202211644067.XA
Publication of CN116012666A
Application granted
Publication of CN116012666B

Landscapes

  • Image Analysis (AREA)

Abstract

The disclosure provides an image generation method, a model training method, an information reconstruction method and device, and electronic equipment, relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, augmented reality, virtual reality, deep learning and the like, and can be applied to scenes such as the metaverse and virtual digital humans. The specific implementation scheme is as follows: acquiring a first regression parameter of an image to be processed, wherein the image to be processed comprises a target sample object, and the first regression parameter comprises geometric information of the target sample object; determining a geometric model corresponding to the target sample object according to the geometric information of the target sample object; scrambling the geometric model to obtain a scrambling map; determining label information of the scrambling map according to scrambling regression parameters of the scrambling map; and determining the sample image according to the label information and the scrambling map.

Description

Image generation, model training and information reconstruction methods and devices and electronic equipment
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, augmented reality, virtual reality, deep learning and the like, can be applied to scenes such as the metaverse and virtual digital humans, and particularly relates to an image generation method, a model training method, an information reconstruction method and device, and electronic equipment.
Background
Virtual digital humans are one of the key elements for creating a metaverse virtual world. According to different business requirements, digital humans can be divided into 2-dimensional, 3-dimensional, cartoon, realistic, hyper-realistic and the like. In a real scenario, a basic avatar adapted to the business needs to be built for a virtual digital human.
Disclosure of Invention
The disclosure provides an image generation method, a model training method, an information reconstruction method and device, and electronic equipment.
According to an aspect of the present disclosure, there is provided a sample image generating method including: acquiring a first regression parameter of an image to be processed, wherein the image to be processed comprises a target sample object, and the first regression parameter comprises geometric information of the target sample object; determining a geometric model corresponding to the target sample object according to the geometric information of the target sample object; scrambling the geometric model to obtain a scrambling map; determining label information of the scrambling map according to scrambling regression parameters of the scrambling map; and determining a sample image according to the tag information and the scrambling map.
According to another aspect of the present disclosure, there is provided a training method of a deep learning model, including: inputting a sample to-be-processed image into a first neural network of a deep learning model to obtain a second regression parameter of the sample to-be-processed image, wherein the sample to-be-processed image comprises an object to be processed, and the second regression parameter comprises geometric information of the object to be processed; inputting the sample to-be-processed image and the second regression parameters into a second neural network of the deep learning model to obtain a sample rendering image corresponding to the sample to-be-processed image; training the deep learning model by using the sample rendering image and the sample to-be-processed image to obtain a training result; and in response to determining that the training result meets a predetermined condition, fine-tuning parameters of the deep learning model by using a sample image and an image to be processed to obtain a trained deep learning model; wherein the sample image is generated by processing the image to be processed according to the sample image generation method of the present disclosure.
According to another aspect of the present disclosure, there is provided an object information reconstruction method including: acquiring a target to-be-processed image comprising a target object; inputting the target to-be-processed image into a deep learning model to obtain object reconstruction information corresponding to the target object, wherein the deep learning model is obtained by training according to the training method of the deep learning model.
According to another aspect of the present disclosure, there is provided a sample image generating apparatus including: the first acquisition module is used for acquiring first regression parameters of an image to be processed, wherein the image to be processed comprises a target sample object, and the first regression parameters comprise geometric information of the target sample object; the first determining module is used for determining a geometric model corresponding to the target sample object according to the geometric information of the target sample object; the scrambling module is used for scrambling the geometric model to obtain a scrambling map; the second determining module is used for determining label information of the scrambling map according to scrambling regression parameters of the scrambling map; and a third determining module, configured to determine a sample image according to the tag information and the scrambling map.
According to another aspect of the present disclosure, there is provided a training apparatus of a deep learning model, including: the first acquisition module is used for inputting a sample to-be-processed image into a first neural network of the deep learning model to obtain a second regression parameter of the sample to-be-processed image, wherein the sample to-be-processed image comprises an object to be processed, and the second regression parameter comprises geometric information of the object to be processed; the second obtaining module is used for inputting the sample to-be-processed image and the second regression parameters into a second neural network of the deep learning model to obtain a sample rendering image corresponding to the sample to-be-processed image; the training module is used for training the deep learning model by utilizing the sample rendering image and the sample to-be-processed image to obtain a training result; and a fine tuning module for fine tuning parameters of the deep learning model by using a sample image and an image to be processed in response to determining that the training result meets a predetermined condition, to obtain a trained deep learning model; wherein the sample image is generated by processing the image to be processed by the sample image generating device according to the disclosure.
According to another aspect of the present disclosure, there is provided an object information reconstruction apparatus including: the second acquisition module is used for acquiring a target to-be-processed image comprising a target object; and the third obtaining module is used for inputting the target to-be-processed image into a deep learning model to obtain object reconstruction information corresponding to the target object, wherein the deep learning model is obtained by training the object information reconstruction device according to the disclosure.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform at least one of the sample image generation method, the training method of the deep learning model, and the object information reconstruction method of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform at least one of the sample image generation method, the training method of the deep learning model, and the object information reconstruction method of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program stored on at least one of a readable storage medium and an electronic device, wherein the computer program, when executed by a processor, implements at least one of the sample image generation method, the training method of a deep learning model, and the object information reconstruction method of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates an exemplary system architecture to which at least one of a sample image generation method, a training method of a deep learning model, an object information reconstruction method, and corresponding apparatuses may be applied, according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a sample image generation method according to an embodiment of the disclosure;
FIG. 3 schematically illustrates an overall flowchart of generating a sample image according to an embodiment of the disclosure;
FIG. 4 schematically illustrates a flow chart of a training method of a deep learning model according to an embodiment of the present disclosure;
FIG. 5A schematically illustrates a block diagram of a deep learning model in accordance with an embodiment of the present disclosure;
FIG. 5B schematically illustrates a schematic diagram of training a deep learning model in accordance with an embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow chart of an object information reconstruction method according to an embodiment of the present disclosure;
Fig. 7 schematically illustrates a block diagram of a sample image generating device according to an embodiment of the disclosure;
FIG. 8 schematically illustrates a block diagram of a training apparatus of a deep learning model according to an embodiment of the present disclosure;
fig. 9 schematically illustrates a block diagram of an object information reconstruction apparatus according to an embodiment of the present disclosure; and
FIG. 10 illustrates a schematic block diagram of an example electronic device that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and application of the personal information of the user all comply with the provisions of relevant laws and regulations, necessary security measures are taken, and public order and good morals are not violated.
In the technical scheme of the disclosure, the authorization or consent of the user is obtained before the personal information of the user is obtained or acquired.
When designing a high-quality avatar, a professional animator is required to perform professional optimization design on geometric modeling, texture mapping, illumination mapping and the like of the avatar to construct a basic avatar adapted to business requirements. For example, due to the realism demands on hyper-realistic digital humans, fine-grained modeling of digital human materials, light models, 3D models and the like is required. When designing the hyper-realistic rendering map of an avatar, professional designers must be relied on, and iterative optimization design is carried out according to business requirements.
In the process of realizing the conception of the disclosure, the inventor found that the accuracy of the object information obtained by such a design approach is not high.
Fig. 1 schematically illustrates an exemplary system architecture to which at least one of a sample image generation method, a training method of a deep learning model, an object information reconstruction method, and corresponding apparatuses may be applied according to an embodiment of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios. For example, in another embodiment, an exemplary system architecture to which at least one of a sample image generating method, a training method of a deep learning model, and an object information reconstructing method and a corresponding apparatus may be applied may include a terminal device, but the terminal device may implement at least one of the sample image generating method, the training method of a deep learning model, and the object information reconstructing method and the corresponding apparatus provided in the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, a system architecture 100 according to this embodiment may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is a medium used to provide a communication link between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links, and the like.
The user may interact with the server 105 via the network 104 using the first terminal device 101, the second terminal device 102 and the third terminal device 103, to receive or send messages and the like. Various communication client applications may be installed on the first terminal device 101, the second terminal device 102 and the third terminal device 103, such as knowledge-reading applications, web browser applications, search applications, instant messaging tools, mailbox clients and/or social platform software (by way of example only).
The first terminal device 101, the second terminal device 102, the third terminal device 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (merely an example) providing support for content browsed by the user with the first terminal device 101, the second terminal device 102 and the third terminal device 103. The background management server may analyze and process received data such as a user request, and feed back a processing result (e.g., a web page, information or data obtained or generated according to the user request) to the terminal device. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service scalability in traditional physical hosts and VPS (Virtual Private Server) services. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be noted that, at least one of the sample image generating method, the training method of the deep learning model, and the object information reconstructing method provided in the embodiments of the present disclosure may be generally executed by the first terminal device 101, the second terminal device 102, or the third terminal device 103. Accordingly, at least one of the sample image generating device, the training device of the deep learning model, and the object information reconstructing device provided in the embodiments of the present disclosure may also be provided in the first terminal device 101, the second terminal device 102, or the third terminal device 103.
Alternatively, at least one of the sample image generating method, the training method of the deep learning model, and the object information reconstructing method provided in the embodiments of the present disclosure may be generally executed by the server 105. Accordingly, at least one of the sample image generating device, the training device of the deep learning model, and the object information reconstructing device provided in the embodiments of the present disclosure may be generally provided in the server 105. At least one of the sample image generating method, the training method of the deep learning model, and the object information reconstructing method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105. Accordingly, at least one of the sample image generating device, the training device of the deep learning model, and the object information reconstructing device provided in the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105.
For example, when generating a sample image, the first terminal device 101, the second terminal device 102 and the third terminal device 103 may acquire an image to be processed, and then send the acquired image to be processed to the server 105. The server 105 acquires a first regression parameter of the image to be processed, where the image to be processed includes a target sample object, and the first regression parameter includes geometric information of the target sample object; determines a geometric model corresponding to the target sample object according to the geometric information of the target sample object; scrambles the geometric model to obtain a scrambling map; determines label information of the scrambling map according to scrambling regression parameters of the scrambling map; and determines a sample image according to the label information and the scrambling map. Alternatively, the sample image may be determined by a server or server cluster capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103 and/or the server 105.
For example, when training the deep learning model, the first terminal device 101, the second terminal device 102 and the third terminal device 103 may acquire a sample to-be-processed image, and then send the acquired sample to-be-processed image to the server 105. The server 105 inputs the sample to-be-processed image into the first neural network of the deep learning model to obtain a second regression parameter of the sample to-be-processed image, where the sample to-be-processed image includes an object to be processed, and the second regression parameter includes geometric information of the object to be processed; inputs the sample to-be-processed image and the second regression parameter into the second neural network of the deep learning model to obtain a sample rendered image corresponding to the sample to-be-processed image; trains the deep learning model by using the sample rendered image and the sample to-be-processed image to obtain a training result; and in response to determining that the training result meets the predetermined condition, fine-tunes the parameters of the deep learning model by using the sample image and the image to be processed to obtain a trained deep learning model. The sample image is generated by processing the image to be processed according to the sample image generation method of the present disclosure. Alternatively, the sample to-be-processed image may be analyzed, and the trained deep learning model obtained, by a server or server cluster capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103 and/or the server 105.
For example, when reconstructing object information, the first terminal device 101, the second terminal device 102 and the third terminal device 103 may acquire a target to-be-processed image including a target object, and then send the acquired target to-be-processed image to the server 105. The server 105 inputs the target to-be-processed image into the deep learning model to obtain object reconstruction information corresponding to the target object, where the deep learning model is trained according to the training method of the deep learning model of the present disclosure. Alternatively, the target to-be-processed image may be analyzed, and the object reconstruction information corresponding to the target object obtained, by a server or server cluster capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103 and/or the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically illustrates a flowchart of a sample image generation method according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S210 to S250.
In operation S210, a first regression parameter of an image to be processed is acquired, the image to be processed including a target sample object, the first regression parameter including target sample object geometric information.
According to embodiments of the present disclosure, the target sample object may include at least one of a human face, a human body, an animal body, a plant body, and other animate or inanimate objects, and may not be limited thereto. The image to be processed may be an image including the target sample object. The image to be processed may be obtained by photographing the target sample object, or may be retrieved from the Internet. The photographed or retrieved image to be processed may include images correspondingly obtained in various scenes, such as indoor, outdoor, underwater, smoke and the like.
According to the embodiment of the disclosure, the photographed or retrieved image of the target sample object may also be preprocessed first. The preprocessed image may then be determined as the image to be processed. The preprocessing may include at least one of cropping and aligning the photographed image to a fixed area, and the like, and may not be limited thereto.
According to an embodiment of the disclosure, the first regression parameter may characterize a regression parameter of a feature of the image to be processed in a certain dimension relative to a feature of the reference image in the same dimension. The object in the reference image is of the same object type as the target sample object in the image to be processed. The reference image may characterize reference information of the same type of object. For example, information of a plurality of objects of the same type may be collected, and the reference information may then be determined by averaging the information of the plurality of objects.
According to an embodiment of the present disclosure, the first regression parameter may be determined by calculating difference information of the image to be processed and the reference image in the corresponding dimensional characteristics, and the first regression parameter may also be obtained by inputting the image to be processed into at least one of 3DMM (3D Morphable Models, three-dimensional deformable model) and albedo-3DMM (reflectivity-three-dimensional deformable model), by model regression, and may not be limited thereto.
According to embodiments of the present disclosure, the target sample object geometric information may be geometric information required in constructing a three-dimensional avatar of the target sample object in the image to be processed. For example, in the case where the target sample object is a human face, the target sample object geometric information may include human face shape information, human face expression information, and the like, and may not be limited thereto.
In operation S220, a geometric model corresponding to the target sample object is determined according to the target sample object geometric information.
According to embodiments of the present disclosure, a geometric model may be used to describe geometric information such as shape, size, position, and structural relationships of a target sample object. For example, in the case where the target sample object is a human face, the target sample object geometric information may include human face shape information, human face expression information. According to the facial shape information and the facial expression information, a preliminary geometrical facial surface representing the corresponding facial shape and facial expression can be generated, and can be expressed in the form of a three-dimensional geometric model, so that a three-dimensional geometric model corresponding to the target sample facial surface can be obtained.
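For a face, one common way to realize such a geometric model is a 3DMM-style linear combination of a mean shape with identity and expression bases weighted by the regressed coefficients. The sketch below assumes that convention; the array names and basis dimensions are illustrative and are not specified by the disclosure.

```python
import numpy as np

def build_geometry(mean_shape, id_basis, exp_basis, id_coeff, exp_coeff):
    """Assemble a face mesh from 3DMM-style regression parameters.

    mean_shape: (N, 3) mean face vertices
    id_basis:   (N, 3, K_id) identity (shape) basis
    exp_basis:  (N, 3, K_exp) expression basis
    id_coeff:   (K_id,) shape coefficients from the first regression parameter
    exp_coeff:  (K_exp,) expression coefficients from the first regression parameter
    """
    vertices = (mean_shape
                + id_basis @ id_coeff      # identity (shape) offset
                + exp_basis @ exp_coeff)   # expression offset
    return vertices                        # (N, 3) vertices of the geometric model
```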
Note that, when the target sample object is a two-dimensional object or the geometric information of the target sample object is two-dimensional information, the generated geometric model may be a two-dimensional geometric model, which is not limited herein.
In operation S230, the geometric model is scrambled to obtain a scrambling map.
According to an embodiment of the present disclosure, the geometric model may include features of respective points corresponding to the target sample object, for example, may include features such as position information of respective points corresponding to the target object and a relative position relationship between the respective points.
According to an embodiment of the present disclosure, a method of scrambling the geometric model may include: scrambling the position information of the points in the geometric model and the relative position relationship between adjacent points to obtain a scrambling map. Alternatively, material, texture, illumination and the like may be rendered onto the geometric model, and the rendered map may be determined as the scrambling map. By scrambling the geometric model in different forms according to different scrambling methods, a plurality of scrambling maps can be obtained.
In operation S240, tag information of the scramble map is determined according to the scramble regression parameters of the scramble map.
In accordance with embodiments of the present disclosure, the scrambling regression parameters may characterize the regression parameters of a scrambling map in a feature of one dimension relative to a reference map in the same feature of that dimension. The object in the reference map is of the same object type as the target sample object in the scrambling map. The reference map may characterize reference information of the same type of object. For example, information of a plurality of objects of the same type may be collected, and the reference information may then be determined by averaging the information of the plurality of objects.
According to the embodiment of the disclosure, after the scrambling map is obtained, difference information of the scrambling map and the reference map in corresponding dimension characteristics can be calculated, and scrambling regression parameters are determined. The scramble map may also be input into at least one of a 3DMM and an albedo-3DMM model, obtained by model regression, and may not be limited thereto. After the scrambling regression parameter is obtained, the scrambling regression parameter may be determined as tag information of the scrambling map corresponding to the scrambling regression parameter.
In operation S250, a sample image is determined according to the tag information and the scrambling map.
According to embodiments of the present disclosure, tag information may be used to guide the model in learning during model training. In determining the sample image, the scrambling map may be determined as the sample image, and tag information corresponding to the scrambling map may be determined as a true tag of the sample image. It is also possible to first associate the tag information with the scrambling map and then determine the scrambling map including the tag information as the sample image.
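As a concrete illustration of associating the label information with the scrambling map, a training sample could be packaged as a simple record like the one below; the dictionary layout and key names are hypothetical.

```python
def make_sample(scramble_map, scramble_regression_params):
    """Pair one scrambling map with its scrambling regression parameters as label information."""
    return {
        "image": scramble_map,                      # the scrambled map, used as model input
        "label": {                                  # ground-truth label information
            "geometry": scramble_regression_params.get("geometry"),
            "texture": scramble_regression_params.get("texture"),
            "sh_illumination": scramble_regression_params.get("sh_illumination"),
            "camera": scramble_regression_params.get("camera"),
        },
    }
```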
According to the embodiment of the present disclosure, in the case where a plurality of scrambling maps are generated according to a plurality of scrambling parameters, a plurality of sample images including tag information can be obtained. Multiple sample images may be used as training samples for model training.
Through the above embodiments of the present disclosure, sample images can be constructed conveniently, increasing the amount of data available during model training. In addition, by determining the label information according to the scrambling regression parameters, the model can be guided to learn potential object information of the target sample object, and the output precision of a model trained on the sample images is improved.
The method shown in fig. 2 is further described below in connection with the specific examples.
According to an embodiment of the present disclosure, the above operation S230 may include: point normal information corresponding to points in the geometric model is determined, and the point normal information characterizes normal information of the points in the geometric model in a three-dimensional space. And scrambling the points in the geometric model in the normal direction corresponding to the points according to the point normal information to obtain a scrambling map.
According to embodiments of the present disclosure, point normal information may be calculated based on a normal calculation formula. For example, based on a normal calculation formula, the face normal information of the faces formed by each point in the geometric model and its adjacent points can be calculated first. Then, the point normal information corresponding to the point can be calculated from the face normal information.
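A standard way to carry out this computation is to take each face normal from the cross product of two edge vectors and then average the normals of the faces adjacent to each point. The following sketch assumes a triangle mesh stored as vertex and face index arrays; the function names are illustrative.

```python
import numpy as np

def vertex_normals(vertices, faces):
    """Per-point normals: average the normals of the faces adjacent to each point.

    vertices: (N, 3) float array, faces: (F, 3) integer index array.
    """
    v0, v1, v2 = (vertices[faces[:, i]] for i in range(3))
    face_n = np.cross(v1 - v0, v2 - v0)                  # face normals from edge cross products
    normals = np.zeros_like(vertices)
    for i in range(3):                                   # accumulate each face normal onto its vertices
        np.add.at(normals, faces[:, i], face_n)
    norm = np.linalg.norm(normals, axis=1, keepdims=True)
    return normals / np.clip(norm, 1e-8, None)           # unit point normals
```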
According to the embodiment of the disclosure, after determining the normal direction of the point in the geometric model according to the point normal information, the point in the geometric model may be randomly scrambled in the normal direction or scrambled according to a preset rule, so as to obtain a normal map corresponding to the scrambled geometric model. The normal map may be used as a scrambling map obtained by scrambling points in the geometric model in the normal direction.
According to the embodiment of the disclosure, the same point in the geometric model can be scrambled to different degrees in the normal direction corresponding to the point, so as to obtain a plurality of scrambling maps. Different points in the geometric model can be scrambled in the corresponding normal directions to obtain a plurality of scrambling maps.
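A minimal sketch of such normal-direction scrambling, reusing the point normals computed above, might look as follows; the offset range is an illustrative assumption.

```python
import numpy as np

def scramble_along_normals(vertices, normals, max_offset=0.002, rng=None):
    """Randomly displace each point along its own normal direction.

    Different random draws (or different offset ranges) yield different scrambling maps.
    """
    rng = np.random.default_rng() if rng is None else rng
    offsets = rng.uniform(-max_offset, max_offset, size=(len(vertices), 1))
    return vertices + offsets * normals   # scrambled geometry
```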
By the above embodiments of the present disclosure, a scrambling map including the normal information of the points in the geometric model may be obtained, resulting in finer-grained features. When the model is trained with sample images determined based on such scrambling maps, the model can learn the normal information of the points in the geometric model, which effectively improves the output precision of the model.
According to an embodiment of the present disclosure, the above operation S230 may further include: and obtaining scrambling parameters. And scrambling the geometric model according to the scrambling parameters to obtain a scrambling map.
According to an embodiment of the present disclosure, the scrambling parameters may include at least one of: the target sample object geometric scrambling parameters, target sample object texture scrambling parameters, spherical harmonic illumination scrambling parameters, and camera scrambling parameters, and may not be limited thereto.
According to embodiments of the present disclosure, one or more scrambling parameters may be generated randomly or based on preset rules. Scrambling the geometric model according to the scrambling parameters to obtain a scrambling map may include: scrambling the geometric model to different degrees based on the one or more scrambling parameters respectively, to obtain one or more scrambling maps.
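As an illustration of generating scrambling parameters randomly, one draw per desired scrambling map could be taken as below; the parameter dimensions and noise scales are assumptions, not values given by the disclosure.

```python
import numpy as np

def sample_scrambling_params(rng=None):
    """Draw one random set of scrambling parameters; repeated draws yield multiple scrambling maps."""
    rng = np.random.default_rng() if rng is None else rng
    return {
        "geometry_noise": rng.normal(0.0, 0.1, size=80),      # offsets for shape/expression coefficients
        "texture_noise":  rng.normal(0.0, 0.05, size=80),     # offsets for texture coefficients
        "sh_noise":       rng.normal(0.0, 0.2, size=(9, 3)),  # offsets for order-2 spherical harmonic lighting
        "camera_noise":   {"rotation": rng.normal(0.0, 0.05, size=3),
                           "translation": rng.normal(0.0, 0.01, size=3)},
    }
```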
According to embodiments of the present disclosure, the target sample object geometric scrambling parameters may be represented as one or more pieces of geometric information. Scrambling the geometric model based on a plurality of different pieces of geometric information can produce a plurality of scrambling maps corresponding to different geometric deformations. For example, when the target sample object is a human face, the target sample object geometric scrambling parameters may scramble the facial shape information and facial expression information of the geometric model of the human face.
According to embodiments of the present disclosure, the target sample object texture scrambling parameters may be represented as one or more texture maps. Performing texture filling on the geometric model based on a plurality of different texture maps yields a plurality of scrambling maps corresponding to different texture information.
According to an embodiment of the disclosure, the target sample object material scrambling parameters may be represented as one or more material maps, and the geometric model is color-rendered based on a plurality of different material maps, so that a plurality of scrambling maps corresponding to different material information may be obtained.
According to embodiments of the present disclosure, the spherical harmonic illumination scrambling parameters may be characterized as one or more spherical harmonic illumination coefficients. And performing illumination rendering on the geometric model based on a plurality of different spherical harmonic illumination coefficients to obtain a plurality of scrambling maps corresponding to different illumination information.
According to embodiments of the present disclosure, the camera scrambling parameters may be represented as one or more sets of camera parameters. Projecting the geometric model into image space based on a plurality of different camera parameters yields a plurality of scrambling maps corresponding to different image spaces. The camera parameters may include camera intrinsic parameters and camera extrinsic parameters. The camera intrinsic parameters may include parameters related to the characteristics of the camera itself, such as the focal length of the camera. The camera extrinsic parameters may include parameters in the world coordinate system, such as the position of the camera and its rotation direction.
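For reference, a minimal pinhole projection of the geometric model into image space from intrinsic and extrinsic parameters could look like this; the matrix conventions are an assumption.

```python
import numpy as np

def project_to_image(vertices, K, R, t):
    """Project 3D model vertices into image space with a pinhole camera.

    K: (3, 3) camera intrinsic matrix (focal length, principal point)
    R: (3, 3) camera rotation (extrinsic), t: (3,) camera translation (extrinsic)
    """
    cam = vertices @ R.T + t              # world -> camera coordinates
    uvw = cam @ K.T                       # camera -> homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]       # pixel coordinates (u, v)
```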
Through the embodiment of the disclosure, more sample images can be obtained on the basis of one image to be processed, the acquisition difficulty of the sample images is reduced, the data size of the sample images is increased, and the accuracy of a model obtained through training of the more sample images can be effectively improved.
According to an embodiment of the present disclosure, the first regression parameters may further include a target sample object texture map, a first spherical harmonic illumination coefficient, and a first camera parameter, and may not be limited thereto.
According to embodiments of the present disclosure, the target sample object texture map may be one that contains neither illumination information nor texture details of the sample object. For example, when the target sample object is a human face, details such as acne and spots on the face may not be included in the target sample object texture map. The target sample object texture map may include a diffuse (diffuse reflection) base map and a specular (specular reflection) base map of the image to be processed. The diffuse base map is mainly used for representing the intrinsic base appearance of an object, and may include characteristics such as the material and the traces left on the object over the years. The specular base map, which may also be referred to as a highlight map or a light reflection map, represents the light reflection effect when the object encounters strong light. The first spherical harmonic illumination coefficient may be a set of parameters that construct an illumination environment. The first camera parameters may have the same characteristics as the aforementioned camera parameters, and are not described herein again.
It should be noted that, in this embodiment, the target sample object texture map and the target sample object material map are decoupled from the illumination, that is, the target sample object texture map and the target sample object material map do not include illumination information.
According to an embodiment of the disclosure, scrambling the geometric model according to the scrambling parameter may include at least one of the following: and scrambling the geometric model according to the geometric scrambling parameters of the target sample object. And performing texture filling on the geometric model according to the target sample object texture scrambling parameters and the target sample object texture mapping. And performing color rendering on the geometric model according to the scrambling parameters of the target sample object material and the target sample object material map. And performing illumination rendering on the geometric model according to the spherical harmonic illumination scrambling parameter and the first spherical harmonic illumination coefficient. The geometric model is projected into the image space according to the camera scrambling parameters and the first camera parameters.
According to an embodiment of the present disclosure, scrambling the geometric model according to the target sample object geometric scrambling parameters may include: and scrambling the geometric information of the target sample object according to the geometric scrambling parameters of the target sample object to obtain new geometric information of the target sample object. And determining a new geometric model according to the new geometric information of the target sample object.
According to an embodiment of the present disclosure, in the case of texture filling of a geometric model according to a target sample object texture scrambling parameter and a target sample object texture map, the target sample object texture map may be first scrambled according to the target sample object texture scrambling parameter to obtain a new texture map. The geometric model may then be texture filled according to the new texture map. In some embodiments, the target sample object texture scrambling parameters may also be determined as target sample object texture maps, and the geometric model may be texture filled.
According to the embodiment of the disclosure, in the case of performing color rendering on the geometric model according to the target sample object material scrambling parameter and the target sample object material map, the target sample object material map may be first scrambled according to the target sample object material scrambling parameter to obtain a new material map. The geometric model may then be color rendered according to the new texture map. In some embodiments, the target sample object texture scrambling parameter may also be determined as a target sample object texture map, and the geometric model may be color-rendered.
According to the embodiment of the disclosure, in the case of performing illumination rendering on the geometric model according to the spherical harmonic illumination scrambling parameter and the first spherical harmonic illumination coefficient, the first spherical harmonic illumination coefficient may be first scrambled according to the spherical harmonic illumination scrambling parameter to obtain a new spherical harmonic illumination coefficient. Then, the geometric model can be subjected to illumination rendering according to the new spherical harmonic illumination coefficient. In some embodiments, the spherical harmonic illumination scrambling parameter may also be determined as a first spherical harmonic illumination coefficient, and the geometric model may be subjected to illumination rendering.
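One common realization of spherical harmonic illumination rendering is Lambertian shading with order-2 real spherical harmonics (9 coefficients per color channel). The sketch below assumes that model; the disclosure does not prescribe a particular shading formula.

```python
import numpy as np

def sh_basis(normals):
    """Order-2 real spherical harmonic basis evaluated at unit normals, shape (N, 9)."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    return np.stack([
        0.282095 * np.ones_like(x),
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3 * z ** 2 - 1),
        1.092548 * x * z, 0.546274 * (x ** 2 - y ** 2),
    ], axis=1)

def shade(albedo, normals, sh_coeff):
    """Per-vertex shading: albedo modulated by spherical harmonic illumination.

    albedo: (N, 3) texture colors; sh_coeff: (9, 3) illumination coefficients per RGB channel,
    possibly scrambled by the spherical harmonic illumination scrambling parameter.
    """
    irradiance = sh_basis(normals) @ sh_coeff   # (N, 3) incoming light per vertex
    return albedo * irradiance
```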
According to an embodiment of the present disclosure, in the case of projecting the geometric model into the image space according to the camera scrambling parameter and the first camera parameter, the first camera parameter may be first scrambled according to the camera scrambling parameter to obtain a new camera parameter. The geometric model may then be projected into image space according to the new camera parameters. In some embodiments, the camera scrambling parameters may also be determined as first camera parameters, and the geometric model projected into the image space.
Through the above embodiments of the present disclosure, when the model is trained with sample images determined based on the scrambling maps, the model can learn object features of more dimensions, which effectively improves the generalization capability of the model. In addition, the trained model obtained in this way has higher extensibility and can be reused in various scenes.
According to an embodiment of the present disclosure, the above scrambling regression parameters may include at least one of: scrambling point normal information, target sample object scrambling geometric information, target sample object scrambling texture map, target sample object scrambling material map, scrambling spherical harmonic illumination coefficients, and scrambling camera parameters, and may not be limited thereto.
In accordance with embodiments of the present disclosure, the scrambling point normal information may characterize the point normal information of the points in the geometric model determined from the scrambling map. The target sample object scrambling geometric information may be determined based on the target sample object geometric scrambling parameters, or based on the target sample object geometric scrambling parameters and the target sample object geometric information. The target sample object scrambling texture map may be determined according to the target sample object texture scrambling parameters, or according to the target sample object texture scrambling parameters and the target sample object texture map. The target sample object scrambling material map may be determined based on the target sample object material scrambling parameters, or based on the target sample object material scrambling parameters and the target sample object material map. The scrambling spherical harmonic illumination coefficient may be determined according to the spherical harmonic illumination scrambling parameter, or according to the spherical harmonic illumination scrambling parameter and the first spherical harmonic illumination coefficient. The scrambling camera parameters may be determined from the camera scrambling parameters, or from the camera scrambling parameters and the first camera parameters.
Through the above embodiments of the present disclosure, sample images with various scrambling regression parameters as label information can be obtained, so that the model can conveniently learn features of more dimensions when trained on these sample images, which effectively improves the accuracy and generalization capability of the trained model.
According to an embodiment of the present disclosure, the above operation S250 may include: performing texture boundary filling on the scrambling map to obtain a scrambled rendered image; and associating the tag information with the scrambled rendered image to obtain the scrambled rendered image including the tag information as the sample image.
According to the embodiment of the disclosure, after the geometric model is randomly scrambled based on the above scrambling manners, a preliminary scrambling map can be obtained. In some embodiments, the preliminary scrambling map may have smooth and complete texture information, in which case the preliminary scrambling map may be determined as the scrambling map. In other embodiments, the preliminary scrambling map may lack some texture information; for example, where the target sample object is a human face, some cheek texture information may be missing. In this case, a scrambled rendered image including complete texture information may be obtained by performing texture boundary filling on the preliminary scrambling map.
Texture boundary filling may be performed with a GAN (Generative Adversarial Network), according to embodiments of the present disclosure. For example, the scrambling map may be input into a GAN network to obtain a scrambled rendered image filled via texture boundary filling.
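A minimal sketch of such GAN-based boundary filling is given below; the generator is a hypothetical callable standing in for a trained inpainting network, and the masking scheme is an assumption.

```python
def fill_texture_boundary(scramble_map, mask, generator):
    """Fill missing texture regions of a scrambling map with a generator network.

    `generator` is a hypothetical image-to-image model (e.g. a trained GAN generator)
    that takes a masked map and a mask and returns a completed map; `mask` marks missing texels.
    """
    masked = scramble_map * (1.0 - mask)                     # zero out the incomplete regions
    completed = generator(masked, mask)                      # inpaint via the generator
    return scramble_map * (1.0 - mask) + completed * mask    # keep original texels, fill only the holes
```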
It should be noted that the method for implementing texture boundary filling is only an exemplary embodiment, but not limited thereto, and other methods for filling incomplete texture according to existing geometric information, texture information, and the like, which are known in the art, may be included as long as smooth and complete texture information can be obtained.
According to the embodiment of the disclosure, after the geometric model is randomly scrambled in different scrambling manners, a large number of sample images with real labels can be obtained from the scrambling regression parameters and scrambled rendered images of the scrambling maps produced by the different scrambling manners.
Through the embodiment of the disclosure, the accuracy of each sample image can be improved, so that the accuracy of a model obtained based on the training of the corresponding sample image is improved.
Fig. 3 schematically illustrates an overall flowchart of generating a sample image according to an embodiment of the disclosure.
As shown in FIG. 3, the method includes operations S310-S340.
In operation S310, a geometric model corresponding to a target sample object is determined according to target sample object geometric information of the target sample object in the image to be processed.
In operation S320, the geometric model is scrambled to obtain a scrambling map.
According to an embodiment of the present disclosure, operation S320 may include operations S321 to S326.
In operation S321, a point in the geometric model is scrambled in a normal direction corresponding to the point according to point normal information corresponding to the point in the geometric model.
In operation S322, the geometric model is scrambled according to the target sample object geometric scrambling parameters.
In operation S323, the geometric model is texture-filled according to the target sample object texture scrambling parameter and the target sample object texture map.
In operation S324, the geometric model is color-rendered according to the target sample object material scrambling parameter and the target sample object material map.
In operation S325, the geometric model is illumination rendered according to the spherical harmonic illumination scrambling parameter and the first spherical harmonic illumination coefficient.
In operation S326, the geometric model is projected into the image space according to the camera scrambling parameters and the first camera parameters.
In operation S330, scrambling regression parameters of the scrambling map are acquired.
According to an embodiment of the present disclosure, the scrambling regression parameters may include at least one of: scrambling point normal information 331, target sample object scrambling geometric information 332, target sample object scrambling texture map 333, target sample object scrambling material map 334, scrambling spherical harmonic illumination coefficients 335, and scrambling camera parameters 336.
In operation S340, a sample image with tag information is determined according to the tag information and the scrambling map determined by the scrambling regression parameter.
Through the above embodiments of the present disclosure, sample images can be conveniently constructed based on various scrambling manners, increasing the amount of data available during model training. In addition, by determining the label information according to the scrambling regression parameters, the model can be guided to learn potential object information of the target sample object, which improves the output precision and generalization capability of a model trained on the sample images.
Fig. 4 schematically illustrates a flowchart of a training method of a deep learning model according to an embodiment of the present disclosure.
As depicted in FIG. 4, the method includes operations S410-S440.
In operation S410, the sample to-be-processed image is input into the first neural network of the deep learning model to obtain a second regression parameter of the sample to-be-processed image, the sample to-be-processed image includes the to-be-processed object, and the second regression parameter includes geometric information of the to-be-processed object.
According to an embodiment of the present disclosure, the first neural network may characterize a network used to generate the regression parameters of an image, and may include, for example, at least one of 3DMM and albedo-3DMM.
According to the embodiment of the disclosure, after the sample to-be-processed image containing the to-be-processed object is obtained, the sample to-be-processed image may be input to the first neural network in the deep learning model, and the first neural network in the deep learning model performs parameter regression on the sample to-be-processed image to obtain the second regression parameter of the to-be-processed image.
It should be noted that the sample to-be-processed image may have characteristics and an acquisition manner corresponding to those of the aforementioned image to be processed, the object to be processed may have characteristics corresponding to those of the aforementioned target sample object, the second regression parameter may have characteristics corresponding to those of the aforementioned first regression parameter, and the geometric information of the object to be processed may have characteristics corresponding to those of the geometric information of the aforementioned target sample object, which are not described herein again.
In operation S420, the sample to-be-processed image and the second regression parameters are input into the second neural network of the deep learning model, resulting in a sample rendered image corresponding to the sample to-be-processed image.
According to embodiments of the present disclosure, the second neural network may characterize a network for reconstructing object information of the object to be processed. The sample rendered image may characterize the object information reconstructed for the object to be processed in the sample to-be-processed image. The sample rendered image may be in the form of a two-dimensional image, except that the object information in the sample rendered image may have more detailed features than the information of the object to be processed in the sample to-be-processed image, and may include high-precision texture, illumination and the like. For example, when the object to be processed is a human face, details such as acne and spots on the face may not be included in the sample to-be-processed image, but may be included in the sample rendered image.
According to the embodiment of the disclosure, the sample to-be-processed image and the second regression parameters of the sample to-be-processed image regressed by the first neural network can be input to the second neural network together, and the sample rendering image of the sample to-be-processed image is output through the second neural network.
In operation S430, the deep learning model is trained using the sample rendered image and the sample to-be-processed image, resulting in a training result.
According to the embodiment of the disclosure, the first training loss can be determined according to the sample rendered image and the sample to-be-processed image, and the deep learning model is trained with the first training loss. For example, the sample rendered image and the sample to-be-processed image may be input into a first predetermined loss function to obtain the first training loss. The sample rendered image and the sample to-be-processed image may also be further processed and then input into a second predetermined loss function to obtain the first training loss, which is not limited herein.
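As one possible instantiation of the first predetermined loss function, a photometric L1 loss between the sample rendered image and the sample to-be-processed image could be used; this is an assumption for illustration, not the loss mandated by the disclosure.

```python
import torch.nn.functional as F

def first_training_loss(sample_rendered, sample_to_be_processed, mask=None):
    """Photometric L1 loss between the sample rendered image and the sample
    to-be-processed image, optionally restricted to the object region."""
    if mask is not None:
        sample_rendered = sample_rendered * mask
        sample_to_be_processed = sample_to_be_processed * mask
    return F.l1_loss(sample_rendered, sample_to_be_processed)
```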
According to embodiments of the present disclosure, the training results may be characterized by a degree of convergence of the first training loss.
In operation S440, in response to determining that the training result satisfies the predetermined condition, the parameters of the deep learning model are fine-tuned using the sample image and the image to be processed, resulting in a trained deep learning model, the sample image being generated by processing the image to be processed according to the sample image generation method.
According to an embodiment of the present disclosure, the predetermined condition may include at least one of: the training times reach the preset times, and the first training loss converges.
According to the embodiment of the present disclosure, in the case where it is determined that the training of the deep learning model with the sample rendering image and the sample to-be-processed image satisfies the predetermined condition, the fine-tuning training of the deep learning model may be continued with the sample image generated based on the aforementioned sample image generating method and the to-be-processed image for generating the sample image.
According to the embodiment of the disclosure, in the fine tuning training process, a second training loss can be determined according to the sample image and the image to be processed, and parameters of the deep learning model are fine-tuned by adopting the second training loss. For example, the sample image and the image to be processed may be input into a third predetermined loss function, resulting in the second training loss. The sample image and the image to be processed may also be further processed and then input into a fourth predetermined loss function to obtain the second training loss, which is not limited herein.
According to the embodiment of the present disclosure, the parameters of the deep learning model are fine-tuned by using the sample image with the tag information, and the tag information includes the scrambling regression parameters corresponding to the sample image, so that the deep learning model can learn features of more dimensions and higher precision, and the output precision of the trained deep learning model can be effectively improved.
The method shown in fig. 4 is further described below in connection with the specific examples.
According to an embodiment of the disclosure, the second regression parameters may further include a texture map of the object to be processed, a second spherical harmonic illumination coefficient, and a second camera parameter.
It should be noted that, the texture map of the object to be processed, the second spherical harmonic illumination coefficient, and the second camera parameter may have corresponding features with the texture map of the target sample object, the first spherical harmonic illumination coefficient, and the first camera parameter, which are not described herein.
According to an embodiment of the present disclosure, the above-described operation S420 may include: determining a pixel-level preliminary texture map corresponding to the object to be processed according to the geometric information of the object to be processed and the second camera parameters. And performing illumination removal and feature symmetry processing on the pixel-level preliminary texture map to obtain a pixel-level target texture map. And determining a preliminary rendering texture map corresponding to the object to be processed according to the texture map of the object to be processed and the second spherical harmonic illumination coefficient. And determining a high-frequency diffuse reflection map corresponding to the object to be processed according to the pixel-level target texture map and the preliminary rendering texture map. And determining a target rendering texture map according to the high-frequency diffuse reflection map, the material map of the object to be processed and the second spherical harmonic illumination coefficient. And projecting the target rendering texture map to an image space according to the second camera parameters to obtain a sample rendering image.
According to embodiments of the present disclosure, a pixel-level preliminary texture map may characterize an image pixel-level texture map that contains illumination information. The pixel-level target texture map may characterize a texture map that does not contain illumination information. The preliminary rendered texture map may characterize a texture map that does not contain illumination information and does not contain texture detail information. For example, the object to be processed is a human face, and details such as acne, spots and the like on the human face may not be included in the preliminary rendering texture map. The high frequency diffuse reflection map may characterize a map that contains texture detail information. For example, the object to be processed is a human face, and details such as acnes, spots and the like on the human face can be contained in the high-frequency diffuse reflection map. Both the target rendered texture map and the sample rendered image may contain rich texture detail information.
According to an embodiment of the present disclosure, determining the pixel-level preliminary texture map corresponding to the object to be processed according to the geometric information of the object to be processed and the second camera parameter may include: generating a three-dimensional avatar of the object to be processed based on the geometric information of the object to be processed. And processing the three-dimensional avatar according to the sample to-be-processed image and the second camera parameter to obtain the pixel-level preliminary texture map of the object to be processed.
According to an embodiment of the present disclosure, generating the three-dimensional avatar of the object to be processed based on the geometric information of the object to be processed may include either of the following methods: processing the geometric information of the object to be processed based on preset three-dimensional avatar generation logic to obtain the three-dimensional avatar of the object to be processed in the sample to-be-processed image. Or linearly summing the geometric information of the object to be processed and preset reference image information to obtain the three-dimensional avatar.
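As an illustration of the linear-summation variant, the following Python sketch assumes that the geometric information takes the form of 3DMM-style shape coefficients applied to a preset reference (mean) shape and shape basis; all names, shapes and values are illustrative assumptions rather than the disclosure's exact formulation.

```python
import numpy as np

def build_avatar(reference_shape, shape_basis, geometry_coeffs):
    """Linearly combine preset reference information with the regressed
    geometric information to obtain the vertices of a three-dimensional avatar."""
    # reference_shape: (N, 3)    preset reference vertices (e.g. a mean face)
    # shape_basis:     (N, 3, K) preset geometric basis (3DMM-style)
    # geometry_coeffs: (K,)      geometric information regressed for the object
    offset = np.einsum("nik,k->ni", shape_basis, geometry_coeffs)
    return reference_shape + offset

# Toy usage with random data.
avatar_vertices = build_avatar(np.zeros((5023, 3)),
                               1e-3 * np.random.randn(5023, 3, 80),
                               np.random.randn(80))
```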
According to an embodiment of the present disclosure, processing the three-dimensional avatar according to the sample to-be-processed image and the second camera parameter to obtain the pixel-level preliminary texture map of the object to be processed may include: projecting the three-dimensional avatar into the image space based on the second camera parameter, and establishing a mapping from the vertices of the three-dimensional avatar to a UV map (a two-dimensional texture coordinate space). Based on the mapping, RGB values of pixel points in the sample to-be-processed image are assigned to the vertices of the three-dimensional avatar. And performing UV unfolding on the assigned three-dimensional avatar based on the mapping between the three-dimensional avatar and the UV map to obtain the pixel-level preliminary texture map.
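The following sketch illustrates the projection-and-assignment step in a heavily simplified form: vertices are projected with a pinhole camera, each vertex takes the RGB value of its nearest image pixel, and the colours are written at the vertex UV coordinates (a real implementation would rasterize triangles and interpolate). The function and parameter names are assumptions made for illustration.

```python
import numpy as np

def preliminary_texture(vertices, uv_coords, camera_matrix, image, uv_size=256):
    """Sample image colours at projected vertex positions and write them into
    UV space to form a pixel-level preliminary texture map (point-sampled sketch)."""
    h, w, _ = image.shape
    # Project avatar vertices into image space (pinhole projection).
    proj = (camera_matrix @ vertices.T).T
    xy = proj[:, :2] / proj[:, 2:3]
    xs = np.clip(np.round(xy[:, 0]).astype(int), 0, w - 1)
    ys = np.clip(np.round(xy[:, 1]).astype(int), 0, h - 1)
    vertex_rgb = image[ys, xs]                      # RGB value assigned to each vertex

    # UV unfolding: write each vertex colour at its UV location.
    tex = np.zeros((uv_size, uv_size, 3), dtype=image.dtype)
    us = np.clip((uv_coords[:, 0] * (uv_size - 1)).astype(int), 0, uv_size - 1)
    vs = np.clip((uv_coords[:, 1] * (uv_size - 1)).astype(int), 0, uv_size - 1)
    tex[vs, us] = vertex_rgb
    return tex
```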
According to an embodiment of the present disclosure, performing illumination removal and feature symmetry processing on the pixel-level preliminary texture map to obtain the pixel-level target texture map may include: inputting the pixel-level preliminary texture map and the second spherical harmonic illumination coefficient into a pre-trained inpainting texture map generation network. The inpainting texture map generation network performs illumination removal on the pixel-level preliminary texture map based on the second spherical harmonic illumination coefficient. The inpainting texture map generation network performs feature symmetry processing on the pixel-level preliminary texture map based on a symmetry consistency rule to obtain the pixel-level target texture map. The inpainting texture map generation network may be constructed as follows: based on the underlying structure of a residual network, the feature tensor output by each residual block is flipped, and the output feature tensor and the flipped feature tensor are spliced, so as to generate an inpainting texture map generation network with symmetric and consistent features.
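A minimal PyTorch sketch of the flip-and-splice idea is given below; the channel sizes, the fusion convolution and the flip axis are assumptions, and a real inpainting texture map generation network would stack many such blocks and also take the second spherical harmonic illumination coefficient as input.

```python
import torch
import torch.nn as nn

class SymmetricResBlock(nn.Module):
    """Residual block whose output features are spliced with their horizontally
    flipped copy, encouraging left/right symmetry consistency in the texture."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Fuse the [features, flipped features] pair back to `channels`.
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        out = x + self.body(x)                   # residual path
        flipped = torch.flip(out, dims=[-1])     # flip the feature tensor along width
        return self.fuse(torch.cat([out, flipped], dim=1))
```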
It should be noted that the pixel-level target texture map is more complete than the pixel-level preliminary texture map. For example, the object to be processed is a human face, and in the case where there is an occlusion in the human face, the pixel-level preliminary texture map may not include the texture of the occlusion portion. The pixel level target texture map may include the texture of the entire face, such as the texture including the occlusion portion.
According to an embodiment of the present disclosure, determining the preliminary rendering texture map corresponding to the object to be processed according to the texture map of the object to be processed and the second spherical harmonic illumination coefficient may include: constructing an illumination environment based on the second spherical harmonic illumination coefficient. And, under the constructed illumination environment, using a differentiable ray-tracing renderer to perform a differentiable BRDF (Bidirectional Reflectance Distribution Function) calculation on the texture map, the diffuse reflection map and the specular reflection map of the object to be processed, so as to obtain the preliminary rendering texture map. For example, where the object to be processed is a human face, the preliminary rendering texture map may be characterized as a preliminary PBR (Physically-Based Rendering) face texture map.
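How an illumination environment built from spherical harmonic coefficients acts on a texture can be illustrated with a diffuse-only, band-0/band-1 shading sketch; this deliberately omits the differentiable ray tracing and the full BRDF terms described above, and the coefficient layout is an assumption.

```python
import numpy as np

def sh_irradiance(normals, sh_coeffs):
    """Evaluate Lambertian irradiance from spherical harmonic lighting
    coefficients (first two SH bands only; the constants are the standard
    real-SH normalisation factors)."""
    # normals:   (N, 3) unit normals;  sh_coeffs: (4, 3) RGB coefficients
    x, y, z = normals[:, 0:1], normals[:, 1:2], normals[:, 2:3]
    basis = np.concatenate(
        [0.282095 * np.ones_like(x), 0.488603 * y, 0.488603 * z, 0.488603 * x],
        axis=1,
    )
    return basis @ sh_coeffs

def shade(albedo, normals, sh_coeffs):
    """Diffuse colour = albedo * irradiance under the constructed SH lighting."""
    return albedo * sh_irradiance(normals, sh_coeffs)
```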
According to an embodiment of the present disclosure, determining a high frequency diffuse reflection map corresponding to an object to be processed from a pixel-level target texture map and a preliminary rendered texture map may include: and performing difference calculation on the pixel-level target texture map and the preliminary rendering texture map to obtain texture detail information of the object to be processed in the sample image to be processed. And determining the high-frequency diffuse reflection map according to the texture detail information.
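A minimal sketch of the difference step follows; how the extracted detail is folded back into the high-frequency diffuse reflection map is not spelled out above, so adding it onto a base diffuse albedo here is purely an assumption for illustration.

```python
import numpy as np

def high_frequency_diffuse(pixel_level_target_tex, preliminary_render_tex, base_diffuse):
    """Texture detail (acne, spots, ...) as the residual between the pixel-level
    target texture map and the smooth preliminary rendering texture map."""
    detail = pixel_level_target_tex - preliminary_render_tex
    return np.clip(base_diffuse + detail, 0.0, 1.0)   # assumed way of using the detail
```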
According to an embodiment of the present disclosure, determining the target rendering texture map according to the high-frequency diffuse reflection map, the material map of the object to be processed and the second spherical harmonic illumination coefficient may include: constructing an illumination environment based on the second spherical harmonic illumination coefficient. And, under the constructed illumination environment, using a differentiable renderer to perform a differentiable BRDF calculation on the high-frequency diffuse reflection map and the specular reflection map, so as to obtain the target rendering texture map.
It should be noted that the above-mentioned methods for obtaining the pixel-level preliminary texture map, the pixel-level target texture map, the preliminary rendering texture map, the high-frequency diffuse reflection map, the target rendering texture map, and the rendering image are merely exemplary embodiments, but are not limited thereto, and other methods known in the art may be included as long as the corresponding map and image can be obtained.
Through the embodiment of the disclosure, the illumination removal and the feature symmetry processing are introduced to the pixel-level preliminary texture map, so that the obtained pixel-level target texture map is more accurate. And extracting a high-frequency diffuse reflection map based on the pixel-level target texture map and the preliminary rendering texture map, determining a rendering texture map based on the extracted high-frequency diffuse reflection map, and further determining a rendering image, so that the determined rendering image has rich textures, a model can learn finer differences between a sample to-be-processed image and the rendering image, and the model precision is effectively improved.
Fig. 5A schematically illustrates a block diagram of a deep learning model according to an embodiment of the present disclosure.
As shown in fig. 5A, the deep learning model 520 includes a first neural network 521 and a second neural network 522.
According to an embodiment of the present disclosure, the first neural network 521 may include at least one of the following: a 3DMM, an albedo-3DMM, and the like, and may not be limited thereto. The second neural network 522 may include at least one of the following: an AE (AutoEncoder), a GAN (Generative Adversarial Network), a residual network, and the like, and may not be limited thereto.
According to an embodiment of the present disclosure, the above-described operation S430 may include: determining a first distance loss according to the similarity between the sample rendered image and the sample to-be-processed image. And determining a first perception loss according to the feature representation of the sample rendered image and the feature representation of the sample to-be-processed image. And determining a first norm loss between the sample rendered image and the sample to-be-processed image. And training the deep learning model based on the first distance loss, the first perception loss, and the first norm loss.
According to embodiments of the present disclosure, the similarity between the sample rendered image and the sample to-be-processed image may be characterized by the euclidean distance between the sample rendered image and the sample to-be-processed image. In this case, the first distance loss may be characterized as a euclidean distance loss.
It should be noted that characterizing the similarity by the Euclidean distance is only an exemplary embodiment, but is not limited thereto, and other similarity characterization methods known in the art may also be used, as long as the similarity between the sample rendered image and the sample to-be-processed image can be determined.
According to embodiments of the present disclosure, the above-described feature representation may be characterized in matrix or vector form. A feature representation of each of the sample rendered image and the sample to-be-processed image may be determined based on a feature network. For example, the sample rendered image and the sample to-be-processed image may be input into the feature network, and the feature representation of the sample rendered image and the feature representation of the sample to-be-processed image may be obtained through processing by the feature network. The first perception loss may be calculated according to the perceptual loss calculation logic in the feature network. The feature network may include an LPIPS (Learned Perceptual Image Patch Similarity) network, and may not be limited thereto.
According to embodiments of the present disclosure, the first norm loss may be determined based on a norm loss function. The norm loss function may be used to calculate the loss of the sample rendered image and the sample to-be-processed image in the RGB color space. The norm loss function may include an L1 norm loss function, an L2 norm loss function, and the like, and may not be limited thereto.
According to the embodiment of the disclosure, after the first distance loss, the first perception loss and the first norm loss are calculated, parameters in the deep learning model can be adjusted according to the first distance loss, the first perception loss and the first norm loss, so that preliminary training of the deep learning model is realized. Adjusting parameters in the deep learning model based on the first distance loss, the first perception loss and the first norm loss may include: determining a first training loss based on the first distance loss, the first perception loss and the first norm loss. For example, the first distance loss D_loss_1, the first perception loss P_loss_1 and the first norm loss L_loss_1 may be added to obtain the first training loss Training_1_loss_1 = D_loss_1 + P_loss_1 + L_loss_1. Then, the parameters of the deep learning model may be adjusted according to the first training loss Training_1_loss_1 until Training_1_loss_1 converges.
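The first training loss above can be sketched in PyTorch as follows; `feature_net` stands in for a perceptual feature extractor (for example an LPIPS-style network), and the equal weighting simply mirrors the additive formula, so all of this is illustrative rather than the disclosure's implementation.

```python
import torch
import torch.nn.functional as F

def first_training_loss(rendered, target, feature_net):
    """Training_1_loss_1 = D_loss_1 + P_loss_1 + L_loss_1 for a batch of
    sample rendered images and sample to-be-processed images (B, 3, H, W)."""
    d_loss = torch.dist(rendered, target, p=2)                        # Euclidean distance loss
    p_loss = F.mse_loss(feature_net(rendered), feature_net(target))   # perceptual (feature) loss
    l_loss = F.l1_loss(rendered, target)                              # L1 norm loss in RGB space
    return d_loss + p_loss + l_loss
```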
Through the embodiments of the present disclosure, the deep learning model can be trained by combining features of multiple dimensions, thereby improving the output precision of the deep learning model.
In accordance with an embodiment of the present disclosure, in the case where the first distance loss, the first perceived loss, and the first norm loss are determined, the above-described predetermined condition may include that the first distance loss, the first perceived loss, and the first norm loss all converge.
According to an embodiment of the present disclosure, in the case where it is determined that the training of the deep learning model with the sample rendered image and the sample to-be-processed image satisfies the predetermined condition, the above-described operation S440 may include: determining a second distance loss according to the similarity between the sample image and the image to be processed. And determining a second perception loss according to the feature representation of the sample image and the feature representation of the image to be processed. And determining a second norm loss between the sample image and the image to be processed. And fine-tuning parameters of the deep learning model according to the second distance loss, the second perception loss and the second norm loss.
The second distance loss, the second perceived loss, and the second norm loss are determined in the same manner as the first distance loss, the first perceived loss, and the first norm loss, and are not described in detail herein.
According to the embodiment of the disclosure, after the second distance loss, the second perception loss and the second norm loss are calculated, the parameters in the deep learning model can be fine-tuned according to the second distance loss, the second perception loss and the second norm loss, so that further training of the preliminarily trained deep learning model is realized. Fine-tuning parameters in the deep learning model based on the second distance loss, the second perception loss and the second norm loss may include: determining a second training loss based on the second distance loss, the second perception loss and the second norm loss. For example, the second distance loss D_loss_2, the second perception loss P_loss_2 and the second norm loss L_loss_2 may be added to obtain a second training loss Training_1_loss_2 = D_loss_2 + P_loss_2 + L_loss_2. Then, the parameters of the deep learning model may be fine-tuned according to the second training loss Training_1_loss_2 until Training_1_loss_2 converges.
According to the embodiment of the disclosure, the parameters of the deep learning model are fine-tuned by combining the sample image having the tag information and its multi-dimensional features, and the tag information includes the scrambling regression parameters corresponding to the sample image, so that the deep learning model can learn features of more dimensions and higher precision, and the output precision of the trained deep learning model can be further improved.
According to an embodiment of the present disclosure, the foregoing fine tuning of the parameters of the deep learning model according to the second distance loss, the second perception loss, and the second norm loss may include: a regularization term loss configured for the second distance loss, the second perceptual loss, and the second norm loss is determined. And fine tuning parameters of the deep learning model according to the second distance loss, the second perception loss, the second norm loss and the regular term loss.
In accordance with embodiments of the present disclosure, the regularization term loss may be characterized as a constraint configured for the second distance loss, the second perception loss and the second norm loss. A new second training loss may be determined from the second distance loss, the second perception loss, the second norm loss and the regularization term loss. For example, a first weighting coefficient α may be configured for the second distance loss D_loss_2, a second weighting coefficient β for the second perception loss P_loss_2, and a third weighting coefficient γ for the second norm loss L_loss_2, to obtain a second training loss Training_2_loss_2 = α·D_loss_2 + β·P_loss_2 + γ·L_loss_2, where α, β and γ may serve as the regularization term loss. As another example, an independent auxiliary parameter λ may be configured for the second distance loss D_loss_2, the second perception loss P_loss_2 and the second norm loss L_loss_2, to obtain a second training loss Training_3_loss_2 = D_loss_2 + P_loss_2 + L_loss_2 + λ, where λ may serve as the regularization term loss.
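Both forms of the regularised second training loss can be sketched as below; the default coefficient values are placeholders, not values taken from the disclosure.

```python
def second_training_loss(d_loss_2, p_loss_2, l_loss_2,
                         alpha=1.0, beta=1.0, gamma=1.0, lam=None):
    """Training_2_loss_2 = alpha*D_loss_2 + beta*P_loss_2 + gamma*L_loss_2, or
    Training_3_loss_2 = D_loss_2 + P_loss_2 + L_loss_2 + lambda when `lam` is given."""
    if lam is not None:                                   # auxiliary-parameter form
        return d_loss_2 + p_loss_2 + l_loss_2 + lam
    return alpha * d_loss_2 + beta * p_loss_2 + gamma * l_loss_2
```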
In accordance with an embodiment of the present disclosure, in the case where the second training loss Training_2_loss_2 or Training_3_loss_2 including the regularization term loss is obtained, the parameters of the deep learning model may be fine-tuned according to Training_2_loss_2 or Training_3_loss_2 until Training_2_loss_2 or Training_3_loss_2 converges.
According to embodiments of the present disclosure, fine tuning may characterize the use of smaller learning rates and fewer iterations for adjusting some of the parameters in the initially trained deep learning model. A smaller learning rate may be characterized, for example, by a learning rate less than or equal to a first preset threshold, and a smaller number of iterations may be characterized, for example, by a number of iterations less than or equal to a second preset threshold.
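A fine-tuning loop in this spirit might look like the sketch below; which parameters are frozen, the learning rate, the step budget and the way the (image to be processed, sample image) pairs enter the loss are all assumptions made for illustration.

```python
import torch

def finetune(model, loss_fn, paired_data, lr=1e-5, max_steps=200):
    """Fine-tune only part of the model with a small learning rate and few iterations."""
    for name, p in model.named_parameters():
        p.requires_grad = not name.startswith("first_net")   # fix part of the parameters
    trainable = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.Adam(trainable, lr=lr)                 # small learning rate
    for _, (image_to_process, sample_image) in zip(range(max_steps), paired_data):
        opt.zero_grad()
        loss = loss_fn(model(image_to_process), sample_image)  # e.g. the second training loss
        loss.backward()
        opt.step()
```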
The above-mentioned manner of calculating the first training loss and the second training loss is merely an exemplary embodiment, but is not limited thereto, and other calculation methods known in the art may be included as long as the first training loss and the second training loss suitable for training the deep learning model can be obtained.
Through the embodiments of the present disclosure, the constraint used during model training is constructed based on the regularization term loss, so that part of the parameters in the model can be fixed and not updated. This effectively reduces the complexity of the model, alleviates the overfitting problem that easily arises when all parameters of the model participate in the training process, and further improves the output precision of the trained model.
Fig. 5B schematically illustrates a schematic diagram of training a deep learning model according to an embodiment of the present disclosure.
As shown in fig. 5B, the sample to-be-processed image 511 is processed through the first neural network 521 of the deep learning model 520, and the second regression parameters 512 may be obtained. The sample to-be-processed image 511 and the second regression parameters 512 are rendered via the second neural network 522 of the deep learning model 520, resulting in a sample rendered image 513. A sample image 514 with label information may be obtained from the sample to-be-processed image 511 by scrambling in different forms.
In training the deep learning model according to embodiments of the present disclosure, the deep learning model may be initially trained using the sample to-be-processed image 511 and the sample rendered image 513. Parameters of the deep learning model may then be further fine-tuned using the sample to-be-processed image 511 and the sample image 514 with label information, to obtain a trained deep learning model.
In accordance with an embodiment of the present disclosure, during a preliminary training process, a sample to-be-processed image 511 may first be input into a deep learning model 520, and processed via a first neural network 521 and a second neural network 522 in the deep learning model 520, resulting in a sample rendered image 513. Then, a first training loss 531 is determined using the sample pending image 511 and the sample rendered image 513. Thereafter, parameters of the deep learning model 520 are adjusted based on the first training loss 531 until the first training loss 531 converges.
In accordance with an embodiment of the present disclosure, a fine tuning process may be entered in the event that it is determined that the first training loss 531 converges. In the fine tuning process, since the sample image 514 has the tag information determined according to the scrambling regression parameters, the sample to-be-processed image 511 may not need to be processed by the deep learning model. First, a second training loss 532 is determined using the label information of the sample image 514 and the sample to-be-processed image 511. Then, based on the second training loss 532, the parameters of the deep learning model 520 are fine-tuned until the second training loss 532 converges. In the event that the second training loss 532 is determined to converge, a trained deep learning model may be obtained.
It should be noted that, the process of determining the sample rendering image 513 according to the sample to-be-processed image 511 may refer to the foregoing process of inputting the sample to-be-processed image into the first neural network of the deep learning model to obtain the second regression parameter of the sample to-be-processed image, and inputting the sample to-be-processed image and the second regression parameter into the second neural network of the deep learning model to obtain the sample rendering image corresponding to the sample to-be-processed image, and the process of determining the sample image 514 according to the sample to-be-processed image 511 may refer to the foregoing process corresponding to the foregoing sample image generating method, which will not be described herein.
According to the embodiment of the disclosure, the parameters of the deep learning model are finely adjusted by using the sample image with the tag information, and the tag information comprises the scrambling regression parameters corresponding to the sample image, so that the deep learning model can learn more dimensions and higher-precision characteristics, and the output precision of the trained deep learning model can be effectively improved.
Fig. 6 schematically illustrates a flowchart of an object information reconstruction method according to an embodiment of the present disclosure.
As shown in fig. 6, the method includes operations S610 to S620.
In operation S610, a target image to be processed including a target object is acquired.
In operation S620, the target image to be processed is input into a deep learning model, and object reconstruction information corresponding to the target object is obtained, and the deep learning model is trained according to a training method of the deep learning model.
It should be noted that the target to-be-processed image may have features and an acquisition manner corresponding to those of the aforementioned image to be processed, and the target object may have features corresponding to those of the aforementioned target sample object, which are not described herein again.
According to embodiments of the present disclosure, the object reconstruction information may include three-dimensional feature information corresponding to the target object and two-dimensional image information containing more detail features. For example, where the target object is a human face, the face image is input into the deep learning model, and the obtained object reconstruction information may include three-dimensional reconstructed face information, such as face geometry information, face texture information, face material information, and point normal information of each point corresponding to the face, as well as a two-dimensional face image containing more detail features, such as acne and spots.
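A hypothetical usage of the trained model for reconstruction might look as follows; the checkpoint name, input file and the structure of the returned reconstruction are assumptions, not part of the disclosure.

```python
import torch
from torchvision.io import read_image

model = torch.load("trained_deep_learning_model.pt")         # assumed checkpoint name
model.eval()
face = read_image("face.jpg").float().unsqueeze(0) / 255.0   # target to-be-processed image
with torch.no_grad():
    reconstruction = model(face)   # e.g. geometry, texture, material, point normals,
                                   # plus a detailed two-dimensional face image
```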
Through the above embodiments of the present disclosure, a weakly supervised object reconstruction and generation method based on large-scale data is provided. In this method, neural differentiable ray-tracing rendering is combined with large-scale object data, so that fast and efficient generation of 3D digital object geometry and material can be realized. In addition, the method avoids the high hardware cost of scanning with a high-precision scanner and the high computation cost of detail-processing optimization of a modeled face, and has great advantages in computation cost, hardware cost, terminal suitability, rendering engine adaptation, convergence speed, and the like. The method can be reused in various scenes, such as interactive scenes suitable for metauniverse virtual digital people and interactive scenes suitable for avatars on most terminals.
Fig. 7 schematically shows a block diagram of a sample image generating device according to an embodiment of the present disclosure.
As shown in fig. 7, the sample image generating apparatus 700 includes a first acquisition module 710, a first determination module 720, a scrambling module 730, a second determination module 740, and a third determination module 750.
The first obtaining module 710 is configured to obtain a first regression parameter of an image to be processed, where the image to be processed includes a target sample object, and the first regression parameter includes geometric information of the target sample object.
The first determining module 720 is configured to determine a geometric model corresponding to the target sample object according to the geometric information of the target sample object.
And a scrambling module 730, configured to scramble the geometric model to obtain a scrambling map.
The second determining module 740 is configured to determine tag information of the scrambling map according to the scrambling regression parameter of the scrambling map.
A third determining module 750 is configured to determine a sample image according to the tag information and the scrambling map.
According to an embodiment of the present disclosure, the scrambling module includes a first determination unit and a first scrambling unit.
And the first determining unit is used for determining point normal information corresponding to the points in the geometric model, wherein the point normal information represents normal information of the points in the geometric model in a three-dimensional space.
And the first scrambling unit is used for scrambling the points in the geometric model in the normal direction corresponding to the points according to the point normal information to obtain a scrambling map.
According to an embodiment of the present disclosure, the scrambling module includes an acquisition unit and a second scrambling unit.
And the acquisition unit is used for acquiring the scrambling parameters.
And the second scrambling unit is used for scrambling the geometric model according to the scrambling parameters to obtain a scrambling map.
According to an embodiment of the present disclosure, the first regression parameters further include a target sample object texture map, a first spherical harmonic illumination coefficient, and a first camera parameter. The scrambling parameters include at least one of: a target sample object geometric scrambling parameter, a target sample object texture scrambling parameter, a target sample object material scrambling parameter, a spherical harmonic illumination scrambling parameter, and a camera scrambling parameter. The second scrambling unit includes at least one of: a scrambling subunit, a texture filling subunit, a color rendering subunit, an illumination rendering subunit, and a projection subunit.
And the scrambling subunit is used for scrambling the geometric model according to the geometric scrambling parameters of the target sample object.
And the texture filling subunit is used for performing texture filling on the geometric model according to the target sample object texture scrambling parameters and the target sample object texture mapping.
And the color rendering subunit is used for performing color rendering on the geometric model according to the target sample object material scrambling parameters and the target sample object material mapping.
And the illumination rendering subunit is used for performing illumination rendering on the geometric model according to the spherical harmonic illumination scrambling parameter and the first spherical harmonic illumination coefficient.
And the projection subunit is used for projecting the geometric model into the image space according to the camera scrambling parameters and the first camera parameters.
According to an embodiment of the present disclosure, the scrambling regression parameters include at least one of: scrambling point normal information, target sample object scrambling geometry information, target sample object scrambling texture map, scrambling sphere harmonic illumination coefficient, and scrambling camera parameters.
According to an embodiment of the disclosure, the third determination module comprises a filling unit and an association unit.
And the filling unit is used for filling texture boundaries of the scrambling map to obtain a scrambling rendering image.
And the association unit is used for associating the tag information with the scrambled rendering image to obtain the scrambled rendering image comprising the tag information as a sample image.
Fig. 8 schematically illustrates a block diagram of a training apparatus of a deep learning model according to an embodiment of the present disclosure.
As shown in fig. 8, the training apparatus 800 of the deep learning model includes a first obtaining module 810, a second obtaining module 820, a training module 830, and a fine tuning module 840.
The first obtaining module 810 is configured to input a sample to-be-processed image into a first neural network of the deep learning model, and obtain a second regression parameter of the sample to-be-processed image, where the sample to-be-processed image includes an object to be processed, and the second regression parameter includes geometric information of the object to be processed.
The second obtaining module 820 is configured to input the sample to-be-processed image and the second regression parameter into a second neural network of the deep learning model, so as to obtain a sample rendered image corresponding to the sample to-be-processed image.
The training module 830 is configured to train the deep learning model by using the sample rendering image and the sample to-be-processed image, so as to obtain a training result.
And a fine tuning module 840, configured to, in response to determining that the training result meets a predetermined condition, fine tune parameters of the deep learning model using the sample image and the image to be processed, and obtain a trained deep learning model. Wherein the sample image is generated by processing the image to be processed by the sample image generating device according to the disclosure.
According to an embodiment of the present disclosure, the training module includes a second determination unit, a third determination unit, a fourth determination unit, and a training unit.
And the second determining unit is used for determining the first distance loss according to the similarity between the sample rendering image and the sample to-be-processed image.
And the third determining unit is used for determining the first perception loss according to the characteristic representation of the sample rendering image and the characteristic representation of the sample to-be-processed image.
And a fourth determining unit for determining a first norm loss between the sample rendered image and the sample image to be processed.
And the training unit is used for training the deep learning model according to the first distance loss, the first perception loss and the first norm loss.
According to an embodiment of the present disclosure, the predetermined condition includes that the first distance loss, the first perceptual loss, and the first norm loss all converge.
According to an embodiment of the present disclosure, the fine adjustment module includes a fifth determination unit, a sixth determination unit, a seventh determination unit, and a fine adjustment unit.
And a fifth determining unit for determining a second distance loss according to the similarity between the sample image and the image to be processed.
And a sixth determining unit, configured to determine a second perceptual loss according to the feature representation of the sample image and the feature representation of the image to be processed.
A seventh determining unit for determining a second norm loss between the sample image and the image to be processed.
And the fine tuning unit is used for fine tuning the parameters of the deep learning model according to the second distance loss, the second perception loss and the second norm loss.
According to an embodiment of the present disclosure, the fine tuning unit comprises a determination subunit and a fine tuning subunit.
A determining subunit for determining a canonical term loss configured for the second distance loss, the second perceptual loss, and the second norm loss.
And the fine tuning subunit is used for fine tuning the parameters of the deep learning model according to the second distance loss, the second perception loss, the second norm loss and the regular term loss.
According to an embodiment of the present disclosure, the second regression parameters further include a texture map of the object to be processed, a second spherical harmonic illumination coefficient, and a second camera parameter. The second obtaining module includes an eighth determining unit, an obtaining unit, a ninth determining unit, a tenth determining unit, an eleventh determining unit, and a projection unit.
And an eighth determining unit, configured to determine a pixel-level preliminary texture map corresponding to the object to be processed according to the geometric information of the object to be processed and the second camera parameter.
And the obtaining unit is used for carrying out illumination removal and feature symmetry processing on the pixel-level preliminary texture map to obtain the pixel-level target texture map.
And the ninth determining unit is used for determining the preliminary rendering texture map corresponding to the object to be processed according to the texture map of the object to be processed and the second spherical harmonic illumination coefficient.
And a tenth determining unit for determining a high-frequency diffuse reflection map corresponding to the object to be processed according to the pixel-level target texture map and the preliminary rendering texture map.
And the eleventh determining unit is used for determining the target rendering texture map according to the high-frequency diffuse reflection map, the material map of the object to be processed and the second spherical harmonic illumination coefficient.
And the projection unit is used for projecting the target rendering texture map to an image space according to the second camera parameter to obtain a sample rendering image.
Fig. 9 schematically shows a block diagram of an object information reconstruction apparatus according to an embodiment of the present disclosure.
As shown in fig. 9, the object information reconstructing apparatus 900 includes a second acquisition module 910 and a third acquisition module 920.
A second acquiring module 910, configured to acquire a target image to be processed including a target object.
The third obtaining module 920 is configured to input the target image to be processed into a deep learning model, to obtain object reconstruction information corresponding to the target object, where the deep learning model is obtained by training by the training device of the deep learning model according to the disclosure.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform at least one of the sample image generation method, the training method of the deep learning model, and the object information reconstruction method of the present disclosure.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform at least one of a sample image generation method, a training method of a deep learning model, and an object information reconstruction method of the present disclosure.
According to an embodiment of the present disclosure, a computer program product includes a computer program stored on at least one of a readable storage medium and an electronic device, which when executed by a processor, implements at least one of a sample image generation method, a training method of a deep learning model, and an object information reconstruction method of the present disclosure.
Fig. 10 shows a schematic block diagram of an example electronic device 1000 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 10, the apparatus 1000 includes a computing unit 1001 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1002 or a computer program loaded from a storage unit 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data required for the operation of the device 1000 can also be stored. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
Various components in device 1000 are connected to I/O interface 1005, including: an input unit 1006 such as a keyboard, a mouse, and the like; an output unit 1007 such as various types of displays, speakers, and the like; a storage unit 1008 such as a magnetic disk, an optical disk, or the like; and communication unit 1009 such as a network card, modem, wireless communication transceiver, etc. Communication unit 1009 allows device 1000 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The computing unit 1001 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1001 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1001 performs the respective methods and processes described above, such as at least one of a sample image generation method, a training method of a deep learning model, and an object information reconstruction method. For example, in some embodiments, at least one of the sample image generation method, the training method of the deep learning model, and the object information reconstruction method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 1000 via ROM 1002 and/or communication unit 1009. When the computer program is loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of at least one of the sample image generating method, the training method of the deep learning model, and the object information reconstructing method described above may be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured to perform at least one of the sample image generation method, the training method of the deep learning model, and the object information reconstruction method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs, the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (18)

1. A sample image generation method, comprising:
acquiring a first regression parameter of an image to be processed, wherein the image to be processed comprises a target sample object, and the first regression parameter comprises geometric information of the target sample object;
determining a geometric model corresponding to the target sample object according to the geometric information of the target sample object;
scrambling the geometric model to obtain a scrambling map, including: scrambling the position information of the points in the geometric model and the relative position relation between adjacent points to obtain the scrambling map;
Determining label information of the scrambling map according to scrambling regression parameters of the scrambling map; and
and determining a sample image according to the label information and the scrambling map.
2. The method of claim 1, wherein the scrambling the geometric model to obtain a scrambling map comprises:
determining point normal information corresponding to points in the geometric model, wherein the point normal information represents normal information of the points in the geometric model in a three-dimensional space; and
and scrambling the point in the geometric model in the normal direction corresponding to the point according to the point normal information to obtain the scrambling map.
3. The method of claim 1, wherein the scrambling the geometric model to obtain a scrambling map comprises:
obtaining scrambling parameters; and
and scrambling the geometric model according to the scrambling parameters to obtain the scrambling map.
4. The method of claim 3, wherein the first regression parameters further comprise a target sample object texture map, a first spherical harmonic illumination coefficient, and a first camera parameter; the scrambling parameters include at least one of: a target sample object geometric scrambling parameter, a target sample object texture scrambling parameter, a target sample object material scrambling parameter, a spherical harmonic illumination scrambling parameter and a camera scrambling parameter;
Scrambling the geometric model according to the scrambling parameters to obtain the scrambling map comprises at least one of the following steps:
scrambling the geometric model according to the geometric scrambling parameters of the target sample object;
performing texture filling on the geometric model according to the target sample object texture scrambling parameters and the target sample object texture map;
performing color rendering on the geometric model according to the scrambling parameters of the target sample object materials and the target sample object material maps;
performing illumination rendering on the geometric model according to the spherical harmonic illumination scrambling parameter and the first spherical harmonic illumination coefficient; and
the geometric model is projected into an image space according to the camera scrambling parameters and the first camera parameters.
5. The method of any of claims 1-4, wherein the scrambling regression parameters include at least one of: scrambling point normal information, target sample object scrambling geometry information, target sample object scrambling texture map, scrambling sphere harmonic illumination coefficient, and scrambling camera parameters.
6. The method of claim 1, wherein the determining a sample image from the tag information and the scrambling map comprises:
Performing texture boundary filling on the scrambling map to obtain a scrambling rendering image; and
and correlating the tag information with the scrambled rendering image to obtain the scrambled rendering image comprising the tag information as the sample image.
7. A training method of a deep learning model, comprising:
inputting a sample to-be-processed image into a first neural network of a deep learning model to obtain a second regression parameter of the sample to-be-processed image, wherein the sample to-be-processed image comprises an object to be processed, and the second regression parameter comprises geometric information of the object to be processed;
inputting the sample to-be-processed image and the second regression parameters into a second neural network of the deep learning model to obtain a sample rendering image corresponding to the sample to-be-processed image;
training the deep learning model by using the sample rendering image and the sample to-be-processed image to obtain a training result; and
in response to determining that the training result meets a predetermined condition, fine-tuning parameters of the deep learning model by using a sample image and an image to be processed to obtain a trained deep learning model;
wherein the sample image is generated by processing the image to be processed according to the method of any one of claims 1-6.
8. The method of claim 7, wherein the training the deep learning model with the sample rendered image and the sample pending image comprises:
determining a first distance loss according to the similarity between the sample rendering image and the sample to-be-processed image;
determining a first perception loss according to the characteristic representation of the sample rendering image and the characteristic representation of the sample to-be-processed image;
determining a first norm loss between the sample rendered image and the sample to-be-processed image; and
training the deep learning model according to the first distance loss, the first perception loss and the first norm loss.
9. The method of claim 8, wherein the predetermined condition includes that the first distance loss, the first perceived loss, and the first norm loss all converge.
10. The method of claim 7, wherein the fine-tuning parameters of the deep learning model by using the sample image and the image to be processed comprises:
determining a second distance loss according to the similarity between the sample image and the image to be processed;
determining a second perceptual loss according to the feature representation of the sample image and the feature representation of the image to be processed;
determining a second norm loss between the sample image and the image to be processed; and
fine-tuning parameters of the deep learning model according to the second distance loss, the second perceptual loss and the second norm loss.
11. The method of claim 10, wherein the fine-tuning of parameters of the deep learning model according to the second distance loss, the second perceptual loss, and the second norm loss comprises:
determining a regularization term loss configured for the second distance loss, the second perceptual loss, and the second norm loss; and
fine-tuning parameters of the deep learning model according to the second distance loss, the second perceptual loss, the second norm loss and the regularization term loss.
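For claims 10 and 11, one plausible (assumed) reading is that the re-rendered sample image is compared with the image to be processed using the same three loss types, plus a regularization term; the sketch below uses a weighted L2 penalty on the regressed parameters as that regularization term, which is an assumption rather than the claimed formulation.

```python
import torch
import torch.nn.functional as F

def finetune_loss_fn(sample_img: torch.Tensor, img_to_process: torch.Tensor,
                     regressed_params: torch.Tensor, feat_extractor=None,
                     reg_weight: float = 1e-3) -> torch.Tensor:
    """sample_img, img_to_process: (B, 3, H, W) images; regressed_params: any parameter tensor."""
    # Second distance loss from image similarity.
    dist = 1.0 - F.cosine_similarity(sample_img.flatten(1), img_to_process.flatten(1), dim=1).mean()
    # Second perceptual loss over feature representations (reuse the frozen backbone from pre-training).
    perc = (F.mse_loss(feat_extractor(sample_img), feat_extractor(img_to_process))
            if feat_extractor is not None else sample_img.new_zeros(()))
    # Second norm loss.
    norm = F.l1_loss(sample_img, img_to_process)
    # Regularization term loss (assumed: weighted L2 penalty on the regressed parameters).
    reg = reg_weight * regressed_params.pow(2).mean()
    return dist + perc + norm + reg
```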
12. The method of claim 7, wherein the second regression parameters further comprise a texture map of the object to be processed, a second spherical harmonic illumination coefficient, and a second camera parameter;
the inputting the sample to-be-processed image and the second regression parameters into the second neural network of the deep learning model to obtain a sample rendering image corresponding to the sample to-be-processed image comprises:
determining a pixel-level preliminary texture map corresponding to the object to be processed according to the geometric information of the object to be processed and the second camera parameter;
performing illumination removal and feature symmetry processing on the pixel-level preliminary texture map to obtain a pixel-level target texture map;
determining a preliminary rendering texture map corresponding to the object to be processed according to the texture map of the object to be processed and the second spherical harmonic illumination coefficient;
determining a high-frequency diffuse reflection map corresponding to the object to be processed according to the pixel-level target texture map and the preliminary rendering texture map;
determining a target rendering texture map according to the high-frequency diffuse reflection map, the material map of the object to be processed and the second spherical harmonic illumination coefficient; and
projecting the target rendering texture map to an image space according to the second camera parameter to obtain the sample rendering image.
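The core of claim 12's rendering branch is shading a texture map with spherical harmonic (SH) illumination coefficients before projection. The sketch below shows only that shading step, using the standard nine real SH basis functions of bands 0-2; the de-lighting, feature symmetry, high-frequency diffuse reflection and projection steps depend on model internals and are not reproduced here.

```python
import numpy as np

def sh_basis(normals: np.ndarray) -> np.ndarray:
    """Evaluate the 9 real SH basis functions (bands 0-2) at unit normals of shape (..., 3)."""
    x, y, z = normals[..., 0], normals[..., 1], normals[..., 2]
    return np.stack([
        0.282095 * np.ones_like(x),
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z, 0.546274 * (x * x - y * y),
    ], axis=-1)

def shade_texture(albedo: np.ndarray, normals: np.ndarray, sh_coeffs: np.ndarray) -> np.ndarray:
    """albedo: (H, W, 3); normals: (H, W, 3) unit vectors; sh_coeffs: (9, 3) per-channel
    spherical harmonic illumination coefficients. Returns the lit (preliminary rendering) texture map."""
    irradiance = sh_basis(normals) @ sh_coeffs        # (H, W, 3)
    return np.clip(albedo * irradiance, 0.0, 1.0)
```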
13. An object information reconstruction method, comprising:
acquiring a target to-be-processed image comprising a target object;
inputting the target to-be-processed image into a deep learning model to obtain object reconstruction information corresponding to the target object,
wherein the deep learning model is trained according to the method of any one of claims 7-12.
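At inference time the reconstruction method of claim 13 reduces to a single forward pass. The sketch below assumes a TorchScript-exported model at a placeholder path, a 256x256 input size and a four-tuple output layout; all of these are illustrative assumptions.

```python
import torch
import torchvision.transforms.functional as TF
from PIL import Image

def reconstruct(image_path: str, model_path: str = "deep_learning_model.pt") -> dict:
    """Return object reconstruction information for the target object in the target to-be-processed image."""
    model = torch.jit.load(model_path).eval()                    # trained deep learning model (placeholder path)
    img = Image.open(image_path).convert("RGB")
    x = TF.to_tensor(TF.resize(img, [256, 256])).unsqueeze(0)    # (1, 3, 256, 256)
    with torch.no_grad():
        geometry, texture, sh_coeffs, camera = model(x)          # assumed output layout
    return {"geometry": geometry, "texture": texture,
            "sh_illumination": sh_coeffs, "camera": camera}
```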
14. A sample image generation apparatus comprising:
the first acquisition module is used for acquiring first regression parameters of an image to be processed, wherein the image to be processed comprises a target sample object, and the first regression parameters comprise geometric information of the target sample object;
the first determining module is used for determining a geometric model corresponding to the target sample object according to the geometric information of the target sample object;
the scrambling module is configured to scramble the geometric model to obtain a scrambling map, which includes: scrambling the position information of points in the geometric model and the relative positional relationship between adjacent points to obtain the scrambling map;
the second determining module is used for determining label information of the scrambling map according to scrambling regression parameters of the scrambling map; and
a third determining module, configured to determine a sample image according to the tag information and the scrambling map.
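The scrambling module of claim 14 perturbs both the absolute positions of points in the geometric model and their relative positions with respect to adjacent points. A minimal mesh-level illustration follows (Gaussian jitter on vertices plus a push along each vertex's offset from the centroid of its neighbours); the noise scales are arbitrary assumptions and this is not the claimed scrambling scheme.

```python
import numpy as np

def scramble_geometry(vertices: np.ndarray, faces: np.ndarray,
                      pos_sigma: float = 0.01, rel_scale: float = 0.1,
                      seed: int = 0) -> np.ndarray:
    """vertices: (N, 3); faces: (F, 3) vertex indices. Returns scrambled vertex positions."""
    rng = np.random.default_rng(seed)
    # Scramble absolute position information with Gaussian jitter.
    scrambled = vertices + rng.normal(scale=pos_sigma, size=vertices.shape)
    # Scramble relative position relations: push each vertex away from the centroid of its adjacent vertices.
    neighbor_sum = np.zeros_like(vertices)
    neighbor_cnt = np.zeros(len(vertices))
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            neighbor_sum[u] += vertices[v]
            neighbor_sum[v] += vertices[u]
            neighbor_cnt[u] += 1
            neighbor_cnt[v] += 1
    centroid = neighbor_sum / np.maximum(neighbor_cnt, 1)[:, None]
    scrambled += rel_scale * (vertices - centroid)
    return scrambled
```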
15. A training device for a deep learning model, comprising:
the first acquisition module is used for inputting a sample to-be-processed image into a first neural network of the deep learning model to obtain a second regression parameter of the sample to-be-processed image, wherein the sample to-be-processed image comprises an object to be processed, and the second regression parameter comprises geometric information of the object to be processed;
The second obtaining module is used for inputting the sample to-be-processed image and the second regression parameters into a second neural network of the deep learning model to obtain a sample rendering image corresponding to the sample to-be-processed image;
the training module is used for training the deep learning model by utilizing the sample rendering image and the sample to-be-processed image to obtain a training result; and
the fine-tuning module is used for, in response to determining that the training result meets a predetermined condition, fine-tuning parameters of the deep learning model by using a sample image and an image to be processed to obtain a trained deep learning model;
wherein the sample image is generated by processing the image to be processed according to the apparatus of claim 14.
16. An object information reconstruction apparatus comprising:
the second acquisition module is used for acquiring a target to-be-processed image comprising a target object;
a third obtaining module, configured to input the target to-be-processed image into a deep learning model to obtain object reconstruction information corresponding to the target object,
wherein the deep learning model is trained by the apparatus of claim 15.
17. An electronic device, comprising:
At least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-13.
18. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-13.
CN202211644067.XA 2022-12-20 2022-12-20 Image generation, model training and information reconstruction methods and devices and electronic equipment Active CN116012666B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211644067.XA CN116012666B (en) 2022-12-20 2022-12-20 Image generation, model training and information reconstruction methods and devices and electronic equipment

Publications (2)

Publication Number Publication Date
CN116012666A (en) 2023-04-25
CN116012666B (en) 2023-10-27

Family

ID=86031872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211644067.XA Active CN116012666B (en) 2022-12-20 2022-12-20 Image generation, model training and information reconstruction methods and devices and electronic equipment

Country Status (1)

Country Link
CN (1) CN116012666B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111951372A (en) * 2020-06-30 2020-11-17 重庆灵翎互娱科技有限公司 Three-dimensional face model generation method and equipment
CN113807425A (en) * 2021-09-11 2021-12-17 中南大学 Tissue pathology image classification method based on self-adaptive regular depth clustering
CN113869449A (en) * 2021-10-11 2021-12-31 北京百度网讯科技有限公司 Model training method, image processing method, device, equipment and storage medium
CN114612743A (en) * 2022-03-10 2022-06-10 北京百度网讯科技有限公司 Deep learning model training method, target object identification method and device
CN114792355A (en) * 2022-06-24 2022-07-26 北京百度网讯科技有限公司 Virtual image generation method and device, electronic equipment and storage medium
CN114842121A (en) * 2022-06-30 2022-08-02 北京百度网讯科技有限公司 Method, device, equipment and medium for generating mapping model training and mapping
CN115345980A (en) * 2022-10-18 2022-11-15 北京百度网讯科技有限公司 Generation method and device of personalized texture map
CN115439305A (en) * 2021-06-01 2022-12-06 北京字跳网络技术有限公司 Image generation method, apparatus, device and medium

Similar Documents

Publication Publication Date Title
JP7373554B2 (en) Cross-domain image transformation
JP7142162B2 (en) Posture variation 3D facial attribute generation
CN113327278B (en) Three-dimensional face reconstruction method, device, equipment and storage medium
CN115082639A (en) Image generation method and device, electronic equipment and storage medium
CN114820905B (en) Virtual image generation method and device, electronic equipment and readable storage medium
US10163247B2 (en) Context-adaptive allocation of render model resources
CN112785674A (en) Texture map generation method, rendering method, device, equipment and storage medium
US11983815B2 (en) Synthesizing high resolution 3D shapes from lower resolution representations for synthetic data generation systems and applications
CN113313832B (en) Semantic generation method and device of three-dimensional model, storage medium and electronic equipment
CN111951368A (en) Point cloud, voxel and multi-view fusion deep learning method
Liu et al. High-quality textured 3D shape reconstruction with cascaded fully convolutional networks
US11741678B2 (en) Virtual object construction method, apparatus and storage medium
CN114792355B (en) Virtual image generation method and device, electronic equipment and storage medium
CN115100337A (en) Whole body portrait video relighting method and device based on convolutional neural network
CN114708374A (en) Virtual image generation method and device, electronic equipment and storage medium
US20230298243A1 (en) 3d digital avatar generation from a single or few portrait images
CN116863078A (en) Three-dimensional human body model reconstruction method, three-dimensional human body model reconstruction device, electronic equipment and readable medium
Lin et al. Multiview textured mesh recovery by differentiable rendering
CN115375847B (en) Material recovery method, three-dimensional model generation method and model training method
CN116012666B (en) Image generation, model training and information reconstruction methods and devices and electronic equipment
CN115082298A (en) Image generation method, image generation device, electronic device, and storage medium
CN116385643B (en) Virtual image generation method, virtual image model training method, virtual image generation device, virtual image model training device and electronic equipment
CN114820908B (en) Virtual image generation method and device, electronic equipment and storage medium
CN115953553B (en) Avatar generation method, apparatus, electronic device, and storage medium
CN116843807A (en) Virtual image generation method, virtual image model training method, virtual image generation device, virtual image model training device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant