CN116385643A - Avatar generation method and apparatus, deep learning model training method and apparatus, and electronic device


Info

Publication number: CN116385643A
Application number: CN202310347132.0A
Authority: CN (China)
Other versions: CN116385643B (granted)
Prior art keywords: stylized, radiance field, sample, image, determining
Inventors: 李�杰, 陈睿智, 赵晨
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Legal status: Granted; Active



Classifications

    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06N3/045: Neural networks; architecture; combinations of networks
    • G06N3/08: Neural networks; learning methods
    • G06T15/005: 3D [Three Dimensional] image rendering; general purpose rendering architectures


Abstract

The present disclosure provides an avatar generation method, a deep learning model training method, corresponding apparatuses, and an electronic device, relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, augmented reality, virtual reality, deep learning, and the like, and can be applied to scenarios such as the metaverse and digital humans. A specific implementation scheme is as follows: determining an initial set of neural radiance field vectors for a target object in an input image; determining a set of stylized neural radiance field vectors according to preset style information and the initial set of neural radiance field vectors; and generating a stylized avatar of the target object from the set of stylized neural radiance field vectors.

Description

Avatar generation method and apparatus, deep learning model training method and apparatus, and electronic device
Technical Field
The present disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, augmented reality, virtual reality, deep learning, and the like, can be applied to scenarios such as the metaverse and digital humans, and in particular relates to an avatar generation method, a deep learning model training method, corresponding apparatuses, and an electronic device.
Background
Virtual digital humans are one of the key elements in creating a metaverse virtual world. According to different business requirements, digital humans can be classified as 2-dimensional, 3-dimensional, cartoon, realistic, hyper-realistic, and so on. In a real-world scenario, a basic avatar adapted to the business must be built for a virtual digital human.
Disclosure of Invention
The present disclosure provides an avatar generation method and apparatus, a deep learning model training method and apparatus, and an electronic device.
According to an aspect of the present disclosure, there is provided an avatar generation method, including: determining an initial set of neural radiance field vectors for a target object in an input image; determining a set of stylized neural radiance field vectors according to preset style information and the initial set of neural radiance field vectors; and generating a stylized avatar of the target object from the set of stylized neural radiance field vectors.
According to another aspect of the present disclosure, there is provided a training method for a deep learning model, including: inputting a sample image into a first neural network of the deep learning model to obtain a sample initial set of neural radiance field vectors of a sample object, wherein the sample image contains the sample object; inputting the sample initial set of neural radiance field vectors into a second neural network of the deep learning model to obtain initial image features of the sample object; obtaining a sample stylized set of neural radiance field vectors of the sample object according to the initial image features and the sample initial set of neural radiance field vectors; performing feature extraction on the sample stylized set of neural radiance field vectors to obtain stylized image features of the sample object; and training the deep learning model according to the sample stylized set of neural radiance field vectors, the initial image features, and the stylized image features to obtain a trained deep learning model.
According to another aspect of the present disclosure, there is provided an avatar generation method, including: acquiring an image to be processed, wherein the image to be processed contains an object to be processed; and inputting the image to be processed into a deep learning model to obtain a stylized avatar of the object to be processed, wherein the deep learning model is trained using the training method described above.
According to another aspect of the present disclosure, there is provided an avatar generation apparatus, including: a first determination module for determining an initial set of neural radiance field vectors for a target object in an input image; a second determination module for determining a set of stylized neural radiance field vectors according to preset style information and the initial set of neural radiance field vectors; and a generation module for generating a stylized avatar of the target object from the set of stylized neural radiance field vectors.
According to another aspect of the present disclosure, there is provided a training apparatus for a deep learning model, including: a first obtaining module for inputting a sample image into a first neural network of the deep learning model to obtain a sample initial set of neural radiance field vectors of a sample object, wherein the sample image contains the sample object; a second obtaining module for inputting the sample initial set of neural radiance field vectors into a second neural network of the deep learning model to obtain initial image features of the sample object; a third obtaining module for obtaining a sample stylized set of neural radiance field vectors of the sample object according to the initial image features and the sample initial set of neural radiance field vectors; a fourth obtaining module for performing feature extraction on the sample stylized set of neural radiance field vectors to obtain stylized image features of the sample object; and a training module for training the deep learning model according to the sample stylized set of neural radiance field vectors, the initial image features, and the stylized image features to obtain a trained deep learning model.
According to another aspect of the present disclosure, there is provided an avatar generation apparatus, including: an acquisition module for acquiring an image to be processed, wherein the image to be processed contains an object to be processed; and a fifth obtaining module for inputting the image to be processed into a deep learning model to obtain a stylized avatar of the object to be processed, wherein the deep learning model is trained using the training apparatus for the deep learning model according to the present disclosure.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform at least one of the avatar generation method and the training method of the deep learning model of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform at least one of the avatar generation method and the training method of the deep learning model of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements at least one of the avatar generation method and the training method of the deep learning model of the present disclosure, the computer program being stored on a readable storage medium and/or in an electronic device.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are provided for a better understanding of the present solution and do not constitute a limitation of the present disclosure, wherein:
FIG. 1 schematically illustrates an exemplary system architecture to which at least one of the avatar generation method and the training method of the deep learning model, and the corresponding apparatuses, may be applied, according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flowchart of an avatar generation method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flowchart of a training method of a deep learning model according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flowchart of a method for implementing avatar generation based on a deep learning model according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a schematic diagram of stylized rendering of an image based on a deep learning model according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a block diagram of an avatar generation apparatus according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a block diagram of a training apparatus of a deep learning model according to an embodiment of the present disclosure;
FIG. 8 schematically illustrates a block diagram of an avatar generation apparatus according to an embodiment of the present disclosure; and
FIG. 9 illustrates a schematic block diagram of an example electronic device that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, including various details of the embodiments of the present disclosure to facilitate understanding, which should be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, application, and other handling of users' personal information all comply with the relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.
In the technical solution of the present disclosure, the user's authorization or consent is obtained before the user's personal information is obtained or collected.
When designing a high-quality avatar, a professional animator is required to carry out professional optimization design on the geometric modeling, texture mapping, illumination mapping, and the like of the avatar to construct a basic avatar adapted to business requirements. For example, the stylization requirements of a stylized avatar call for fine-grained modeling of the digital human's materials, lighting models, 3D models, and so on. Designing the stylized rendering map of an avatar likewise relies on professional designers carrying out iterative optimization design according to service requirements.
In the process of implementing the disclosed concept, the inventors found that such avatar generation methods require professional designers, relying on professional software, to carry out professional design of geometric textures and other aspects, so the hardware cost and the design cost are both high.
Fig. 1 schematically illustrates an exemplary system architecture to which at least one of an avatar generation method and a training method of a deep learning model may be applied, and a corresponding apparatus, according to an embodiment of the present disclosure.
It should be noted that FIG. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied, to help those skilled in the art understand the technical content of the present disclosure; it does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios. For example, in another embodiment, an exemplary system architecture to which at least one of the avatar generation method and the training method of the deep learning model may be applied may include a terminal device, and the terminal device may implement at least one of these methods provided in the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, a system architecture 100 according to this embodiment may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is a medium used to provide a communication link between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links, and the like.
The user may interact with the server 105 via the network 104 using the first terminal device 101, the second terminal device 102, the third terminal device 103, to receive or send messages etc. Various communication client applications, such as a knowledge reading class application, a web browser application, a search class application, an instant messaging tool, a mailbox client and/or social platform software, etc. (by way of example only) may be installed on the first terminal device 101, the second terminal device 102, the third terminal device 103.
The first terminal device 101, the second terminal device 102, the third terminal device 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (merely an example) providing support for content browsed by the user with the first terminal device 101, the second terminal device 102, and the third terminal device 103. The background management server may analyze and process the received data, such as a user request, and feed the processing result (e.g., a web page, information, or data obtained or generated according to the user request) back to the terminal device. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the defects of difficult management and weak service scalability of traditional physical hosts and VPS (Virtual Private Server) services. The server may also be a server of a distributed system or a server incorporating a blockchain.
It should be noted that at least one of the avatar generation method and the training method of the deep learning model provided by the embodiments of the present disclosure may generally be performed by the first terminal device 101, the second terminal device 102, or the third terminal device 103. Accordingly, the corresponding apparatus provided by the embodiments of the present disclosure may also be provided in the first terminal device 101, the second terminal device 102, or the third terminal device 103.
Alternatively, at least one of the avatar generation method and the training method of the deep learning model provided by the embodiments of the present disclosure may also be generally performed by the server 105. Accordingly, at least one of the avatar generation device and the training device of the deep learning model provided in the embodiments of the present disclosure may be generally provided in the server 105. At least one of the avatar generation method and the training method of the deep learning model provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105. Accordingly, at least one of the avatar generation apparatus and the training apparatus of the deep learning model provided in the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105.
For example, when the avatar needs to be generated, the first terminal device 101, the second terminal device 102, and the third terminal device 103 may acquire an input image and then transmit it to the server 105. The server 105 analyzes the input image, determines an initial set of neural radiance field vectors of the target object in the input image, determines a set of stylized neural radiance field vectors according to the preset style information and the initial set of neural radiance field vectors, and generates the stylized avatar of the target object from the set of stylized neural radiance field vectors. Alternatively, these steps may be performed by a server or server cluster capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105, to generate the stylized avatar of the target object.
For example, when the deep learning model needs to be trained, the first terminal device 101, the second terminal device 102, and the third terminal device 103 may acquire a sample image and then send it to the server 105. The server 105 analyzes the sample image: the sample image is input into the first neural network of the deep learning model to obtain a sample initial set of neural radiance field vectors of the sample object, where the sample image contains the sample object; the sample initial set of neural radiance field vectors is input into the second neural network of the deep learning model to obtain a sample stylized set of neural radiance field vectors of the sample object; initial image features of the sample object at a first viewing angle are obtained from the sample initial set of neural radiance field vectors; stylized image features of the sample object at a second viewing angle are obtained from the sample stylized set of neural radiance field vectors; and the deep learning model is trained according to the sample stylized set of neural radiance field vectors, the initial image features, and the stylized image features to obtain a trained deep learning model. Alternatively, the training may be performed by a server or server cluster capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105.
For example, when the avatar needs to be generated, the first terminal device 101, the second terminal device 102, and the third terminal device 103 may also acquire an image to be processed, where the image to be processed contains an object to be processed, and then send the acquired image to the server 105. The server 105 analyzes the image to be processed and inputs it into the deep learning model to obtain a stylized avatar of the object to be processed. Alternatively, these steps may be performed by a server or server cluster capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105, to obtain the stylized avatar of the object to be processed.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
FIG. 2 schematically illustrates a flowchart of an avatar generation method according to an embodiment of the present disclosure.
As shown in FIG. 2, the method includes operations S210 to S230.
In operation S210, an initial set of neural radiance field vectors of a target object in an input image is determined.
In operation S220, a set of stylized neural radiance field vectors is determined according to preset style information and the initial set of neural radiance field vectors.
In operation S230, a stylized avatar of the target object is generated from the set of stylized neural radiance field vectors.
According to an embodiment of the present disclosure, the target object may include at least one of a face, a landscape, an animal, a plant, and the like, without being limited thereto. The input image may include one or more images acquired from one or more angles of the target object. The input image may also be acquired by first capturing a video of the target object and then extracting video frames.
It should be noted that the face images in this embodiment may come from public datasets, or the authorization of the corresponding users is obtained whenever face information is acquired in this embodiment.
According to the embodiment of the present disclosure, feature extraction may first be performed on the input image to obtain image features. The image features may then be processed using a NeRF (Neural Radiance Field) model to obtain a plurality of neural radiance field vectors. Each neural radiance field vector may include the color and volume density of a spatial point of the target object at the acquisition viewing angle of the input image. From the plurality of neural radiance field vectors, an initial set of neural radiance field vectors may be determined.
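To make the data flow concrete, the following is a minimal Python (PyTorch) sketch of the kind of NeRF-style field query described above, mapping a spatial point and viewing direction to a color and a volume density; the layer sizes and the omission of positional encoding are illustrative assumptions, not the patent's implementation.

    import torch
    import torch.nn as nn

    class NeRFField(nn.Module):
        """Minimal neural radiance field: maps a 3D point and a viewing
        direction to an RGB color and a volume density (sigma)."""

        def __init__(self, hidden: int = 256):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Linear(3, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            self.sigma_head = nn.Linear(hidden, 1)    # volume density
            self.color_head = nn.Sequential(          # view-dependent color
                nn.Linear(hidden + 3, hidden // 2), nn.ReLU(),
                nn.Linear(hidden // 2, 3), nn.Sigmoid(),
            )

        def forward(self, xyz: torch.Tensor, view_dir: torch.Tensor):
            h = self.backbone(xyz)
            sigma = torch.relu(self.sigma_head(h))
            rgb = self.color_head(torch.cat([h, view_dir], dim=-1))
            return rgb, sigma   # one radiance field vector per query point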
According to an embodiment of the present disclosure, before the above processing, the input image may also first be preprocessed using an image preprocessing method to obtain an input image containing only the target object. Then, by applying the NeRF-based processing to the input image containing only the target object, the initial set of neural radiance field vectors of the target object can be obtained.
According to an embodiment of the present disclosure, the preset style information may be style information of the target image itself. The preset style information may also include any one of, for example, the cartoon, European, Van Gogh, or art styles, without being limited thereto. The preset style information may be determined by processing and analyzing acquired stylized content, or may be provided by a model pre-trained on preset style information. The stylized content may come from the input image or from other input information, which is not limited here. The stylized content may include at least one of: text content, audio content, image content, and the like, without being limited thereto.
According to the embodiment of the present disclosure, after the preset style information is determined, each neural radiance field vector in the initial set may be rendered in combination with the preset style information to obtain a plurality of neural radiance field vectors carrying the preset style information. From these, a set of stylized neural radiance field vectors may be determined.
According to embodiments of the present disclosure, volume rendering is performed based on the set of stylized neural radiance field vectors, and a predicted three-dimensional avatar corresponding to the target object may be obtained. The three-dimensional avatar may carry the preset style information. From the three-dimensional avatar, a stylized avatar of the target object may be determined.
Through the above embodiments of the present disclosure, stylized rendering is performed in the dimension of the neural radiance field vectors, and the stylized effect of an avatar determined in this way is more detailed and complete.
The method shown in FIG. 2 is further described below with reference to specific examples.
According to an embodiment of the present disclosure, operation S210 may include: determining a target object three-dimensional model for the target object from the input image; and determining the initial set of neural radiance field vectors from the target object three-dimensional model.
According to embodiments of the present disclosure, the input image may include one or more images acquired from different viewing angles of the target object. For each input image, feature extraction may first be performed to obtain a first object feature of the target object in that image. The target object may then be three-dimensionally reconstructed from the first object feature extracted from a single input image to obtain the target object three-dimensional model, or from the first object features extracted from a plurality of input images; the latter yields a three-dimensional model of higher accuracy.
According to the embodiment of the present disclosure, the target object three-dimensional model may be processed based on the NeRF model to obtain the initial set of neural radiance field vectors.
Through the above embodiments of the present disclosure, determining the initial set of neural radiance field vectors based on the target object three-dimensional model can improve the accuracy of the determined set.
According to an embodiment of the present disclosure, determining the target object three-dimensional model from the input image may include: determining camera intrinsic parameters and expression coefficients corresponding to the target object from the input image; and determining the three-dimensional model of the target object from the camera intrinsic parameters and the expression coefficients.
According to embodiments of the present disclosure, the camera intrinsic parameters may include parameters related to the characteristics of the camera itself, such as its focal length. The expression may include at least one of a facial expression, a body expression, a speech expression, and the like, without being limited thereto. The expression coefficients may represent the variation or weight of the object's expression relative to a preset expression; that is, the object's expression can be determined from the preset expression and the expression coefficients.
According to an embodiment of the present disclosure, the input image may be processed with at least one of a 3DMM (3D Morphable Model) and an albedo-3DMM (reflectance 3D morphable model) to obtain the camera intrinsic parameters and the expression coefficients corresponding to the target object. By rendering the first object feature of the target object in the input image in combination with the camera intrinsic parameters and the expression coefficients, the three-dimensional model of the target object can be obtained.
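As an illustration of how the expression coefficients act, here is a sketch of the standard linear 3DMM combination in Python (NumPy); the basis names and shapes are assumptions for exposition rather than the patent's data layout.

    import numpy as np

    def reconstruct_3dmm_vertices(mean_shape, id_basis, exp_basis,
                                  id_coeffs, exp_coeffs):
        """Linear 3DMM: vertices = mean shape + identity offsets
        + expression offsets.

        mean_shape: (N*3,) mean face vertices
        id_basis:   (N*3, K_id) identity (shape) basis
        exp_basis:  (N*3, K_exp) expression basis
        The expression coefficients weight the expression basis, matching
        the notion of a variation relative to a preset (mean) expression.
        """
        return mean_shape + id_basis @ id_coeffs + exp_basis @ exp_coeffs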
Through the above embodiments of the present disclosure, rendering the target object into a three-dimensional representation in combination with the camera intrinsic parameters and the expression coefficients converges faster than processing the two-dimensional input image directly, which effectively improves processing efficiency and accuracy.
According to an embodiment of the present disclosure, determining the initial set of neural radiance field vectors from the three-dimensional model of the target object may include: determining camera extrinsic parameters corresponding to the target object from the input image; and determining the initial set of neural radiance field vectors from the camera extrinsic parameters and the target object three-dimensional model.
According to embodiments of the present disclosure, the camera extrinsic parameters may include parameters in the world coordinate system, such as the position and rotation of the camera. The camera extrinsic parameters may also be combined with the 3DMM model when processing the input image.
According to embodiments of the present disclosure, the relative positional relationship between the camera used to acquire the input image and the three-dimensional model of the target object may be determined from the camera extrinsic parameters. By extracting second object features of the three-dimensional model, combined with this relative positional relationship, the initial set of neural radiance field vectors can be determined.
According to the embodiment of the present disclosure, the camera extrinsic parameters and the target object three-dimensional model may be processed with the NeRF model to obtain the initial set of neural radiance field vectors.
When the initial set of neural radiance field vectors needs to be determined, it may also be determined from the camera intrinsic and extrinsic parameters and the first object feature of the target object in the input image, which is not limited here. The first object feature may include illumination information, color information, and the like of each pixel of the target object, without being limited thereto. The second object feature may include illumination information, color information, and the like of each voxel of the target object three-dimensional model, without being limited thereto.
According to an embodiment of the present disclosure, determining the initial set of neural radiance field vectors from the camera extrinsic parameters and the target object three-dimensional model may include: determining the camera pose from the camera extrinsic parameters; and determining the initial set of neural radiance field vectors from the camera pose and the voxel information of the three-dimensional model.
According to an embodiment of the present disclosure, the voxel information may include illumination information, color information, position information, and the like of each voxel that the target object three-dimensional model presents at the camera viewing angle, without being limited thereto.
According to embodiments of the present disclosure, a ray starting point may be determined from the camera pose, and the termination point of each ray traveling from the camera to the three-dimensional model may be determined from the voxel information of the model. From the color and volume density of the spatial points along a ray's path, the neural radiance field vector corresponding to that ray can be determined. From the camera pose and the voxel information of each voxel in the three-dimensional model, a neural radiance field vector from the camera to each voxel may be determined, and thus the initial set of neural radiance field vectors of the three-dimensional model.
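A minimal sketch of this per-ray accumulation (the standard NeRF volume rendering quadrature), reusing the hypothetical NeRFField from the earlier sketch; the sampling range and uniform step scheme are illustrative assumptions.

    import torch

    def render_ray(field, origin, direction,
                   t_near=0.1, t_far=4.0, n_samples=64):
        """Accumulate color along one camera ray: sample points between
        the ray starting point (from the camera pose) and its termination,
        query color and density, and alpha-composite front to back."""
        t = torch.linspace(t_near, t_far, n_samples)
        points = origin + t[:, None] * direction       # (n_samples, 3)
        dirs = direction.expand_as(points)
        rgb, sigma = field(points, dirs)               # color + volume density

        delta = (t_far - t_near) / (n_samples - 1)     # uniform step size
        alpha = 1.0 - torch.exp(-sigma.squeeze(-1) * delta)
        # transmittance: probability the ray reaches each sample unoccluded
        trans = torch.cumprod(
            torch.cat([alpha.new_ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
        weights = alpha * trans                        # per-sample contribution
        return (weights[:, None] * rgb).sum(dim=0)     # rendered pixel color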
With the above embodiments of the present disclosure, the initial set of neural radiance field vectors determined based on the target object three-dimensional model can have higher accuracy.
According to an embodiment of the present disclosure, operation S220 may include: determining a two-dimensional image of the target object at a first viewing angle from the initial set of neural radiance field vectors; and determining the set of stylized neural radiance field vectors from the target object two-dimensional image, the initial set of neural radiance field vectors, and the preset style information.
According to an embodiment of the present disclosure, after the initial set of neural radiance field vectors is obtained, a two-dimensional image of the target object may be obtained by rendering the initial set into image space. By rendering the initial set into a plurality of image spaces, two-dimensional images of the target object at a plurality of different viewing angles may be obtained.
According to the embodiment of the present disclosure, in rendering the initial set of neural radiance field vectors in combination with the preset style information, the target object two-dimensional image may further be introduced as a constraint. On this basis, the initial set may be stylized in combination with the features of the two-dimensional image, yielding a set of stylized neural radiance field vectors more closely tied to the target object.
Through the above embodiment of the present disclosure, combining the target object two-dimensional image, the initial set of neural radiance field vectors, and the preset style information associates the determined set of stylized neural radiance field vectors more closely with the target object, so that stylized rendering of the target object can be achieved without changing the object's own characteristics.
According to an embodiment of the present disclosure, operation S230 may include: determining a target object stylized image at a second viewing angle from the set of stylized neural radiance field vectors; and determining the stylized avatar from the target object stylized image.
According to an embodiment of the present disclosure, after the set of stylized neural radiance field vectors is obtained, a target object stylized image may be obtained by rendering the set into image space. By rendering the set into a plurality of image spaces, stylized images of the target object at a plurality of different viewing angles may be obtained. A stylized avatar of the target object may then be generated based on the image features of the one or more stylized images.
After the set of stylized neural radiance field vectors is obtained, it may also be rendered into three-dimensional space according to the color and volume density of each stylized neural radiance field vector to obtain the stylized avatar.
Through the above embodiments of the present disclosure, working in the dimension of the neural radiance field vectors, a stylized avatar determined from the set of stylized neural radiance field vectors can have a finer, more accurate, and smoother presentation.
According to the embodiment of the present disclosure, a deep learning model corresponding to the avatar generation method can be trained, so that the generation of the stylized avatar can be automated.
FIG. 3 schematically illustrates a flowchart of a training method of a deep learning model according to an embodiment of the present disclosure.
As shown in FIG. 3, the method includes operations S310-S350.
In operation S310, a sample image is input into a first neural network of a deep learning model to obtain a sample initial set of neural radiance field vectors of a sample object, where the sample image contains the sample object.
In operation S320, the sample initial set of neural radiance field vectors is input into a second neural network of the deep learning model to obtain initial image features of the sample object.
In operation S330, a sample stylized set of neural radiance field vectors of the sample object is obtained from the initial image features and the sample initial set of neural radiance field vectors.
In operation S340, feature extraction is performed on the sample stylized set of neural radiance field vectors to obtain stylized image features of the sample object.
In operation S350, the deep learning model is trained according to the sample stylized set of neural radiance field vectors, the initial image features, and the stylized image features to obtain a trained deep learning model.
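Taken together, operations S310 through S350 amount to one optimization step of the following shape; this Python sketch uses illustrative placeholder names (first_network, feature_extractor, stylize, loss) rather than the patent's API, and model.loss stands for the combined losses detailed below.

    def train_step(model, optimizer, sample_image):
        """One training iteration following operations S310 to S350."""
        init_vectors = model.first_network(sample_image)               # S310
        init_features = model.feature_extractor(init_vectors)          # S320
        stylized_vectors = model.stylize(init_features, init_vectors)  # S330
        stylized_features = model.feature_extractor(stylized_vectors)  # S340
        loss = model.loss(stylized_vectors, init_features,
                          stylized_features)                           # S350
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()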
According to embodiments of the present disclosure, the first neural network may include the 3DMM, the albedo-3DMM, and NeRF, without being limited thereto. The second neural network may include at least one of an MLP (Multilayer Perceptron), FC (fully connected) layers, a CNN (Convolutional Neural Network), CLIP (Contrastive Language-Image Pre-training, a neural network model for matching images and text), and the like, without being limited thereto.
According to an embodiment of the present disclosure, after the sample image is input into the first neural network, the camera intrinsic and extrinsic parameters and the expression coefficients of the sample object may first be obtained, for example with the 3DMM. The camera parameters and expression coefficients may then be processed based on NeRF to obtain the sample initial set of neural radiance field vectors of the sample object.
According to embodiments of the present disclosure, the second neural network may be used to perform feature extraction on the input data. After the sample initial set of neural radiance field vectors is obtained, it may be input into the second neural network, which may perform: extracting features from the sample initial set; obtaining the sample stylized set of neural radiance field vectors from the extracted initial image features and the sample initial set; and extracting features from the sample stylized set to obtain the stylized image features of the sample object.
For example, after the sample initial set of neural radiance field vectors is input into CLIP, sample initial neural radiance field vector features may first be extracted, and the initial image features may then be determined from them. After the sample stylized set of neural radiance field vectors is obtained based on CLIP, sample stylized neural radiance field vector features may first be extracted, and the stylized image features may then be determined from them.
For example, after the sample initial set of neural radiance field vectors is input into CLIP, the sample initial vectors may also first be rendered into image space to obtain initial two-dimensional images of the sample object at one or more viewing angles; the initial image features are then obtained by performing feature extraction on each initial two-dimensional image. Likewise, after the sample stylized set is obtained based on CLIP, the stylized vectors may first be rendered into image space to obtain stylized two-dimensional images at one or more viewing angles, from which the stylized image features are extracted.
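Since the description names CLIP as one choice of second neural network, the per-view feature extraction might look like the following sketch using the public OpenAI CLIP package; treating the renders as PIL images and freezing the encoder are assumptions for exposition (a fully differentiable training pipeline would resize and normalize rendered tensors directly instead of using preprocess).

    import clip   # OpenAI CLIP: https://github.com/openai/CLIP
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)
    for p in model.parameters():
        p.requires_grad_(False)   # freeze the encoder itself

    def view_features(rendered_views):
        """Encode rendered views (PIL images) into unit-norm CLIP
        image features, one feature per viewing angle."""
        batch = torch.stack([preprocess(v) for v in rendered_views]).to(device)
        feats = model.encode_image(batch)
        return feats / feats.norm(dim=-1, keepdim=True)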
According to embodiments of the present disclosure, the sample object itself has a default style. Therefore, based on the color and volume density information of the sample initial set of neural radiance field vectors, initial image features carrying that default style can be obtained after feature extraction. Processing the initial image features and the sample initial set with the second neural network may then yield a sample stylized set of neural radiance field vectors carrying the default style.
According to the embodiment of the present disclosure, after the sample stylized set of neural radiance field vectors, the initial image features, and the stylized image features are obtained, a loss function can be constructed from them and a loss value calculated, so as to adjust the parameters of the deep learning model and thereby train it.
Through the embodiments of the present disclosure, a deep learning model trained on neural radiance field vectors can combine the neural radiance field with a traditional rendering engine, and generate the stylized avatar by combining the object's three-dimensional model with the neural radiance field, so that the style rendering result of the generated avatar can have higher accuracy. In particular, when used for stylized rendering of objects, rendering accuracy can be improved. A model trained in this way can also be migrated to other styles at low cost, giving it strong extensibility.
The method shown in FIG. 3 is further described below with reference to specific examples.
According to an embodiment of the present disclosure, operation S350 may include: determining a regularization loss from the distance between every two sample stylized neural radiance field vectors in the sample stylized set; determining a perceptual loss from the initial image features and the stylized image features; and adjusting the model parameters of the deep learning model according to the regularization loss and the perceptual loss.
According to the embodiment of the present disclosure, based on the regularization loss, a regularization term constraint can be established for the sample stylized set of neural radiance field vectors, constraining the distance between adjacent stylized vectors to be smaller and the distance between distant stylized vectors to be larger.
According to the embodiment of the present disclosure, based on the perceptual loss, a coarse-grained constraint can be established between the initial and stylized two-dimensional images of each viewing angle, constraining different viewing angles to depict the same sample object. The perceptual loss may be computed with the perceptual-loss logic of a feature network such as an LPIPS (Learned Perceptual Image Patch Similarity) network, without being limited thereto.
According to embodiments of the present disclosure, adjusting the model parameters of the deep learning model according to the regularization loss and the perceptual loss may include: determining a first training loss based on the regularization loss and the perceptual loss. For example, the regularization loss R_loss and the perceptual loss P_loss may be added to obtain the first training loss Training_loss_1 = R_loss + P_loss. The parameters of the deep learning model may then be adjusted based on Training_loss_1 until Training_loss_1 converges.
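A sketch of this first training loss under stated assumptions: the regularization term is written as a mean distance over supplied neighbor pairs, and the perceptual term uses the public lpips package between the initial and stylized renders of the same view; the patent fixes only the sum Training_loss_1 = R_loss + P_loss, not these exact forms.

    import lpips    # pip install lpips
    import torch

    lpips_fn = lpips.LPIPS(net="vgg")   # learned perceptual similarity

    def training_loss_1(stylized_vectors, neighbor_pairs,
                        init_render, stylized_render):
        """Training_loss_1 = R_loss + P_loss (sketch).

        R_loss pulls neighboring stylized field vectors together;
        P_loss keeps the initial and stylized renders of one view
        perceptually recognizable as the same object."""
        i, j = neighbor_pairs   # index tensors of neighboring vector pairs
        r_loss = (stylized_vectors[i] - stylized_vectors[j]).norm(dim=-1).mean()
        p_loss = lpips_fn(init_render, stylized_render).mean()
        return r_loss + p_loss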
Through the embodiments of the present disclosure, training the deep learning model based on neural radiance field vectors and the features obtained from them can effectively improve the model's output accuracy.
According to an embodiment of the present disclosure, operation S350 may further include: determining sample content features from sample content information; and training the deep learning model according to the sample content features, the sample stylized set of neural radiance field vectors, the initial image features, and the stylized image features.
According to an embodiment of the present disclosure, the sample content information may include at least one of: sample text information, sample audio information, and sample image information, without being limited thereto.
According to the embodiment of the present disclosure, while the sample initial and sample stylized sets of neural radiance field vectors are input into the second neural network, the sample content information can be input into the second neural network synchronously to obtain the sample content features. The sample content features may include, for example, text features, audio features, or image features carrying style information, without being limited thereto. In this embodiment, the deep learning model may be trained by combining the sample content features with the sample stylized set of neural radiance field vectors, the initial image features, and the stylized image features obtained as described above.
According to embodiments of the present disclosure, training the deep learning model based on the sample content features, the sample stylized set of neural radiance field vectors, the initial image features, and the stylized image features may include: determining a first distance loss based on the Euclidean distance between the sample content features and the stylized image features; determining a second distance loss based on the vector distance between the initial image features and the stylized image features; and adjusting the model parameters of the deep learning model according to the first distance loss and the second distance loss.
According to embodiments of the present disclosure, the first distance loss may be determined from the absolute Euclidean distance between the sample content features and the stylized image features, and the second distance loss from a relative constraint on the vector difference between the initial and stylized image features. Based on the second distance loss, a finer-grained constraint can be established between the initial and stylized two-dimensional images, which may be used to constrain more detailed features.
According to an embodiment of the present disclosure, adjusting the model parameters of the deep learning model according to the first distance loss and the second distance loss may include: determining a second training loss from the first distance loss and the second distance loss. For example, the first distance loss D_loss_1 and the second distance loss D_loss_2 may be added to obtain the second training loss Training_loss_2 = D_loss_1 + D_loss_2. The parameters of the deep learning model may then be adjusted based on Training_loss_2 until Training_loss_2 converges.
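Under a literal reading of the two distances, the second training loss could be sketched as follows; the batch reduction and the plain Euclidean form of D_loss_2 are assumptions, since the description states only a vector distance between the two features.

    def training_loss_2(content_features, init_features, stylized_features):
        """Training_loss_2 = D_loss_1 + D_loss_2 (sketch; inputs are
        torch tensors of shape (batch, feature_dim)).

        D_loss_1: absolute Euclidean distance between the sample content
        (style) features and the stylized image features.
        D_loss_2: distance on the vector difference between the initial
        and stylized image features, the finer-grained constraint."""
        d_loss_1 = (content_features - stylized_features).norm(dim=-1).mean()
        d_loss_2 = (stylized_features - init_features).norm(dim=-1).mean()
        return d_loss_1 + d_loss_2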
According to an embodiment of the present disclosure, training the deep learning model from the sample stylized set of neural radiance field vectors, the initial image features, and the stylized image features may further include: determining local and global stylized features of the sample object from the stylized image features; determining a contrast loss from the local and global stylized features; and adjusting the model parameters of the deep learning model according to the contrast loss.
According to embodiments of the present disclosure, the contrast loss may be determined by comparing the spatial feature vectors of the local and global stylized features. Based on the contrast loss, the sample object in the stylized two-dimensional image can be constrained to exhibit a single consistent style.
According to the embodiment of the present disclosure, the model parameters of the deep learning model may be adjusted according to the contrast loss together with at least one of the first training loss and the second training loss. For example, a third training loss may be determined from the first training loss, the second training loss, and the contrast loss C_loss: Training_loss_3 = Training_loss_1 + Training_loss_2 + C_loss. The parameters of the deep learning model may then be adjusted based on Training_loss_3 until Training_loss_3 converges.
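A sketch of the contrast term and the combined third loss; comparing per-patch (local) and whole-image (global) stylized features with cosine similarity is one plausible formulation, since the patent specifies only that the two are compared.

    import torch.nn.functional as F

    def training_loss_3(loss_1, loss_2, local_features, global_feature):
        """Training_loss_3 = Training_loss_1 + Training_loss_2 + C_loss.
        C_loss pushes every local stylized feature toward the global
        style so all parts of the avatar share one style."""
        c_loss = (1.0 - F.cosine_similarity(
            local_features,
            global_feature.expand_as(local_features), dim=-1)).mean()
        return loss_1 + loss_2 + c_loss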
Through the embodiments of the present disclosure, training the deep learning model by combining features of multiple dimensions, such as the sample content features, the sample stylized set of neural radiance field vectors, the initial image features, and the stylized image features, can effectively improve the model's output accuracy, and in particular the accuracy of the stylized rendering results.
According to the embodiment of the present disclosure, a deep learning model trained with the above method learns the style information used in training. For example, the preset style information may be obtained from the deep learning model. In subsequent stylized rendering, the rendering can be performed by the deep learning model in combination with the style information it has learned.
FIG. 4 schematically illustrates a flowchart of a method for implementing avatar generation based on a deep learning model according to an embodiment of the present disclosure.
As shown in FIG. 4, the method includes operations S410 to S420.
In operation S410, a to-be-processed image including an object to be processed is acquired.
In operation S420, the image to be processed is input into the deep learning model, resulting in a stylized avatar of the object to be processed.
According to an embodiment of the present disclosure, the image to be processed may include an image acquired of the object to be processed. The object to be processed may have the same or similar characteristics as the target object described above, which will not be repeated here.
According to an embodiment of the present disclosure, operation S420 may include: inputting the image to be processed into the deep learning model to obtain a target stylized set of neural radiance field vectors of the object to be processed; and determining the stylized avatar from the target stylized set of neural radiance field vectors.
According to an embodiment of the present disclosure, after the image to be processed is input into the deep learning model, the first neural network of the model may determine the camera intrinsic parameters and the expression coefficients of the object to be processed and determine an initial set of neural radiance field vectors from them. The second neural network may then perform feature extraction on the initial set and determine the target stylized set of neural radiance field vectors from the extracted features, the initial set, and the stylized features learned by the deep learning model. The stylized avatar may then be determined from the target stylized set of neural radiance field vectors.
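Putting the inference path together, end-to-end generation with the trained model might look like the following sketch; all attribute names are illustrative placeholders consistent with the training sketch above.

    import torch

    @torch.no_grad()
    def generate_stylized_avatar(model, image_to_process, camera_poses):
        """Image -> initial field vectors -> stylized field vectors ->
        stylized views of the avatar, one per camera pose (sketch)."""
        init_vectors = model.first_network(image_to_process)
        features = model.feature_extractor(init_vectors)
        stylized_vectors = model.stylize(features, init_vectors)
        return [model.render(stylized_vectors, pose) for pose in camera_poses]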
According to embodiments of the present disclosure, the above process of determining the stylized avatar from the target stylized set of neural radiance field vectors may also be implemented by one of the network models within the deep learning model. On this basis, after the deep learning model receives the image to be processed, it can directly render the stylized avatar of the object to be processed according to the style information it has learned.
Fig. 5 schematically illustrates a schematic diagram of stylized rendering of an image based on a deep learning model in accordance with an embodiment of the present disclosure.
As shown in fig. 5, image 510 is the original image. The deep learning model 520 includes a first neural network 521 and a second neural network 522. The image 510 is input into a deep learning model 520, and a stylized image 530 may be obtained via processing of a first neural network 521 and a second neural network 522 in the deep learning model 520.
According to an embodiment of the present disclosure, the deep learning model may be, for example, a model trained based on the style information of the pencil drawing style or the like (may not be limited thereto), and the stylized image 530 may include the pencil drawing style image or the like as shown in fig. 5, and may not be limited thereto.
Through the embodiments of the present disclosure, an image can be conveniently stylized by a deep learning model trained on sample content information together with style information, achieving a better stylized rendering effect.
The avatar generation method, the training method of the deep learning model, and the avatar generation method based on the trained deep learning model described above together realize a family of multi-modal stylized avatar generation and driving methods. These methods are applicable to avatar-based interactive scenarios on most terminals and, compared with other methods, offer significant advantages in computational cost, hardware cost, terminal compatibility, rendering-engine adaptation, and convergence speed.
Fig. 6 schematically illustrates a block diagram of an avatar generating apparatus according to an embodiment of the present disclosure.
As shown in fig. 6, the avatar generating apparatus 600 includes a first determination module 610, a second determination module 620, and a generation module 630.
A first determination module 610 is configured to determine an initial set of neural radiation field vectors for a target object in an input image.
The second determining module 620 is configured to determine a stylized neural radiation field vector set according to the preset style information and the initial neural radiation field vector set.
A generating module 630 for generating a stylized avatar of the target object from the set of stylized neural radiation field vectors.
According to an embodiment of the present disclosure, the second determination module includes a first determination sub-module and a second determination sub-module.
The first determining submodule is used for determining a target object two-dimensional image of the target object at a first view angle according to the initial neural radiation field vector set.
The second determining submodule is used for determining a stylized neural radiation field vector set according to the target object two-dimensional image, the initial neural radiation field vector set and the preset style information.
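The disclosure does not spell out how the two-dimensional image is rendered from the field vector set; the standard NeRF volume-rendering quadrature is one natural reading. A minimal sketch, assuming per-ray colors, densities, and sample spacings are already available:

```python
import torch

def composite_ray(rgbs: torch.Tensor, sigmas: torch.Tensor,
                  deltas: torch.Tensor) -> torch.Tensor:
    """Standard volume-rendering quadrature for one ray:
    C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i, where the
    transmittance T_i accumulates the opacity of earlier samples.
    Shapes: rgbs (S, 3), sigmas (S,), deltas (S,)."""
    alphas = 1.0 - torch.exp(-sigmas * deltas)            # per-sample opacity
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alphas[:1]),
                   1.0 - alphas + 1e-10])[:-1], dim=0
    )                                                     # transmittance T_i
    weights = alphas * trans
    return (weights[:, None] * rgbs).sum(dim=0)           # composited pixel color
```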
According to an embodiment of the present disclosure, the generation module comprises a third determination sub-module and a fourth determination sub-module.
The third determining submodule is used for determining a target object stylized image of the target object at a second view angle according to the stylized neural radiation field vector set.
The fourth determining sub-module is used for determining a stylized avatar according to the target object stylized image.
According to an embodiment of the present disclosure, the first determination module includes a fifth determination sub-module and a sixth determination sub-module.
The fifth determining sub-module is used for determining a target object three-dimensional model for the target object according to the input image.
The sixth determining submodule is used for determining the initial neural radiation field vector set according to the target object three-dimensional model.
According to an embodiment of the present disclosure, the fifth determination submodule includes a first determination unit and a second determination unit.
The first determining unit is used for determining the camera intrinsic parameters and the expression coefficients corresponding to the target object according to the input image.
The second determining unit is used for determining the target object three-dimensional model according to the camera intrinsic parameters and the expression coefficients.
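As a concrete (assumed) example of how expression coefficients can drive a three-dimensional model, a linear blendshape formulation is sketched below; the disclosure does not fix the exact parameterization.

```python
import numpy as np

def expression_mesh(base: np.ndarray, blendshapes: np.ndarray,
                    coeffs: np.ndarray) -> np.ndarray:
    """Linear blendshape model: vertices = base + sum_k coeff_k * delta_k.
    base: (V, 3) neutral mesh; blendshapes: (K, V, 3) per-expression
    vertex offsets; coeffs: (K,) expression coefficients."""
    return base + np.tensordot(coeffs, blendshapes, axes=1)
```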
According to an embodiment of the present disclosure, the sixth determination submodule includes a third determination unit and a fourth determination unit.
The third determining unit is used for determining camera external parameters corresponding to the target object according to the input image.
The fourth determining unit is used for determining the initial neural radiation field vector set according to the camera external parameters and the target object three-dimensional model.
According to an embodiment of the present disclosure, the fourth determination unit includes a first determination subunit and a second determination subunit.
The first determining subunit is used for determining the camera pose according to the camera external parameters.
The second determining subunit is used for determining the initial neural radiation field vector set according to the camera pose and voxel information of the target object three-dimensional model.
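A minimal sketch of these two sub-steps under common conventions: the camera pose is taken as the inverse of a 4x4 world-to-camera extrinsic matrix, and each voxel center is paired with its viewing direction to form the position-plus-direction vectors a radiance field consumes. Both helper names are illustrative, not names from the disclosure.

```python
import numpy as np

def pose_from_extrinsics(extrinsic: np.ndarray) -> np.ndarray:
    """Camera-to-world pose as the inverse of the 4x4 extrinsic matrix."""
    rot, t = extrinsic[:3, :3], extrinsic[:3, 3]
    pose = np.eye(4)
    pose[:3, :3] = rot.T          # R^-1 = R^T for a rotation matrix
    pose[:3, 3] = -rot.T @ t
    return pose

def initial_field_vectors(pose: np.ndarray,
                          voxel_centers: np.ndarray) -> np.ndarray:
    """Pair each voxel center (x, y, z) with the unit viewing direction
    from the camera center, yielding (N, 6) field input vectors."""
    dirs = voxel_centers - pose[:3, 3]
    dirs = dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)
    return np.concatenate([voxel_centers, dirs], axis=-1)
```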
Fig. 7 schematically illustrates a block diagram of a training apparatus of a deep learning model according to an embodiment of the present disclosure.
As shown in fig. 7, the training apparatus 700 of the deep learning model includes a first obtaining module 710, a second obtaining module 720, a third obtaining module 730, a fourth obtaining module 740, and a training module 750.
The first obtaining module 710 is configured to input a sample image into a first neural network of the deep learning model to obtain a sample initial neural radiation field vector set of the sample object, where the sample image includes the sample object.
The second obtaining module 720 is configured to input the sample initial neural radiation field vector set into a second neural network of the deep learning model, so as to obtain initial image features of the sample object.
A third obtaining module 730 is configured to obtain a sample stylized neural radiation field vector set of the sample object according to the initial image feature and the sample initial neural radiation field vector set.
The fourth obtaining module 740 is configured to perform feature extraction on the sample stylized neural radiation field vector set to obtain a stylized image feature of the sample object.
The training module 750 is configured to train the deep learning model according to the sample stylized neural radiation field vector set, the initial image feature, and the stylized image feature, to obtain a trained deep learning model.
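The following sketch strings the five modules together into one training iteration. The attribute names (first_net, second_net, stylize, extract) and the injected loss_fn are assumptions used to keep the sketch self-contained, not names from the disclosure.

```python
import torch

def train_step(model, optimizer, loss_fn, sample_image: torch.Tensor) -> float:
    """One hedged training iteration mirroring modules 710-750."""
    init_set = model.first_net(sample_image)          # sample initial vector set
    init_feat = model.second_net(init_set)            # initial image features
    styled_set = model.stylize(init_feat, init_set)   # sample stylized vector set
    styled_feat = model.extract(styled_set)           # stylized image features
    loss = loss_fn(styled_set, init_feat, styled_feat)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```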
According to an embodiment of the present disclosure, the training module includes a seventh determination sub-module, an eighth determination sub-module, and a first adjustment sub-module.
The seventh determination submodule is configured to determine a regularization loss from the distance between each two sample stylized neural radiation field vectors in the sample stylized neural radiation field vector set.
The eighth determination submodule is configured to determine a perceptual loss according to the initial image feature and the stylized image feature.
The first adjustment sub-module is used for adjusting model parameters of the deep learning model according to the regularization loss and the perceptual loss.
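One plausible instantiation of these two losses is sketched below: a pairwise-distance regularization over the stylized vector set and an L2 feature match for the perceptual term. The exact forms are assumptions; the disclosure only names the quantities they are computed from.

```python
import torch
import torch.nn.functional as F

def regularization_loss(styled_set: torch.Tensor) -> torch.Tensor:
    """Mean pairwise Euclidean distance between the sample stylized
    field vectors; styled_set: (N, D)."""
    return torch.cdist(styled_set, styled_set).mean()

def perceptual_loss(init_feat: torch.Tensor,
                    styled_feat: torch.Tensor) -> torch.Tensor:
    """Keep stylized image features close to the initial (content)
    features so identity survives stylization; L2 match assumed."""
    return F.mse_loss(styled_feat, init_feat)
```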
According to an embodiment of the present disclosure, the training module includes a ninth determination sub-module and a training sub-module.
The ninth determining submodule is configured to determine sample content features according to sample content information; and
the training sub-module is used for training the deep learning model according to the sample content features, the sample stylized neural radiation field vector set, the initial image features and the stylized image features.
According to an embodiment of the present disclosure, the training submodule includes a fifth determination unit, a sixth determination unit, and an adjustment unit.
The fifth determining unit is used for determining a first distance loss according to the Euclidean distance between the sample content feature and the stylized image feature.
The sixth determining unit is used for determining a second distance loss based on the vector distance between the initial image feature and the stylized image feature.
The adjusting unit is used for adjusting the model parameters of the deep learning model according to the first distance loss and the second distance loss.
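A sketch of the two distance losses, assuming an L2 norm for the first (the text fixes it as Euclidean) and reading the unspecified "vector distance" of the second as cosine distance:

```python
import torch

def first_distance_loss(content_feat: torch.Tensor,
                        styled_feat: torch.Tensor) -> torch.Tensor:
    """Euclidean distance between sample content features and the
    stylized image features, averaged over the batch."""
    return torch.norm(content_feat - styled_feat, p=2, dim=-1).mean()

def second_distance_loss(init_feat: torch.Tensor,
                         styled_feat: torch.Tensor) -> torch.Tensor:
    """A vector distance between initial and stylized image features;
    cosine distance is assumed here."""
    return (1.0 - torch.cosine_similarity(init_feat, styled_feat, dim=-1)).mean()
```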
According to an embodiment of the present disclosure, the training module further comprises a tenth determination sub-module, an eleventh determination sub-module, and a second adjustment sub-module.
The tenth determination submodule is used for determining local stylized features and global stylized features of the sample object according to the stylized image features.
The eleventh determination submodule is configured to determine a contrastive loss based on the local stylized features and the global stylized features.
The second adjusting sub-module is used for adjusting the model parameters of the deep learning model according to the contrastive loss.
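An InfoNCE-style sketch of the contrastive term, treating each local stylized feature's own global feature (placed at index 0 below) as the positive and the other globals in the batch as negatives. This specific form is an assumption; the disclosure only states that the loss contrasts local and global stylized features.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(local_feats: torch.Tensor, global_feats: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """local_feats: (N, D) local stylized features of one sample;
    global_feats: (B, D) global stylized features, with the sample's
    own global vector at index 0 serving as the positive."""
    local_feats = F.normalize(local_feats, dim=-1)
    global_feats = F.normalize(global_feats, dim=-1)
    logits = local_feats @ global_feats.T / temperature        # (N, B)
    targets = torch.zeros(local_feats.shape[0], dtype=torch.long,
                          device=local_feats.device)           # positives at 0
    return F.cross_entropy(logits, targets)
```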
According to an embodiment of the present disclosure, the sample content information includes at least one of: sample text information, sample audio information, and sample image information.
Fig. 8 schematically illustrates a block diagram of an avatar generating apparatus according to an embodiment of the present disclosure.
As shown in fig. 8, the avatar generating apparatus 800 includes an obtaining module 810 and a fifth obtaining module 820.
The obtaining module 810 is configured to obtain an image to be processed, where the image to be processed includes an object to be processed.
The fifth obtaining module 820 is used for inputting the image to be processed into the deep learning model to obtain the stylized avatar of the object to be processed, wherein the deep learning model is trained using the training apparatus of the deep learning model according to the present disclosure.
According to an embodiment of the present disclosure, the fifth obtaining module includes an obtaining sub-module and a twelfth determining sub-module.
The obtaining submodule is used for inputting the image to be processed into the deep learning model to obtain a target stylized neural radiation field vector set of the object to be processed.
The twelfth determination sub-module is used for determining a stylized avatar from the target stylized neural radiation field vector set.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform at least one of the avatar generation method and the training method of the deep learning model of the present disclosure.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform at least one of the avatar generation method and the training method of the deep learning model of the present disclosure.
According to an embodiment of the present disclosure, a computer program product includes a computer program stored on at least one of a readable storage medium and an electronic device, which, when executed by a processor, implements at least one of the avatar generation method and the training method of the deep learning model of the present disclosure.
Fig. 9 shows a schematic block diagram of an example electronic device 900 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the apparatus 900 includes a computing unit 901 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Various components in device 900 are connected to an input/output (I/O) interface 905, including: an input unit 906 such as a keyboard, a mouse, or the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, an optical disk, or the like; and a communication unit 909 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunications networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 901 performs the respective methods and processes described above, for example, at least one of an avatar generation method and a training method of a deep learning model. For example, in some embodiments, at least one of the avatar generation method and the training method of the deep learning model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of at least one of the avatar generation method and the training method of the deep learning model described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform at least one of the avatar generation method and the training method of the deep learning model in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (33)

1. An avatar generation method, comprising:
determining an initial set of neural radiation field vectors for a target object in an input image;
determining a stylized neural radiation field vector set according to preset style information and the initial neural radiation field vector set; and
generating a stylized avatar of the target object according to the stylized neural radiation field vector set.
2. The method of claim 1, wherein the determining a stylized neural radiation field vector set according to preset style information and the initial neural radiation field vector set comprises:
determining a target object two-dimensional image of the target object under a first view angle according to the initial neural radiation field vector set; and
determining the stylized neural radiation field vector set according to the target object two-dimensional image, the initial neural radiation field vector set and the preset style information.
3. The method of claim 1, wherein the generating a stylized avatar of the target object according to the stylized neural radiation field vector set comprises:
determining a target object stylized image of the target object at a second view angle according to the stylized neural radiation field vector set; and
determining the stylized avatar according to the target object stylized image.
4. The method of claim 1, wherein the determining an initial set of neural radiation field vectors for the target object in the input image comprises:
determining a target object three-dimensional model for the target object according to the input image; and
determining the initial neural radiation field vector set according to the target object three-dimensional model.
5. The method of claim 4, wherein the determining a target object three-dimensional model for the target object from the input image comprises:
determining camera intrinsic parameters and expression coefficients corresponding to the target object according to the input image; and
determining the three-dimensional model of the target object according to the camera intrinsic parameters and the expression coefficients.
6. The method of claim 4, wherein the determining the initial neural radiation field vector set according to the target object three-dimensional model comprises:
determining camera external parameters corresponding to the target object according to the input image; and
determining the initial neural radiation field vector set according to the camera external parameters and the target object three-dimensional model.
7. The method of claim 6, wherein the determining the initial neural radiation field vector set according to the camera external parameters and the target object three-dimensional model comprises:
determining the pose of the camera according to the camera external parameters; and
determining the initial neural radiation field vector set according to the camera pose and voxel information of the target object three-dimensional model.
8. A training method of a deep learning model, comprising:
inputting a sample image into a first neural network of a deep learning model to obtain a sample initial neural radiation field vector set of a sample object, wherein the sample image comprises the sample object;
inputting the sample initial neural radiation field vector set into a second neural network of the deep learning model to obtain initial image features of the sample object;
obtaining a sample stylized neural radiation field vector set of the sample object according to the initial image features and the sample initial neural radiation field vector set;
extracting features of the sample stylized neural radiation field vector set to obtain stylized image features of the sample object; and
training the deep learning model according to the sample stylized neural radiation field vector set, the initial image features and the stylized image features to obtain a trained deep learning model.
9. The method of claim 8, wherein the training the deep learning model according to the sample stylized neural radiation field vector set, the initial image features, and the stylized image features comprises:
determining a regularization loss according to the distance between every two sample stylized neural radiation field vectors in the sample stylized neural radiation field vector set;
determining a perceptual loss according to the initial image features and the stylized image features; and
adjusting model parameters of the deep learning model according to the regularization loss and the perceptual loss.
10. The method of claim 8, wherein the training the deep learning model according to the sample stylized neural radiation field vector set, the initial image features, and the stylized image features comprises:
determining sample content features according to sample content information; and
training the deep learning model according to the sample content features, the sample stylized neural radiation field vector set, the initial image features and the stylized image features.
11. The method of claim 10, wherein the training the deep learning model according to the sample content features, the sample stylized neural radiation field vector set, the initial image features, and the stylized image features comprises:
determining a first distance loss according to the Euclidean distance between the sample content features and the stylized image features;
determining a second distance loss according to the vector distance between the initial image feature and the stylized image feature; and
adjusting model parameters of the deep learning model according to the first distance loss and the second distance loss.
12. The method of any of claims 8-11, wherein the training the deep learning model according to the sample stylized neural radiation field vector set, the initial image features, and the stylized image features further comprises:
determining local stylized features and global stylized features of the sample object according to the stylized image features;
determining a contrastive loss according to the local stylized features and the global stylized features; and
adjusting model parameters of the deep learning model according to the contrastive loss.
13. The method of any of claims 10-11, wherein the sample content information comprises at least one of: sample text information, sample audio information, and sample image information.
14. An avatar generation method, comprising:
acquiring an image to be processed, wherein the image to be processed comprises an object to be processed; and
inputting the image to be processed into a deep learning model to obtain a stylized avatar of the object to be processed, wherein the deep learning model is trained using the method according to any one of claims 8-13.
15. The method of claim 14, wherein the inputting the image to be processed into a deep learning model to obtain a stylized avatar for the object to be processed comprises:
inputting the image to be processed into a deep learning model to obtain a target stylized neural radiation field vector set of the object to be processed; and
determining the stylized avatar according to the target stylized neural radiation field vector set.
16. An avatar generation apparatus comprising:
a first determining module for determining an initial set of neural radiation field vectors for a target object in an input image;
a second determining module for determining a stylized neural radiation field vector set according to preset style information and the initial neural radiation field vector set; and
a generating module for generating a stylized avatar of the target object according to the stylized neural radiation field vector set.
17. The apparatus of claim 16, wherein the second determination module comprises:
a first determining submodule for determining a target object two-dimensional image of the target object under a first view angle according to the initial neural radiation field vector set; and
a second determining submodule for determining the stylized neural radiation field vector set according to the target object two-dimensional image, the initial neural radiation field vector set and the preset style information.
18. The apparatus of claim 16, wherein the generating means comprises:
a third determining submodule, configured to determine a target object stylized image of the target object at a second view angle according to the stylized neural radiation field vector set; and
and a fourth determining sub-module for determining the stylized avatar according to the target object stylized image.
19. The apparatus of claim 16, wherein the first determination module comprises:
a fifth determining sub-module for determining a target object three-dimensional model for the target object from the input image; and
a sixth determination submodule is configured to determine the initial set of neural radiation field vectors from the three-dimensional model of the target object.
20. The apparatus of claim 19, wherein the fifth determination submodule comprises:
a first determining unit configured to determine, according to the input image, camera intrinsic parameters and expression coefficients corresponding to the target object; and
a second determining unit for determining the three-dimensional model of the target object according to the camera intrinsic parameters and the expression coefficients.
21. The apparatus of claim 19, wherein the sixth determination submodule comprises:
a third determining unit configured to determine, according to the input image, camera external parameters corresponding to the target object; and
a fourth determining unit configured to determine the initial set of neural radiation field vectors according to the camera external parameters and the three-dimensional model of the target object.
22. The apparatus of claim 21, wherein the fourth determination unit comprises:
the first determining subunit is used for determining the pose of the camera according to the camera external parameters; and
a second determining subunit for determining the initial neural radiation field vector set according to the camera pose and voxel information of the three-dimensional model of the target object.
23. A training device for a deep learning model, comprising:
the first obtaining module is used for inputting a sample image into a first neural network of the deep learning model to obtain a sample initial neural radiation field vector set of a sample object, wherein the sample image comprises the sample object;
the second obtaining module is used for inputting the sample initial neural radiation field vector set into a second neural network of the deep learning model to obtain initial image features of the sample object;
the third obtaining module is used for obtaining a sample stylized neural radiation field vector set of the sample object according to the initial image features and the sample initial neural radiation field vector set;
the fourth obtaining module is used for extracting features of the sample stylized neural radiation field vector set to obtain stylized image features of the sample object; and
the training module is used for training the deep learning model according to the sample stylized neural radiation field vector set, the initial image features and the stylized image features to obtain a trained deep learning model.
24. The apparatus of claim 23, wherein the training module comprises:
a seventh determining submodule, configured to determine a regularization loss according to a distance between every two sample stylized neural radiation field vectors in the sample stylized neural radiation field vector set;
an eighth determination submodule for determining a perceptual loss from the initial image features and the stylized image features; and
a first adjustment submodule for adjusting the model parameters of the deep learning model according to the regularization loss and the perceptual loss.
25. The apparatus of claim 23, wherein the training module comprises:
a ninth determining submodule, configured to determine sample content features according to sample content information; and
a training sub-module for training the deep learning model according to the sample content features, the sample stylized neural radiation field vector set, the initial image features and the stylized image features.
26. The apparatus of claim 25, wherein the training submodule comprises:
a fifth determining unit configured to determine a first distance loss according to a Euclidean distance between the sample content feature and the stylized image feature;
a sixth determining unit configured to determine a second distance loss according to a vector distance between the initial image feature and the stylized image feature; and
and the adjusting unit is used for adjusting the model parameters of the deep learning model according to the first distance loss and the second distance loss.
27. The apparatus of any of claims 23-26, wherein the training module further comprises:
a tenth determination submodule for determining local stylized features and global stylized features of the sample object according to the stylized image features;
an eleventh determination submodule configured to determine a contrastive loss based on the local stylized features and the global stylized features; and
a second adjustment sub-module for adjusting the model parameters of the deep learning model according to the contrastive loss.
28. The apparatus of any of claims 25-26, wherein the sample content information comprises at least one of: sample text information, sample audio information, and sample image information.
29. An avatar generation apparatus comprising:
the acquisition module is used for acquiring an image to be processed, wherein the image to be processed comprises an object to be processed; and
a fifth obtaining module, configured to input the image to be processed into a deep learning model and obtain a stylized avatar of the object to be processed, wherein the deep learning model is trained by using the apparatus according to any one of claims 23-28.
30. The apparatus of claim 29, wherein the fifth obtaining module comprises:
an obtaining submodule for inputting the image to be processed into a deep learning model to obtain a target stylized neural radiation field vector set of the object to be processed; and
a twelfth determination sub-module for determining the stylized avatar from the set of target stylized neural radiation field vectors.
31. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-15.
32. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-15.
33. A computer program product comprising a computer program stored on at least one of a readable storage medium and an electronic device, which, when executed by a processor, implements the method according to any one of claims 1-15.