CN116109798B - Image data processing method, device, equipment and medium - Google Patents

Image data processing method, device, equipment and medium

Info

Publication number: CN116109798B
Application number: CN202310350889.5A
Authority: CN (China)
Prior art keywords: target, image, training, texture, texture map
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN116109798A
Inventor: 李文娟
Current Assignee: Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd; priority to CN202310350889.5A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 17/20: Finite element generation, e.g. wire-frame surface description, tesselation
    • G06T 15/00: 3D [Three Dimensional] image rendering
    • G06T 15/04: Texture mapping
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the application discloses an image data processing method, device, equipment and medium, which can be applied to the technical field of image processing. The method comprises the following steps: acquiring a target viewpoint head portrait image, and determining a target three-dimensional head portrait grid based on an initial three-dimensional face shape; performing image unfolding processing on the target viewpoint head portrait image to obtain a first target texture map; obtaining texture coding features based on the first target texture map, obtaining texture image splicing features based on the texture coding features and the image coding features once the image coding features are obtained, and decoding the texture image splicing features to obtain a second target texture map; and obtaining a target rendering head portrait image based on the target three-dimensional head portrait grid and the second target texture map, so as to generate a three-dimensional head model. By adopting the embodiment of the application, the three-dimensional head model of a virtual object can be automatically reconstructed based on a head portrait image of the virtual object, and the model rendering effect of the rendered three-dimensional head model is improved.

Description

Image data processing method, device, equipment and medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image data processing method, apparatus, device, and medium.
Background
Three-dimensional head portrait reconstruction refers to reconstructing head portrait images based on a single view or multiple views to obtain three-dimensional information of the head part. Three-dimensional head portrait reconstruction techniques may be applied to a variety of scenarios; for example, they may be used to reconstruct three-dimensional head portraits of characters in a game to generate three-dimensional models of the game characters. At present, when a three-dimensional model of a game character is rebuilt based on the three-dimensional head portrait reconstruction technique, three-dimensional information of the head part is generally obtained by reconstructing head images from a single view or multiple views through some neural networks, a texture map is then obtained, and the three-dimensional model of the game character is further obtained by manually triggering rendering based on the three-dimensional information and the texture map.
The inventors have found in practice that, in the process of manually reconstructing a three-dimensional model, a lot of time needs to be spent by programmers and artists, so that the efficiency of reconstructing a three-dimensional head model is low. In addition, when three-dimensional reconstruction is performed with a directly acquired texture map, if the texture quality of that directly acquired texture map is poor, the model rendering effect of the rendered three-dimensional model is reduced.
Disclosure of Invention
The embodiment of the application provides an image data processing method, device, equipment and medium, which can automatically reconstruct a three-dimensional head model of a virtual object based on a head portrait image of the virtual object, and can generate a texture map with better texture quality, so that the model rendering effect of the three-dimensional head model obtained by rendering can be improved.
In one aspect, an embodiment of the present application provides an image data processing method, including:
acquiring a target viewpoint head image corresponding to a head part of a target virtual object in a three-dimensional scene, and determining a target three-dimensional head image grid of the head part of the target virtual object in the three-dimensional scene based on the initial three-dimensional face shape when reconstructing the initial three-dimensional face shape of the obtained target viewpoint head image;
performing image unfolding processing on the target viewpoint head portrait image based on the target three-dimensional head portrait grid to obtain a first target texture map corresponding to the target viewpoint head portrait image;
performing texture coding processing on the first target texture map to obtain texture coding features corresponding to the first target texture map, performing feature stitching processing on the texture coding features and the image coding features when the image coding features of the target viewpoint head portrait image are obtained to obtain texture image stitching features for performing texture decoding, and performing texture decoding processing on the texture image stitching features to obtain a second target texture map corresponding to the first target texture map; the texture quality of the second target texture map is better than the texture quality of the first target texture map;
And obtaining a target rendering head image based on the target three-dimensional head image grid and the second target texture map, and generating a three-dimensional head model of the head part of the target virtual object under the three-dimensional scene based on the obtained target rendering head image.
An aspect of an embodiment of the present application provides an image data processing apparatus, including:
the grid generation module is used for acquiring a target viewpoint head image corresponding to a head part of a target virtual object in a three-dimensional scene, and determining a target three-dimensional head image grid of the head part of the target virtual object in the three-dimensional scene based on the initial three-dimensional face shape when reconstructing the initial three-dimensional face shape of the obtained target viewpoint head image;
the first texture generation module is used for performing image unfolding processing on the target viewpoint head portrait image based on the target three-dimensional head portrait grid to obtain a first target texture map corresponding to the target viewpoint head portrait image;
the second texture generation module is used for carrying out texture coding processing on the first target texture map to obtain texture coding features corresponding to the first target texture map, carrying out feature splicing processing on the texture coding features and the image coding features when the image coding features of the target viewpoint head portrait image are obtained to obtain texture image splicing features for texture decoding, and carrying out texture decoding processing on the texture image splicing features to obtain a second target texture map corresponding to the first target texture map; the texture quality of the second target texture map is better than the texture quality of the first target texture map;
And the rendering module is used for obtaining a target rendering head image based on the target three-dimensional head image grid and the second target texture map, and generating a three-dimensional head model of the head part of the target virtual object under the three-dimensional scene based on the obtained target rendering head image.
In one aspect, the present application provides a computer readable storage medium storing a computer program adapted to be loaded and executed by a processor, so that a computer device having the processor performs the method provided in the embodiments of the present application.
In one aspect, the present application provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the method provided in the embodiments of the present application.
In the embodiment of the application, the target three-dimensional head portrait grid of the head part of the target virtual object in the three-dimensional scene can be quickly reconstructed based on the input head image (such as the target viewpoint head portrait image), then the first target texture map obtained based on the expansion of the target three-dimensional head portrait grid can be subjected to encoding and decoding processing to obtain the texture map with better texture quality (namely the second target texture map), and further the target rendering head portrait image is obtained based on the target three-dimensional head portrait grid and the second target texture map, so that the three-dimensional head model of the target virtual object is generated. Thereby, the three-dimensional head model of the virtual object can be automatically reconstructed based on the head portrait image of the virtual object. In addition, the embodiment of the application can generate the texture map with better texture quality, so that the model rendering effect of the three-dimensional head model obtained by rendering can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of an image data processing method according to an embodiment of the present application;
fig. 2 is an effect schematic diagram of a target viewpoint head portrait image according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating an effect of a first target texture map according to an embodiment of the present disclosure;
fig. 4 is a second flowchart of an image data processing method according to an embodiment of the present application;
FIG. 5 is a schematic flow chart of a reconstruction process of a three-dimensional head model according to an embodiment of the present application;
fig. 6 is a flowchart illustrating a method for processing image data according to an embodiment of the present application;
FIG. 7 is a flow chart of a training process for an initial modeling network provided in an embodiment of the present application;
FIG. 8 is a schematic diagram of the effect of a discriminator according to an embodiment of the present disclosure;
Fig. 9 is a flowchart of a method for processing image data according to an embodiment of the present application;
fig. 10 is a schematic structural view of an image data processing apparatus according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
Some technical terms related to the embodiments of the present application are described below:
mesh (mesh): also called 3D mesh, is the basic unit in computer graphics. In computer graphics, a mesh is a collection of vertices, edges, and faces that make up a 3D (3 Dimensions for short, chinese refers to three-dimensional) object. The mesh may be used to define the shape and contour of a 3D object, with the main attribute content of the mesh including vertex coordinates, normals, texture coordinates, and so on. These meshes used to construct 3D objects are typically composed of triangles, quadrilaterals, or other simple polygons. Among these, the most commonly used is the triangular mesh, which generally needs to store three types of information: vertices, edges, and faces. Wherein, the vertex: each triangular mesh has three vertexes, and each vertex is possibly shared with other triangular meshes; edges: edges connecting two vertexes, each triangular mesh is provided with three edges; and (3) surface: each triangular mesh corresponds to a face, which we can represent by a list of vertices or edges. It is understood that any polygonal mesh can be converted into a triangular mesh.
3D reconstruction: the method is also called three-dimensional reconstruction, namely establishing a mathematical model suitable for computer representation and processing on a three-dimensional object, is a basis for processing, operating and analyzing the three-dimensional object in a computer environment, and is also a key technology for establishing virtual reality expressing objective world in a computer. Currently, 3D reconstruction may reconstruct a three-dimensional model of an object in a two-dimensional image based on an input two-dimensional image.
Texture mapping: also known as UV mapping, is a planar representation for 3D model surfaces. U and V are coordinates of a texture map defining information of the 3D position of each point on the 2D picture, simply by mapping the points on the three-dimensional object onto a 2-dimensional space, i.e. corresponding texture coordinates can be specified for the mesh vertices in the three-dimensional mesh of the three-dimensional object. The process of creating a UV map is called UV expansion, which is mainly used to convert a three-dimensional mesh into a texture space (also called UV space) for representation, i.e. corresponding texture coordinates can be determined for mesh vertices in the three-dimensional mesh, which are coordinates represented in a normalized range, e.g. U, V all belong to [0,1]. The texture coordinates in the texture map define information of the position of each point on the texture map, and the points on the texture map are interrelated with the 3D model, so that when the 3D model is displayed, the texture corresponding to each grid in the 3D grid can be determined based on the texture map, and a model rendering image with texture effect can be obtained. It will be appreciated that by specifying texture coordinates, it is possible to map to texels in the texture map (i.e. pixels in the texture map). For example, if a two-dimensional texture map with a size of 256x256 is specified as (0.5, 1.0), the coordinates of the corresponding pixel are (128, 256), and then the pixel value corresponding to the corresponding texture coordinate (i.e. the value representing the color information) can be mapped into the three-dimensional grid, so as to implement the texture map to be attached to the three-dimensional grid.
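As a small illustration of the texel-mapping example above (an editorial sketch, not from the patent), normalized UV coordinates can be scaled by the texture resolution to obtain pixel coordinates:

```python
def uv_to_texel(u: float, v: float, width: int, height: int) -> tuple[int, int]:
    # u and v are normalized to [0, 1]; scale them by the texture resolution.
    return round(u * width), round(v * height)

print(uv_to_texel(0.5, 1.0, 256, 256))  # (128, 256), matching the 256x256 example above
```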
Virtual object: virtual objects refer to objects that appear in some application scenarios, such as virtual environments that a gaming application is capable of providing. For example, the virtual object may be a virtual character that may be manipulated by a user in a game application, or may be a non-player character (NPC) in a game application, and the like, without limitation. The virtual environment may be a scene that is displayed (or provided) when a client of an application program (such as a game application) runs on a terminal, and the virtual environment refers to a scene that is created for a virtual object to perform an activity (such as a game competition), such as a virtual house, a virtual island, a virtual map, and the like. The virtual environment may be a real-world simulation environment, a semi-simulation and semi-imaginary environment, or a pure imaginary environment, which is not limited herein. In embodiments of the present application, the virtual environment may be a three-dimensional virtual environment, and the virtual object may be presented in three dimensions, and the virtual object is a three-dimensional model created based on some 3D modeling techniques. Each virtual object has its own shape and volume in the three-dimensional virtual environment, occupying a portion of the space in the three-dimensional virtual environment.
The embodiment of the application provides an image data processing method, which can quickly reconstruct and obtain a target three-dimensional head portrait grid of a head part of a target virtual object in a three-dimensional scene based on an input head image (such as a target viewpoint head portrait image), then can obtain a texture map with better texture quality (namely a second target texture map) by performing encoding and decoding processing on a first target texture map obtained by expanding the target three-dimensional head portrait grid, and further obtains a target rendering head portrait image based on the target three-dimensional head portrait grid and the second target texture map, thereby generating a three-dimensional head model of the target virtual object. Thereby, the three-dimensional head model of the virtual object can be automatically reconstructed based on the head portrait image of the virtual object. In addition, the embodiment of the application can generate the texture map with better texture quality, so that the model rendering effect of the three-dimensional head model obtained by rendering can be improved.
It may be appreciated that the embodiments of the present application may be applied to three-dimensional scenes of various business scenes, such as, for example, a game scene or an avatar generating scene, etc., which are not limited herein. For example, taking a service scenario as an example of a game scenario, a game application may include three-dimensional models of a plurality of virtual objects, but there may be some disqualified display effects of the three-dimensional models of the virtual objects in the virtual environment provided by the game application, especially disqualified display effects of head parts of the virtual objects, which may be due to poor modeling or rendering effects of the three-dimensional models of the heads of the virtual objects, so that some of the three-dimensional models of the heads of the virtual objects with disqualified display effects in the game application may be re-modeled, and the original three-dimensional model may be replaced and debugged based on the three-dimensional models of the virtual objects obtained by the re-modeling. Specifically, the remodelling of the three-dimensional model of some virtual objects with poor display effects can be achieved through the above image data processing scheme, that is, the head image of the virtual object with unqualified display effects can be extracted through an engine developer tool, then the corresponding target three-dimensional head portrait grid is obtained through reconstruction of the head image through the above image data processing scheme, the texture map (second target texture map) with excellent texture quality is determined based on the target three-dimensional head portrait grid, that is, the three-dimensional model of the head part of the reconstructed virtual object can be determined based on the three-dimensional head portrait grid and the second target texture map, and the image of the three-dimensional model of the head of the reconstructed virtual object can be rendered for effect display.
It should be noted that, before and during the process of collecting the relevant data of the user, the present application may display a prompt interface, a popup window or output a voice prompt message, where the prompt interface, popup window or voice prompt message is used to prompt the user to collect the relevant data currently, so that the present application only starts to execute the relevant step of obtaining the relevant data of the user after obtaining the confirmation operation of the user to the prompt interface or popup window, otherwise (i.e. when the confirmation operation of the user to the prompt interface or popup window is not obtained), the relevant step of obtaining the relevant data of the user is finished, i.e. the relevant data of the user is not obtained. In other words, all user data collected in the present application is collected with the consent and authorization of the user, and the collection, use and processing of relevant user data requires compliance with relevant laws and regulations and standards of the relevant country and region.
The technical scheme of the application can be applied to computer equipment, where the computer equipment may be a server, a terminal device, or other equipment for processing image data, which is not limited herein. Optionally, the server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, big data and artificial intelligence platforms. Terminals include, but are not limited to, cell phones, computers, intelligent voice interaction devices, intelligent appliances, vehicle-mounted terminals, aircraft, intelligent speakers, and the like.
It can be understood that the above scenario is merely an example, and does not constitute a limitation on the application scenario of the technical solution provided in the embodiments of the present application, and the technical solution of the present application may also be applied to other scenarios. For example, as one of ordinary skill in the art can know, with the evolution of the system architecture and the appearance of new service scenarios, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.
Further, referring to fig. 1, fig. 1 is a flowchart illustrating a method for processing image data according to an embodiment of the present application. The method may be performed by the computer device described above. The method may comprise at least the following steps.
S101, acquiring a target viewpoint head image corresponding to a head part of a target virtual object in a three-dimensional scene, and determining a target three-dimensional head image grid of the head part of the target virtual object in the three-dimensional scene based on the initial three-dimensional face shape when reconstructing the initial three-dimensional face shape of the obtained target viewpoint head image.
It will be appreciated that the three-dimensional scene may be a scene used to build a three-dimensional model, for example, a three-dimensional model of a head portion of a virtual object in a gaming application may be reconstructed. The target virtual object may be a virtual object of the three-dimensional model of the head portion to be reconstructed. For example, the virtual object may be a three-dimensional game model in three-dimensional space, the virtual object may include multiple parts, and the parts may be combined in a certain manner, e.g., the part of the virtual object may include a body part, a head part, some decorations, and the like, which are not limited herein. Wherein the three-dimensional model of the head portion of the target virtual object may be referred to as a three-dimensional head model. It can be appreciated that in some scenarios, the target virtual object may be a virtual object with a disqualified display effect of the three-dimensional model of some head parts in the game application, and further, an image of the head part of the target virtual object under a certain angle may be obtained, so that the three-dimensional head model of the target virtual object may be reconstructed based on the obtained head portrait image of the target virtual object.
It is understood that the target viewpoint head portrait image may be an image acquired at the target viewpoint for the head portion of the target virtual object. It will be appreciated that the three-dimensional head model of the target virtual object may be referred to as an initial three-dimensional head model when the target viewpoint head image is acquired. The target viewpoint may be a position of a camera that obtains the target viewpoint head image, and the target viewpoint may be any viewpoint that can capture the face of the target virtual object, for example, the target viewpoint head image may be obtained by capturing from a viewpoint in a direction of a front, a left 30 degrees, or a right 30 degrees of a head portion of the three-dimensional model of the original target virtual object, which is not limited herein. It will be appreciated that, in general, the virtual object may further include decorations attached to the head portion, and the obtained target viewpoint head portrait image may also include the decorations, for example, a hat, hair, etc., which is not limited herein. It can be understood that in the embodiment of the present application, only one head image (i.e., the target viewpoint head image) may be needed for reconstructing the three-dimensional head model of a target virtual object, and compared with other methods that need to perform three-dimensional reconstruction based on multi-view head images, the embodiment of the present application can acquire the head image needed for three-dimensional reconstruction more conveniently and rapidly, which is helpful for improving the efficiency of reconstructing the three-dimensional head model.
For example, referring to fig. 2, fig. 2 is a schematic diagram showing an effect of a target viewpoint avatar image according to an embodiment of the present application. It will be appreciated that the three-dimensional mesh of the three-dimensional mesh combination of the initial three-dimensional head model of the target virtual object and the three-dimensional mesh of the other ornaments to which the head portion is connected may be as shown at 21a in fig. 2. It can be understood that after the texture map of the target virtual object is obtained, the corresponding texture map can be attached to the three-dimensional grid and rendered, so that the head portrait expression of the target virtual object is realized, and the initial three-dimensional head model of the target virtual object is obtained. Further, an image of the head portion of the target virtual object may be obtained under the target viewpoint, so as to obtain a target viewpoint head image, as shown by 22a in fig. 2, and it may be seen that the target viewpoint head image, as shown by 22a in fig. 2, may include the head portion of the target virtual object, some decorations, and the like.
It will be appreciated that the initial three-dimensional face shape may be a shape (also referred to as shape) used to characterize the face in three-dimensional space, obtained by reconstructing the target viewpoint head portrait image. The reconstruction of the initial three-dimensional face shape based on the target viewpoint head portrait image may be achieved in various ways, such as determining the initial three-dimensional face shape based on deep learning, which is not limited herein. Depending on the manner in which the initial three-dimensional face shape is obtained, it may be represented in a corresponding form; for example, the initial three-dimensional face shape may be represented as a depth map (depth), a point cloud, voxels (voxel), a mesh (mesh), or the like, which is not limited herein.
Alternatively, the initial three-dimensional facial shape may be determined based on a three-dimensional deformable head portrait model (3D Morphable Model, 3DMM for short). The three-dimensional coordinate information of the plurality of vertexes can be determined by adopting the three-dimensional deformable head portrait model, so that the three-dimensional coordinate information of the plurality of vertexes is adopted to represent the shape of the three-dimensional face, and the initial three-dimensional face shape is obtained.
In one embodiment, the initial three-dimensional face shape of the target viewpoint avatar image reconstructed based on the three-dimensional deformable avatar model may include: determining initial three-dimensional face parameters based on a pre-trained three-dimensional face coefficient regressor, wherein the initial three-dimensional face parameters comprise identity features, expression features, target face posture coefficients, initial texture coefficients and initial illumination coefficients; an initial three-dimensional facial shape is determined based on the identity features, the expression features in the initial three-dimensional facial parameters.
It will be appreciated that the identity feature is used to indicate the identity of the target virtual object, such as the character type of the target virtual object (e.g., a character type divided by gender, age, skill, etc., which is not limited herein). For example, the identity feature may be expressed as $\alpha$, whose dimension may be 80 dimensions. The expression feature is used for indicating the facial expression in the target viewpoint head portrait image; for example, the expression feature may be expressed as $\beta$, whose dimension may be 64 dimensions. The target face pose coefficient may be used to indicate the face pose in the target viewpoint head portrait image; for example, the target face pose coefficient may be denoted as P, and its dimension may be 6 dimensions. The initial texture coefficient may be used to indicate the texture corresponding to the face shape determined based on the pre-trained three-dimensional face coefficient regressor; for example, the initial texture coefficient may be expressed as $\delta$, whose dimension may be 80 dimensions. The initial illumination coefficient may be used to indicate the illumination coefficient determined based on the pre-trained three-dimensional face coefficient regressor, and may be denoted as l, whose dimension is 27 dimensions. It will be appreciated that the initial texture coefficient and the initial illumination coefficient may not fit well, and the texture map and the predicted illumination coefficient of the reconstructed initial three-dimensional face shape may then be determined based on other means for subsequent model rendering. The pre-trained three-dimensional face coefficient regressor may be a neural network model, for example a CNN model (convolutional neural network model), which is not limited herein.
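For illustration only, the following sketch shows one plausible way the coefficient vector produced by such a regressor could be split into the named groups; the 80/64/6/80/27 dimensions come from the description above, while the ordering of the groups and the helper itself are editorial assumptions:

```python
import numpy as np

# Dimensions as stated above; the ordering within the flat vector is an assumption.
COEFF_DIMS = {"identity": 80, "expression": 64, "pose": 6, "texture": 80, "illumination": 27}

def split_coefficients(pred: np.ndarray) -> dict[str, np.ndarray]:
    """Split a flat coefficient vector predicted by the face-coefficient regressor."""
    assert pred.shape[-1] == sum(COEFF_DIMS.values())
    out, offset = {}, 0
    for name, dim in COEFF_DIMS.items():
        out[name] = pred[..., offset:offset + dim]
        offset += dim
    return out

coeffs = split_coefficients(np.zeros(sum(COEFF_DIMS.values())))
print({name: c.shape for name, c in coeffs.items()})
```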
For example, determining the initial three-dimensional face shape based on the identity feature and the expression feature in the initial three-dimensional face parameters may be represented by the following formula (formula 1):

$S = \bar{S} + B_{id}\,\alpha + B_{exp}\,\beta$   (formula 1)

where $S$ represents the initial three-dimensional face shape, $\bar{S}$ is the average face shape corresponding to the three-dimensional deformable head portrait model, $\alpha$ is the identity feature, and $\beta$ is the expression feature. $B_{id}$ is the PCA (principal component analysis) basis corresponding to the identity feature, also referred to as the first principal component analysis basis; $B_{exp}$ is the PCA basis corresponding to the expression feature, also called the second principal component analysis basis.
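As an editorial aid (not part of the patent), formula 1 is a linear combination that can be sketched directly in numpy; the 80- and 64-dimensional coefficient sizes follow the description above, while the vertex count and the random bases are placeholders:

```python
import numpy as np

V = 1000                                    # number of face vertices (placeholder)
S_mean = np.zeros((V, 3))                   # average face shape
B_id = np.random.randn(V, 3, 80) * 0.01     # first PCA basis (identity)
B_exp = np.random.randn(V, 3, 64) * 0.01    # second PCA basis (expression)

def face_shape(alpha: np.ndarray, beta: np.ndarray) -> np.ndarray:
    """Formula 1: S = S_mean + B_id @ alpha + B_exp @ beta."""
    return S_mean + B_id @ alpha + B_exp @ beta

S = face_shape(alpha=np.random.randn(80), beta=np.random.randn(64))
print(S.shape)   # (V, 3): one 3D position per reconstructed face vertex
```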
It will be appreciated that the above-described process of determining an initial three-dimensional face shape is only one possible way, and the initial three-dimensional face shape may also be represented as a three-dimensional Mesh, e.g., the initial three-dimensional face shape represented by the three-dimensional Mesh may be determined based on a Pixel2Mesh model (a neural network model). The initial three-dimensional face shape may also be represented based on some other manner of determination based on a point cloud, voxel, depth map, etc., without limitation.
The target three-dimensional avatar mesh may be a mesh of an avatar that can be used in a business scenario, such as a three-dimensional avatar mesh that can be directly loaded in a game environment, also referred to as a game mesh (mesh). It may be appreciated that determining the target three-dimensional avatar mesh of the head portion of the target virtual object in the three-dimensional scene based on the initial three-dimensional face shape may determine the initial three-dimensional face mesh based on the initial three-dimensional face shape, and may further transmit the initial three-dimensional face mesh to the template mesh for avatar transfer, thereby obtaining the target three-dimensional avatar mesh of the head portion of the target virtual object.
Specifically, determining a target three-dimensional avatar mesh of a head portion of a target virtual object in a three-dimensional scene based on an initial three-dimensional face shape may include the steps of: determining an initial three-dimensional face mesh based on the initial three-dimensional face shape, and acquiring N first key points used for representing the head part of the target virtual object from M first mesh vertexes contained in the initial three-dimensional face mesh; n is a positive integer less than M; further, a template grid is obtained, and N second key points with the same key attribute as the N first key points are searched in the template grid; a first key point corresponds to a second key point; further, N first key points and N searched second key points form N key point pairs of a head part of the target virtual object, and when an offset function associated with M first grid vertexes is obtained based on fitting of the N key point pairs, offset corresponding to the M first grid vertexes contained in the initial three-dimensional face grid is determined based on the offset function; further, the target position information corresponding to each first grid vertex is determined based on the offset corresponding to each first grid vertex, the space point indicated by the target position information corresponding to each first grid vertex is determined to be a second grid vertex, and the target three-dimensional head portrait grid is determined based on M second grid vertices.
It is understood that the initial three-dimensional face mesh may be a three-dimensional mesh determined based on the initial three-dimensional face shape. Grid vertices in the initial three-dimensional face grid may be referred to as first grid vertices, and M first grid vertices may be included in the initial three-dimensional face grid, M being a positive integer. The M first mesh vertices may include first key points corresponding to N key attributes, that is, N first key points. The key attribute may be used to indicate some key location of the face, such as the left eye corner, right eye corner, left mouth corner, right mouth corner, nose tip, etc., without limitation. Optionally, the value of N may be 68, that is, the first key points corresponding to the 68 key attributes may be determined, that is, 68 first key points are obtained.
It will be appreciated that the initial three-dimensional face mesh may be determined from a representation of the initial three-dimensional face shape. For example, if the initial three-dimensional face shape is represented by the coordinate information of the plurality of vertices determined by the three-dimensional deformation head model, determining the initial three-dimensional face mesh based on the initial three-dimensional face shape may be to use the vertices in the initial three-dimensional face shape as mesh vertices and determine the initial three-dimensional face mesh, and the initial three-dimensional face shape may be processed based on, for example, an Alpha shape (a three-dimensional mesh generation method), a Ballpivoting (a three-dimensional mesh generation method), or a Poisson curved surface reconstruction (a three-dimensional mesh generation method), to obtain the initial three-dimensional face mesh, which is not limited herein. As another example, if the initial three-dimensional face shape is represented by a three-dimensional mesh determined by some neural network, the determination of the three-dimensional mesh indicated by the initial three-dimensional face shape may be directly made as the initial three-dimensional face mesh.
Optionally, M first mesh vertices in the initial three-dimensional face mesh are generated based on the identity features and the first principal component analysis basis of the target viewpoint head image, the expression features and the second principal component analysis basis of the target viewpoint head image, and the average face shape; the identity feature is used for indicating the role type of the target virtual object, and the expression feature is used for indicating the facial expression corresponding to the target viewpoint head image.
It can be appreciated that the method for generating M first mesh vertices based on the identity feature and the first principal component analysis basis of the target viewpoint head image, and the expression feature and the second principal component analysis basis of the target viewpoint head image may be described above, and is not described herein.
The template mesh may be a mesh template adapted to a virtual environment in a business scenario, for example in a game scenario, the template mesh may be a mesh template adapted to a game environment, also referred to as a game template mesh. It is understood that the template grid may include corresponding second keypoints of the N key attributes described above. The key point pairs may be key point pairs formed by a first key point and a second key point having the same key attribute, that is, in the initial three-dimensional face mesh and the template mesh, the first key point and the second key point having the same key attribute may form one key point pair, and N key point pairs may be formed.
The offset function may be used to indicate the function obtained by fitting, which is used to determine an offset for each first mesh vertex in the initial three-dimensional face mesh. The offset of a first mesh vertex is used to indicate the amount by which the position of the first mesh vertex needs to change when the head portrait is transferred, and it is understood that the offset corresponding to each first mesh vertex can be obtained by inputting the position information (also called coordinate information) of each first mesh vertex into the offset function. Alternatively, the offset function may be calculated by interpolation based on radial basis functions.
The target position information may be position information obtained by changing the position information (i.e., initial position information) of the first grid vertex based on the corresponding offset, so that the spatial point indicated by the target position information of the changed first grid vertex (i.e., the second grid vertex) may be the grid vertex in the target three-dimensional head portrait grid. It will be appreciated that each mesh vertex in the target three-dimensional avatar mesh may be referred to as a second mesh vertex, and one first mesh vertex may correspond to one second mesh vertex.
Optionally, the offset function is configured to instruct weighted summation of K radial basis functions, where each radial basis function is associated with a corresponding function weight; then, when the offset function associated with the M first mesh vertices is obtained based on the N keypoint pair fitting, determining the offsets corresponding to the M first mesh vertices included in the initial three-dimensional face mesh based on the offset function may include the steps of: acquiring an ith first grid vertex from M first grid vertices, and determining initial position information of the ith first grid vertex; i is a positive integer less than or equal to M; determining a function value of the ith first grid vertex for each radial basis function based on the initial position information of the ith first grid vertex and each radial basis function; and carrying out weighted summation processing on the function value of each radial basis function and the function weight corresponding to each radial basis function based on the ith first grid vertex, and taking the sum value obtained after the weighted summation processing as the offset corresponding to the ith first grid vertex.
It is understood that the initial position information is used to indicate position information of a first mesh vertex in the initial three-dimensional face mesh. It is understood that the ith first mesh vertex may be any one of the M first mesh vertices.
Radial basis functions (Radial basis function, RBF for short) are basis functions of a function space, and these basis functions are radial functions. A so-called radial function satisfies the following condition: its function value at a point x depends only on the distance between x and a certain fixed center point C, so the value is the same for all points x equidistant from C. It is understood that K may be a positive integer less than or equal to N.
For example, the K radial basis functions may be represented by the following formula (formula 2):

$\varphi_j(x) = \varphi(\lVert x - c_j \rVert)$   (formula 2)

where $\varphi_j(x)$ represents the expression of the j-th radial basis function, $c_j$ represents the center point of the radial basis function $\varphi_j$, and $\lVert x - c_j \rVert$ represents the distance (such as the Euclidean distance) between the input point x and the center point $c_j$ of the radial basis function. It will be appreciated that the point x here may be any first mesh vertex in the initial three-dimensional face mesh, i.e., it may be the initial position information of any first mesh vertex.
Further, the offset function can be expressed by the following formula (formula 3):

$f(x) = \sum_{j=1}^{K} w_j\,\varphi(\lVert x - c_j \rVert)$   (formula 3)

where x represents the initial position information of a first mesh vertex, $f(x)$ represents the offset of that first mesh vertex, j denotes the j-th radial basis function, and K denotes the total number of radial basis functions; $w_j$ represents the function weight of the j-th radial basis function, $\varphi(\lVert x - c_j \rVert)$ represents the j-th radial basis function, and $c_j$ represents the position information of the center point of the j-th radial basis function.
Specifically, determining the target position information corresponding to each first mesh vertex based on the offset corresponding to each first mesh vertex, respectively, may include the following steps: acquiring an ith first grid vertex from M first grid vertices, and determining initial position information of the ith first grid vertex; i is a positive integer less than or equal to M; and determining target position information corresponding to each first grid vertex based on the sum of the initial position information of the ith first grid vertex and the corresponding offset of the ith first grid vertex.
For example, the target position information can be expressed by the following formula (formula 4):

$x_i' = x_i + f(x_i)$   (formula 4)

where $x_i'$ represents the target position information of the i-th first mesh vertex, $f(x_i)$ represents the offset of the i-th first mesh vertex, and $x_i$ represents the initial position information of the i-th first mesh vertex.
Specifically, the fitting of the N keypoint pairs to obtain the offset function associated with the M first mesh vertices may include the following steps: performing alignment processing on the initial three-dimensional face grid and the template grid based on the position distances between the first key point and the second key point in each key point pair, and determining the position distances between the first key point and the second key point in each key point pair as target position distances when the sum of the position distances of N key point pairs is minimized; further, interpolation calculation is performed based on the target position distance of each key point pair, K radial basis functions and function weights corresponding to each radial basis function are determined, and offset functions associated with M first grid vertices are determined based on the K radial basis functions and the function weights corresponding to each radial basis function.
It will be appreciated that the alignment process described above is equivalent to minimizing the location distance for each keypoint pair, so that the linear least squares problem can be solved by minimizing the location distance of the keypoint pair between the template mesh and the initial three-dimensional face mesh to determine the functional weight of each radial basis function. It can be understood that the interpolation calculation is equivalent to fitting a smooth interpolation function based on function values corresponding to some known points, so that function values of other points can be calculated based on the function obtained by fitting, where the known points are the above-mentioned N first key points, and the corresponding function values can be the target distance between each key point pair, so that K radial basis functions and function weights corresponding to each radial basis function can be obtained by interpolation calculation, that is, fitting to obtain the above-mentioned offset function.
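The following numpy sketch is an editorial illustration, under assumptions the patent does not fix (a Gaussian kernel and the first keypoints used directly as RBF centers), of how an offset function in the spirit of formulas 2-4 could be fitted from the N keypoint pairs by linear least squares and then applied to all first mesh vertices:

```python
import numpy as np

def gaussian_rbf(dist: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    # One possible choice of radial kernel; the patent does not mandate a specific one.
    return np.exp(-(dist ** 2) / (2.0 * sigma ** 2))

def fit_offset_function(src_kp: np.ndarray, dst_kp: np.ndarray, sigma: float = 1.0):
    """Fit per-axis weights w_j so that sum_j w_j * phi(||x - c_j||) maps src keypoints onto dst keypoints."""
    centers = src_kp                                          # (N, 3): first keypoints act as RBF centers
    dists = np.linalg.norm(src_kp[:, None, :] - centers[None, :, :], axis=-1)   # (N, N)
    phi = gaussian_rbf(dists, sigma)                          # design matrix built from formula 2
    targets = dst_kp - src_kp                                 # desired offsets at the keypoints
    weights, *_ = np.linalg.lstsq(phi, targets, rcond=None)   # (N, 3) least-squares function weights
    return centers, weights

def apply_offsets(vertices: np.ndarray, centers: np.ndarray, weights: np.ndarray, sigma: float = 1.0):
    dists = np.linalg.norm(vertices[:, None, :] - centers[None, :, :], axis=-1)  # (M, N)
    offsets = gaussian_rbf(dists, sigma) @ weights            # formula 3: f(x) for every vertex
    return vertices + offsets                                 # formula 4: target positions

# Hypothetical data: 68 keypoint pairs and a 5000-vertex initial face mesh.
src = np.random.rand(68, 3)
dst = src + 0.05 * np.random.rand(68, 3)
centers, weights = fit_offset_function(src, dst)
new_vertices = apply_offsets(np.random.rand(5000, 3), centers, weights)
```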
S102, performing image unfolding processing on the target viewpoint head portrait image based on the target three-dimensional head portrait grid to obtain a first target texture map corresponding to the target viewpoint head portrait image.
It is understood that the first target texture map may be a texture map with color information based on the unfolding of the target three-dimensional head portrait mesh. It can be understood that the image expansion processing is performed on the target viewpoint head portrait image based on the target three-dimensional head portrait mesh, that is, the target three-dimensional head portrait mesh is expanded to the UV space (texture space) to obtain a texture map corresponding to the target three-dimensional head portrait mesh, and color information of each point in the texture map is determined by using color information of each point of the face in the target viewpoint head portrait image to obtain the first target texture map.
It is understood that each second mesh vertex in the target three-dimensional avatar mesh may be associated with corresponding texture coordinates when the target three-dimensional avatar mesh is expanded into UV space. In addition, each second mesh vertex in the target three-dimensional avatar mesh corresponds to one first mesh vertex in the initial three-dimensional face mesh, i.e., corresponds to one vertex in the initial three-dimensional face shape. Optionally, the vertex in the initial three-dimensional face shape may have a mapping relationship with a two-dimensional point in the target viewpoint head image, for example, an affine matrix may be determined to represent by a gold standard algorithm, so that color information corresponding to a grid vertex in the target three-dimensional head image grid may be determined based on the mapping relationship between the grid vertex in the target three-dimensional head image grid and the two-dimensional point in the target viewpoint head image, further, texture coordinates corresponding to the grid vertex in the target three-dimensional head image grid and color information corresponding to the grid vertex in the target three-dimensional head image grid may be determined, that is, color information of a pixel corresponding to each texture coordinate may be determined, that is, image expansion processing is performed on the target viewpoint head image based on the target three-dimensional head image grid. Optionally, when determining the initial three-dimensional face shape, color information corresponding to each vertex in the initial three-dimensional face shape may be determined, where the color information may be determined based on texture coefficients extracted from the target viewpoint head image, so that color information of pixels corresponding to each texture coordinate may be determined based on texture coordinates corresponding to grid vertices in the target three-dimensional head image grid, that is, image expansion processing is performed on the target viewpoint head image based on the target three-dimensional head image grid.
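As a rough, simplified illustration (the patent does not provide an implementation, and a full version would rasterize each triangle rather than writing one texel per vertex), the vertex-level part of this unfolding step could be sketched as follows, assuming each mesh vertex already has texture coordinates and a corresponding 2D point in the head portrait image:

```python
import numpy as np

def unfold_vertex_colors(uv: np.ndarray, img_pts: np.ndarray, image: np.ndarray,
                         tex_size: int = 256) -> np.ndarray:
    """uv: (V, 2) texture coords in [0, 1]; img_pts: (V, 2) corresponding pixel coords in the head image."""
    texture = np.zeros((tex_size, tex_size, 3), dtype=image.dtype)
    tex_xy = np.clip((uv * (tex_size - 1)).round().astype(int), 0, tex_size - 1)
    src_xy = np.clip(img_pts.round().astype(int), 0, np.array(image.shape[:2][::-1]) - 1)
    # Nearest-neighbor sampling per vertex: copy the image color at each vertex's 2D point
    # into the texture map at that vertex's texel.
    texture[tex_xy[:, 1], tex_xy[:, 0]] = image[src_xy[:, 1], src_xy[:, 0]]
    return texture
```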
For example, referring to fig. 3, fig. 3 is a schematic diagram illustrating an effect of a first target texture map according to an embodiment of the present application. As shown in fig. 3, after the target three-dimensional head portrait mesh 31a is determined, image expansion processing may be performed on the target viewpoint head portrait image 32a based on the target three-dimensional head portrait mesh, so as to obtain a corresponding first target texture map 33a. It will be appreciated that the grid cuts in the first target texture map 33a are only used to indicate correspondence to the grids in the target three-dimensional avatar grid and do not represent the presence of these grid cuts in the actual texture map.
S103, performing texture coding processing on the first target texture map to obtain texture coding features corresponding to the first target texture map, performing feature stitching processing on the texture coding features and the image coding features when the image coding features of the target viewpoint head portrait image are obtained to obtain texture image stitching features for performing texture decoding, and performing texture decoding processing on the texture image stitching features to obtain a second target texture map corresponding to the first target texture map; the texture quality of the second target texture map is better than the texture quality of the first target texture map.
It will be appreciated that the texture coding feature may be used to indicate the feature resulting from texture coding the first target texture map. The texture coding features are determined based on a target texture encoder in the target modeling network, that is, the target texture encoder may be used to texture encode the texture map. The target modeling network may be a network for modeling based on a head image of a virtual object, and the target modeling network may be obtained by training an initial modeling network based on training data. The target texture encoder may be determined based on an initial texture encoder in the initial modeling network when the initial modeling network is trained based on training data to obtain the target modeling network.
Optionally, performing texture coding processing on the first target texture map to obtain texture coding features corresponding to the first target texture map may include the following steps: and converting the first target texture map into a frequency domain space based on the pixel value of each pixel in the first target texture map to obtain frequency domain information corresponding to the first target texture map, and determining the frequency domain information corresponding to the first target texture map as texture coding features corresponding to the first target texture map.
It is understood that the frequency domain space may be a space with frequency as the independent variable. The frequency domain information of a texture map can be used to characterize how sharply the intensity (brightness/gray) of the image changes: for an image (such as a texture map), edge regions are abrupt, fast-changing parts and therefore correspond to high-frequency components in the frequency domain, while gently changing regions of the image correspond to low-frequency components. It will be appreciated that the first target texture map belongs to the color space, also referred to as the spatial domain, which refers to the image itself; the conversion can therefore act on the pixels of the color space to convert the texture map from the color space to the frequency domain space.
For example, the method of converting the first target texture map to the frequency domain space to obtain the corresponding frequency domain information may be represented by the following formula (formula 5):

$F(k) = \sum_{s=1}^{n} T(s)\,\phi_k(s)$   (formula 5)

where $k$ represents the texture frequency of the first target texture map, and $F(k)$ represents the frequency domain information resulting from converting the first target texture map into the frequency domain space. s denotes the pixel index in the texture map and n denotes the pixel dimension of the texture map; $T(s)$ represents the pixel value of the pixel with pixel index s, and $\phi_k(s)$ represents the basis function. Thereby, the conversion of the first target texture map into the frequency domain space can be achieved in order to determine the texture coding features of the first target texture map.
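For intuition only, one concrete realization of such a spatial-to-frequency conversion is the two-dimensional discrete cosine transform; the sketch below uses scipy for this purpose, although the patent itself only specifies a generic basis-function expansion rather than a particular transform:

```python
import numpy as np
from scipy.fft import dctn, idctn

def texture_to_frequency(texture: np.ndarray) -> np.ndarray:
    """Convert an (H, W) or (H, W, C) texture map from color space to frequency-domain coefficients."""
    return dctn(texture, axes=(0, 1), norm="ortho")

def frequency_to_texture(coeffs: np.ndarray) -> np.ndarray:
    return idctn(coeffs, axes=(0, 1), norm="ortho")

tex = np.random.rand(256, 256, 3)
freq = texture_to_frequency(tex)
print(np.allclose(frequency_to_texture(freq), tex))   # True: the transform is invertible
```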
The image coding feature may be used to indicate a feature obtained by performing image coding processing on the target viewpoint head portrait image. The image encoding features are determined based on a target image encoder in the target modeling network, that is, the target image encoder may be used to image encode the avatar image. The target image encoder may be determined based on an initial image encoder in the initial modeling network when the initial modeling network is trained based on training data to obtain the target modeling network. Alternatively, the initial image encoder may be a neural network for image encoding or feature extraction, which is not limited herein.
The texture image stitching feature may be a feature obtained by performing feature stitching processing based on the texture encoding feature and the image encoding feature. Optionally, the texture coding feature and the image coding feature are spliced, which can be understood as weighting the texture coding feature and the image coding feature, which is equivalent to realizing the effect fusion of the texture and the pixel value (such as RGB value) in the image.
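For ease of understanding, the feature stitching step may be sketched as an (optionally weighted) concatenation of the two feature vectors; the feature dimensions and weights below are illustrative assumptions rather than part of the claimed implementation.

import torch

def stitch_features(texture_feat, image_feat, w_tex=1.0, w_img=1.0):
    # weight each modality and concatenate along the channel dimension,
    # fusing texture structure with the pixel-value (RGB) information of the image
    return torch.cat([w_tex * texture_feat, w_img * image_feat], dim=-1)

# e.g. a 256-dimensional texture coding feature and a 256-dimensional image coding
# feature would give a 512-dimensional texture image stitching feature
stitched = stitch_features(torch.randn(1, 256), torch.randn(1, 256))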
The second target texture map is a texture map obtained by performing texture decoding processing on the texture image stitching feature. The second target texture map is determined based on a target texture decoder in the target modeling network, that is, the target texture decoder may be used to texture decode the texture image stitching feature. The target texture decoder may be determined based on an initial texture decoder in the initial modeling network when the initial modeling network is trained based on training data to obtain the target modeling network. It will be appreciated that the texture quality of the second target texture map is superior to that of the first target texture map, where texture quality may be characterized along dimensions such as sharpness, smoothness, reasonable shading and highlights, and integrity, and may be determined by some performance analysis tools.
Optionally, the texture image stitching feature is a feature in frequency domain space; then, performing texture decoding processing on the texture image stitching feature to obtain a second target texture map corresponding to the first target texture map, which may include the following steps: and acquiring a decoding bandwidth required for texture decoding, obtaining color space information corresponding to texture image splicing features when the texture image splicing features are converted from a frequency domain space to a color space based on a frequency range indicated by the decoding bandwidth, and determining a second target texture map based on the color space information.
It will be appreciated that this decoding bandwidth may be a range of frequencies required to transform texture image stitching features from frequency domain space to color space. It will be appreciated that for an image (such as a texture map), the main component of the image is the low frequency information, which forms the basic gray scale of the image, with less decision on the image structure; the intermediate frequency information determines the basic structure of the image, and forms the main edge structure of the image; the high frequency information forms the edges and details of the image and is a further enhancement of the image content on the intermediate frequency information. The more high frequency information the image, the more detailed features the image. If the image lacks high-frequency information, the image edge and the outline of the image are easy to be unclear, but if the high-frequency information in the image is too much, the image outline is easy to be too obvious, so that the decoding bandwidth needs to be selected to be a proper value, for example, the decoding bandwidth can be 18, the frequency range indicated by the decoding bandwidth is 0-18, thereby ensuring that the textures in the generated texture image are clear, the outline is not too obvious, and the purpose of smoothing the image is achieved.
The texture image stitching feature is converted from the frequency domain space to the color space, i.e., equivalent to determining the pixel value corresponding to each pixel, to obtain a second target texture map. The color space information may be used to indicate color information (e.g., RGB values) for each pixel of the decoded texture map. It can be appreciated that, as the texture image stitching feature performs effect fusion on the feature of the first target texture map on the texture and the feature of the target viewpoint head portrait image on the color, the texture map obtained by decoding based on the texture image stitching feature is better in color expression, and the texture map obtained by decoding is finer.
For example, the method of performing texture decoding processing on the texture image stitching feature to obtain the second target texture map may be represented by the following formula (formula 6):
$$T(s) = \sum_{k=0}^{B} \hat{F}(k)\,\varphi_k(s)$$

Equation 6

Wherein, $T(s)$ represents the color space information of the second target texture map at the pixel with pixel index $s$, and $B$ represents the decoding bandwidth described above. $\hat{F}(k)$ represents the texture image stitching feature at frequency $k$, $s$ represents the pixel index in the texture map, and $\varphi_k(s)$ represents the basis function. Thereby, a conversion of the texture image stitching feature into the color space may be achieved to determine the second target texture map.
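For ease of understanding, the decoding of Equation 6 may be sketched as the inverse of the frequency transform above, keeping only the frequencies within the range indicated by the decoding bandwidth. The basis function and the example bandwidth of 18 follow the description above; everything else is an assumption.

import numpy as np

def frequency_to_texture(coeffs, height, width, bandwidth=18):
    """Sketch of Equation 6: reconstruct color space information from the
    stitched frequency coefficients, using only frequencies 0..bandwidth."""
    n = height * width
    s = np.arange(n)                               # pixel index s
    pixels = np.zeros((n, coeffs.shape[-1]))
    for k in range(min(bandwidth + 1, coeffs.shape[0])):   # frequency range 0-bandwidth
        basis = np.cos(np.pi * k * (s + 0.5) / n)  # same assumed basis function
        pixels += np.outer(basis, coeffs[k])       # sum over k of coefficient(k) * basis(s)
    return pixels.reshape(height, width, -1)       # color space information per pixel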
It can be understood that the whole network in which the target texture encoder performs texture encoding processing on the first target texture map to obtain the texture encoding features, feature stitching processing is performed on the texture encoding features and the image encoding features to obtain the texture image stitching features, and the target texture decoder performs decoding processing on the texture image stitching features to obtain the second target texture map, may be called a differentiable rendering network or a texture rendering network. It can obtain a more refined texture map (namely, the second target texture map) for rendering to obtain the target rendering head image, so as to generate the three-dimensional head model.
And S104, obtaining a target rendering head image based on the target three-dimensional head image grid and the second target texture map, and generating a three-dimensional head model of the head part of the target virtual object in the three-dimensional scene based on the obtained target rendering head image.
The target rendering avatar image may be an image rendered based on the target three-dimensional avatar mesh and the second target texture map. The target rendering head image is an image of the three-dimensional head model obtained based on the target viewpoint head image in the three-dimensional scene, namely, a displayed image of the three-dimensional head model of the reconstructed target virtual object at a specific angle. It can be understood that, in the rendering of the embodiment of the present application, the target rendering head image is rendered by adopting a differential (i.e., differentiable) rendering processing mode, and differential rendering may refer to rendering based on a rendering method that is differentiable in the rendering process. It will be appreciated that differential rendering is used because, in training to obtain the target modeling network, differential rendering processing enables gradient computation based on the rendered image, so as to facilitate back-propagation of gradients to the networks requiring parameter updating during training, such as each network component in the target modeling network, for example the target image encoder, the target texture decoder, etc. described above.
The three-dimensional head model may be a three-dimensional model of the head portion reconstructed based on the target viewpoint head image described above. It will be appreciated that the three-dimensional head model in a three-dimensional scene may be rotated, and that rendering based on the target three-dimensional head portrait mesh and the second target texture map may result in images at different viewpoint angles, thereby generating a three-dimensional head model. It can be appreciated that, since the first target texture map obtained by expanding the target viewpoint head image is refined by the target modeling network (i.e., the first target texture map is encoded and the texture image stitching feature is decoded), the model rendering effect of the reconstructed three-dimensional head model is better than that of the three-dimensional model of the head part of the target virtual object.
Referring to fig. 4, fig. 4 is a flowchart of an image data processing method according to an embodiment of the present application. The method may be performed by the computer device described above. The method may comprise at least the following steps.
S401, acquiring a target viewpoint head image corresponding to a head part of a target virtual object in a three-dimensional scene, and determining a target three-dimensional head image grid of the head part of the target virtual object in the three-dimensional scene based on the initial three-dimensional face shape when reconstructing the initial three-dimensional face shape of the obtained target viewpoint head image.
And S402, performing image unfolding processing on the target viewpoint head portrait image based on the target three-dimensional head portrait grid to obtain a first target texture map corresponding to the target viewpoint head portrait image.
S403, performing texture coding processing on the first target texture map to obtain texture coding features corresponding to the first target texture map, performing feature stitching processing on the texture coding features and the image coding features when the image coding features of the target viewpoint head portrait image are obtained to obtain texture image stitching features for performing texture decoding, and performing texture decoding processing on the texture image stitching features to obtain a second target texture map corresponding to the first target texture map; the texture quality of the second target texture map is better than the texture quality of the first target texture map.
Steps S401 to S403 may refer to the descriptions related to steps S101 to S103, and are not described herein.
S404, carrying out illumination prediction processing on the image coding features based on a target illumination regressor in the target modeling network to obtain target prediction illumination coefficients.
Wherein it is understood that as described above, the image encoding features are determined based on a target image encoder in the target modeling network, the texture encoding features are determined based on a target texture encoder in the target modeling network, and the second target texture map is determined based on a target texture decoder in the target modeling network. The target modeling network can also comprise a target illumination regressor, and the target illumination regressor can be used for carrying out illumination prediction processing on the image coding features to obtain target prediction illumination coefficients.
The target predicted illumination coefficient may be illumination information based on predictions in image coding features. It will be appreciated that the target predicted illumination coefficients are determined based on a target illumination regressor in the target modeling network, that is, the target illumination regressor may be configured to perform an illumination prediction process based on the image encoding features to obtain the predicted illumination coefficients. The target illumination regressor may be determined based on an initial illumination regressor in the initial modeling network when training the initial modeling grid based on the training data to obtain the target modeling network. Through training of training data on an initial modeling network, a target prediction illumination coefficient can be accurately determined through a target illumination regressor, so that illumination can be conveniently carried out based on the target prediction illumination coefficient in a subsequent rendering process, and the brightness change and shadow casting of the surface of the three-dimensional model are rendered, thereby effectively eliminating unnecessary illumination components in an input target viewpoint head image, reducing the influence of illumination on textures, enabling the face in the rendering head image to be smoother, enabling the color to be more vivid, and enabling the model rendering effect of the generated three-dimensional head model to be better.
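For ease of understanding, the target illumination regressor may be sketched as a small multilayer perceptron mapping the image coding features to a vector of illumination coefficients; the layer sizes and the 27-dimensional output (9 spherical-harmonic bands times RGB) are assumptions and not part of the claimed implementation.

import torch
import torch.nn as nn

class IlluminationRegressor(nn.Module):
    """Assumed MLP that regresses predicted illumination coefficients
    from the image coding features."""
    def __init__(self, feature_dim=256, num_coeffs=27):   # 27 = 9 SH bands x RGB (assumption)
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, num_coeffs),
        )

    def forward(self, image_features):
        return self.net(image_features)   # target predicted illumination coefficients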
And S405, when the target face pose coefficient associated with the initial three-dimensional face shape is acquired, performing differential rendering processing on the target three-dimensional head portrait grid, the target face pose coefficient, the target predicted illumination coefficient and the second target texture map to obtain a target rendering head portrait image, and generating a three-dimensional head model of the head part of the target virtual object under the three-dimensional scene based on the obtained target rendering head portrait image.
It will be appreciated that, as described above, the target face pose coefficient may be used to indicate a face pose in the target viewpoint avatar image, e.g., the target face pose parameter may be represented as P, and its dimension may be 6 dimensions. Optionally, the target face pose coefficient may be determined by the above-mentioned pretrained three-dimensional face coefficient regressor, which is not described herein.
It will be appreciated that the target rendered avatar image may be an image rendered based on the target three-dimensional avatar mesh, the target face pose coefficient, the target predicted illumination coefficient, and the second target texture map. By rendering the face posture coefficients, the occurrence of drift in the expression of the generated three-dimensional head model can be avoided. The target predictive illumination coefficient is used for rendering, so that the face in the target rendering head portrait image obtained through rendering is smoother, the color is more lifelike, and the model rendering effect of the generated three-dimensional head model is better.
The three-dimensional head model may be a three-dimensional model of the head portion reconstructed based on the target viewpoint head image described above. It will be appreciated that images at different viewpoint angles may be rendered based on the target three-dimensional head model mesh, the target face pose coefficients, the target predicted illumination coefficients, and the second target texture map, as described above, to thereby generate a three-dimensional head model.
It will be appreciated that the process of reconstructing a three-dimensional head model of a target virtual object is described herein in connection with an illustration. Referring to fig. 5, fig. 5 is a schematic flow chart of a reconstruction process of a three-dimensional head model according to an embodiment of the present application. First, a target viewpoint head image 501 may be acquired, and it is understood that the target viewpoint head image 501 may be an image of a head portion of a target virtual object whose three-dimensional model display effect is not acceptable. The initial three-dimensional face shape 502 can be obtained based on reconstruction of the target viewpoint head portrait image, for example, the identity characteristic and the expression characteristic can be determined through a pre-trained three-dimensional face coefficient regressor, and then the initial three-dimensional face shape can be determined based on the identity characteristic and the expression characteristic based on calling the three-dimensional deformable head portrait model. Further, an initial three-dimensional face grid 503 may be determined based on the initial three-dimensional face shape, then head image transfer may be performed on the initial three-dimensional face grid 503 to obtain a target three-dimensional head image grid 504 that may be loaded into a virtual environment of the service scene, specifically, an offset function may be obtained by fitting a key point pair in the initial three-dimensional face grid and a template grid, so as to calculate and obtain an offset of each grid vertex in the initial three-dimensional face grid, so as to obtain new position information (i.e., target position information) of each grid vertex in the initial three-dimensional face grid, and further obtain a target three-dimensional head image grid. It will be appreciated that in determining the identity feature, the expressive feature based on the pre-trained three-dimensional face coefficient regressor, the target face pose coefficient 505 may also be determined based on the pre-trained three-dimensional face coefficient regressor. It can be appreciated that after the target three-dimensional avatar mesh 504 is obtained, the target three-dimensional avatar mesh 504 may perform image expansion processing on the target viewpoint avatar image to obtain the first target texture map 506, and the specific expansion manner is referred to the above description and will not be repeated herein.
Further, the target viewpoint avatar image 501 and the first target texture map 506 are input to the target modeling network 507. Target modeling network 507 may include a target image encoder 5071, a target texture encoder 5074, a target illumination regressor 5073, and a target texture decoder 5077. Specifically, the target viewpoint head image 501 is subjected to image encoding processing by the target image encoder 5071 to obtain image encoding features 5072, and then the image encoding features 5072 are input to the target illumination regressor for illumination prediction to obtain a target prediction illumination coefficient 508. Then, the first target texture map 506 is subjected to texture coding processing by the target texture encoder 5074 to obtain a texture coding feature 5075, the image coding feature 5072 and the texture coding feature 5075 are subjected to feature stitching processing to obtain a texture image stitching feature 5076, and the texture image stitching feature is further subjected to texture decoding processing by the target texture decoder 5077 to obtain a second target texture map 509; it can be understood that the texture quality of the second target texture map 509 is better than that of the first target texture map.
Further, the target three-dimensional avatar mesh 504, the target face pose coefficient 505, the target predicted illumination coefficient 508 generated based on the target illumination regressor 5073 in the target modeling network, and the second target texture map 509 may be input to the differentiable renderer 510, resulting in the target rendered avatar image 511. It will be appreciated that the target rendered avatar image may be reconstructed into an image of a three-dimensional head model.
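For ease of understanding, the forward pass illustrated in fig. 5 may be condensed into the following sketch. Every argument and module name here is a placeholder standing in for a component the description keeps abstract (encoders, decoder, illumination regressor and differentiable renderer), so this is an illustrative outline rather than the claimed implementation.

import torch

def reconstruct_head(avatar_image, first_texture_map, avatar_mesh, pose_coeff,
                     image_encoder, texture_encoder, texture_decoder,
                     light_regressor, differentiable_renderer):
    """Sketch of the target modeling network forward pass (names are placeholders)."""
    image_feat = image_encoder(avatar_image)                    # 5071 -> image coding features 5072
    light_coeff = light_regressor(image_feat)                   # 5073 -> predicted illumination 508
    texture_feat = texture_encoder(first_texture_map)           # 5074 -> texture coding features 5075
    stitched = torch.cat([texture_feat, image_feat], dim=-1)    # texture image stitching feature 5076
    second_texture_map = texture_decoder(stitched)              # 5077 -> second target texture map 509
    rendered = differentiable_renderer(avatar_mesh, pose_coeff,   # 510 -> target rendering image 511
                                       light_coeff, second_texture_map)
    return rendered, second_texture_map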
In the embodiment of the application, the target three-dimensional head portrait grid of the head part of the target virtual object in the three-dimensional scene can be quickly reconstructed based on the input head image (such as the target viewpoint head portrait image), then the first target texture map obtained based on the expansion of the target three-dimensional head portrait grid can be subjected to encoding and decoding processing to obtain the texture map with better texture quality (namely the second target texture map), and then the target rendering head portrait image is rendered by combining some other parameters required by rendering processing, so that the three-dimensional head model of the target virtual object is generated. Thereby, the three-dimensional head model of the virtual object can be automatically reconstructed based on the head portrait image of the virtual object. In addition, the embodiment of the application can generate the texture map with better texture quality, so that the model rendering effect of the three-dimensional head model obtained by rendering can be improved.
Referring to fig. 6, fig. 6 is a flowchart illustrating a method for processing image data according to an embodiment of the present application. The method may be performed by the computer device described above. The method may comprise at least the following steps.
S601, acquiring a training viewpoint head image corresponding to a head part of a training virtual object in a three-dimensional scene, and determining a training three-dimensional head image grid of the head part of the training virtual object in the three-dimensional scene based on the training three-dimensional face shape when reconstructing the training three-dimensional face shape of the obtained training viewpoint head image.
It will be appreciated that the training virtual object may be a virtual object for acquiring training viewpoint avatar images. For example, in some scenarios, the training virtual object may be a virtual object that qualifies for display of a three-dimensional model of some heads in a gaming application.
It is understood that the training viewpoint head portrait image may be an image obtained under a training viewpoint for a head portion of a training virtual object. The training virtual object is constructed based on a three-dimensional model of the head. The training viewpoint may be a position of a camera for acquiring the head portrait image of the training viewpoint, and the training viewpoint may be a viewpoint of a face of any training virtual object that can be photographed, for example, the training viewpoint head portrait image may be obtained by photographing from a viewpoint in a direction of a front face, a left 30 degrees, or a right 30 degrees of a head portion of a three-dimensional model of an original training virtual object, which is not limited herein. It will be appreciated that, in general, the virtual object may further include decorations on the head, and the acquired head portrait image from the training point of view may also include the decorations, for example, a hat, hair, etc., which are not limited herein.
It will be appreciated that the training three-dimensional face shape may be used to characterize the shape (also known as shape) of the face in the training viewpoint head portrait image head in three-dimensional space after reconstruction. It can be understood that, the method for acquiring the training three-dimensional face shape may refer to the above description related to the target three-dimensional face shape, which is not described herein.
The training three-dimensional avatar mesh may be a mesh of avatars that can be used in a virtual environment in a business scenario, such as a three-dimensional avatar mesh that can be loaded directly in a game environment in a game scenario, also referred to as a game mesh (mesh). It can be appreciated that the training three-dimensional head portrait mesh of the head part of the training virtual object in the three-dimensional scene is determined based on the training three-dimensional face shape, the training three-dimensional face mesh can be determined based on the training three-dimensional face shape, and then the training three-dimensional face mesh can be transmitted to the template mesh for head portrait transfer, so that the training three-dimensional head portrait mesh of the head part of the training virtual object can be obtained. It can be appreciated that the method for determining the training three-dimensional avatar mesh may refer to the above description related to determining the target three-dimensional avatar mesh, which is not described herein.
S602, performing image unfolding processing on the training viewpoint head portrait image based on the training three-dimensional head portrait grid to obtain a first training texture map corresponding to the training viewpoint head portrait image.
It is understood that the first training texture map may be a texture map with color information based on training three-dimensional avatar mesh expansion. It will be appreciated that the method for determining the first training texture map may refer to the above description related to determining the first target texture map, which is not described herein.
S603, performing texture coding processing on the first training texture map based on an initial texture encoder in an initial modeling network to obtain training texture features corresponding to the first training texture map, performing feature stitching processing on the training texture features and the training image features when the training image features of the training viewpoint head image are obtained based on the initial image encoder in the initial modeling network to obtain training stitching features for performing texture decoding, and performing texture decoding processing on the training stitching features based on an initial texture decoder in the initial modeling network to obtain a second training texture map corresponding to the first training texture map.
It will be appreciated that the initial modeling network may be a modeling network that has not yet completed training. It will be appreciated that, since the target modeling network may include a target image encoder, a target illumination regressor, a target texture encoder, and a target texture decoder, the initial modeling network may correspondingly include an initial image encoder, an initial illumination regressor, an initial texture encoder, and an initial texture decoder.
The training texture feature may be used to indicate a feature resulting from texture encoding the first training texture map. The training texture features are determined based on an initial texture encoder in an initial modeling network. It will be appreciated that the method for determining training texture features may refer to the above description of determining texture coding features, and will not be described herein.
The training image features may be used to indicate features resulting from image encoding of the training viewpoint avatar image. The training image features are determined based on an initial image encoder in an initial modeling network. It will be appreciated that the method for determining the training image features may refer to the above description of the determined image coding features, which is not described herein.
The training stitching feature may be a feature obtained by performing feature stitching processing based on training texture features and training image features. It will be appreciated that the method for determining the training stitching feature may refer to the above description of the texture image stitching feature, which is not described herein.
The second training texture map is a texture map obtained by performing texture decoding processing on training splicing features. The second training texture map is determined based on an initial texture decoder in the initial modeling network. It will be appreciated that the method for determining the second training texture map may refer to the above description related to determining the second target texture map, which is not described herein.
And S604, obtaining a training rendering image based on the training three-dimensional head portrait grid and the second training texture map.
The training face pose coefficients may be used to indicate a face pose in the training viewpoint avatar image. It will be appreciated that the training face pose coefficients may refer to the above description of the target face pose coefficients, and will not be described herein.
The training predicted illumination coefficient may be based on illumination information predicted in the training image features. It will be appreciated that the training predicted illumination coefficients are determined based on an initial illumination regressor in the initial modeling network. It will be appreciated that the training predicted illumination coefficient may refer to the description related to the target predicted illumination coefficient, which is not described herein.
The training rendered image may be an image rendered based on the training three-dimensional avatar mesh, the training face pose coefficients, the training predicted illumination coefficients, and the second training texture map. It will be appreciated that the training viewpoint head image may include, in addition to the face area, some ornaments on the head (such as hair and a cap), whereas the training three-dimensional head mesh and the second training texture map obtained by reconstruction are the mesh and texture map of the head portion only, not of the head ornaments. The training rendered image may therefore be an image in which the skin area of the head is rendered differentiably based on the training three-dimensional head mesh, the training face pose coefficient, the training predicted illumination coefficient, and the second training texture map, which is equivalent to combining the rendered skin area with the non-skin area taken from the training viewpoint head image. Accordingly, when the distance between the training viewpoint head image and the training rendered image is computed, it is calculated based on the distance between the skin areas in the two images.
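For ease of understanding, combining the rendered skin area with the non-skin area of the training viewpoint head image may be sketched as a mask-based composite; the tensor layout (mask broadcast over RGB channels) is an assumption.

import torch

def composite_with_skin_mask(rendered_head, training_image, skin_mask):
    """Keep the differentiably rendered pixels inside the skin area and the
    original training viewpoint pixels elsewhere (hair, hat, background)."""
    # skin_mask: 1 inside the skin area, 0 outside
    return skin_mask * rendered_head + (1.0 - skin_mask) * training_image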
S605, acquiring a reference texture map corresponding to the second training texture map, performing iterative training on the initial modeling network based on the reference texture map, the second training texture map, the training rendering image and the training viewpoint head image, and taking the initial modeling network after iterative training as a target modeling network.
It will be appreciated that the reference texture map may be a texture map that is required to be lossy calculated with a second training texture map during training of the initial modeling network. It will be appreciated that when the initial modeling network after the iterative training is taken as the target modeling network, the initial image encoder in the initial modeling network may be taken as the target image encoder in the target modeling network, the initial texture encoder in the initial modeling network may be taken as the target texture encoder in the target modeling network, the initial texture decoder in the initial modeling network may be taken as the target texture decoder in the target modeling network, and the initial illumination regressor in the initial modeling network may be taken as the target illumination regressor in the target modeling network.
It can be appreciated that in the embodiment of the present application, training the initial modeling network may be training based on a training manner of semi-supervised learning, so as to reduce the dependence on training data. Specifically, paired data may be used for supervised learning, and unlabeled data may be used for self-supervised learning. The network training strategy for training the initial modeling network based on the mode of supervised learning of the paired data may be a first training strategy, and the network training strategy for training the initial modeling network based on the mode of self-supervised learning of the unlabeled data may be a second training strategy. It will be appreciated that the method of obtaining the reference texture map may be different when training the initial modeling network with different training strategies.
Optionally, obtaining the reference texture map corresponding to the second training texture map may include the following steps: acquiring a network training strategy corresponding to an initial modeling network, extracting a skin region of a training viewpoint head portrait image from the training viewpoint head portrait image when the network training strategy is a first training strategy, and calculating the average color of the skin region of the training viewpoint head portrait image; further, a training three-dimensional head portrait grid of the head part of the training virtual object in the three-dimensional scene is determined based on the training viewpoint head portrait image, and image unfolding processing is carried out on the training viewpoint head portrait image based on the training three-dimensional head portrait grid, so that an initial texture map corresponding to the training viewpoint head portrait image is obtained; further, a template texture map is obtained, an average color is added to the template texture map, and the initial texture map and the template texture map added with the average color are subjected to mixed processing to obtain a mixed texture map; further, performing image optimization processing on the mixed texture map to obtain an optimized texture map of the training viewpoint head portrait image, and taking the optimized texture map as a reference texture map corresponding to the first training strategy.
It will be appreciated that the network training strategy may be a strategy for training an initial modeled network, such as may be a strategy for training based on self-supervised learning as described above, or may be a strategy for training based on supervised learning as described above. The first training strategy may then be a strategy that is trained based on supervised learning. It will be appreciated that when the network training strategy is the first training strategy, the training view head image needs to be associated with a paired texture map as the reference texture map herein.
It will be appreciated that this skin region may be used to indicate the region corresponding to the skin in the training viewpoint head portrait image. The skin region may be determined by a pre-trained head portrait segmentation network. It will be appreciated that a plurality of pixels may be included in a determined skin area, each pixel having a corresponding pixel value (also referred to as color information), and that the average color of the skin area may be derived based on an average of the pixel values of each pixel in the skin area.
It will be appreciated that the initial texture map may be a texture map obtained by performing image expansion processing on a training viewpoint head portrait image based on a training three-dimensional head portrait mesh. It will be appreciated that the method for determining the initial texture map may refer to the above description related to the acquisition of the first target texture map, which is not described herein.
It will be appreciated that the template texture map may be provided by a service provider (e.g., a game developer in a game scene) corresponding to the service scene, and the template texture map may have a number of preset template textures to represent the skin texture of the face. Adding the average color to the template texture map may result in a texture map having the average color and a preset template texture.
It will be appreciated that the hybrid texture map may be a texture map obtained by performing a hybrid process on an initial texture map and a template texture map to which an average color is added. It will be appreciated that the mixing process of the initial texture map and the template texture map with the average color added may be performed in a poisson mixing manner.
It will be appreciated that the optimized texture map may be a texture map obtained by performing an image optimization process on the basis of a hybrid texture map. It will be appreciated that image optimization of the hybrid texture map may result in better texture quality of the texture map, e.g., may result in a more complete hybrid texture map, higher definition, more appropriate contrast, etc., which is not shown here. For example, non-skin areas such as hair and glasses may be removed for the hybrid texture map and symmetry may be used as much as possible to repair occlusion areas. For another example, the blended texture map may be shaded and highlight repaired. It will be appreciated that, because the quality of the generated texture can be controlled manually, the workload of manually selecting the hybrid texture map for shading and highlight restoration is small, each texture map can be refined in only a few minutes, the shading and highlight restoration of the texture map can be performed manually, or the hybrid texture map can be restored based on some pre-trained texture restoration models, without limitation.
It will be appreciated that in the above manner, an RGB texture map set may be created, and the optimized texture map in the RGB texture map set may be paired with the created training viewpoint head image of the optimized texture map to facilitate supervised training of the initial modeling network based on the obtained paired data.
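For ease of understanding, the construction of the reference texture map under the first training strategy may be sketched as follows. The use of OpenCV's seamlessClone as the Poisson mixing step, the computation of the average color on the unwrapped texture instead of the original head image, and the omission of the manual shadow/highlight repair are all simplifying assumptions.

import cv2
import numpy as np

def build_reference_texture(initial_texture, template_texture, skin_mask):
    """Sketch: tint the template texture map with the average skin color, then
    Poisson-blend the unwrapped initial texture map into it."""
    # average color of the skin area (approximated here on the unwrapped texture)
    avg_color = initial_texture[skin_mask > 0].mean(axis=0)
    template = np.clip(template_texture.astype(np.float32) + avg_color, 0, 255).astype(np.uint8)

    # Poisson mixing of the initial texture map into the colored template
    mask = (skin_mask * 255).astype(np.uint8)
    center = (template.shape[1] // 2, template.shape[0] // 2)
    return cv2.seamlessClone(initial_texture, template, mask, center, cv2.NORMAL_CLONE)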
Optionally, obtaining the reference texture map corresponding to the second training texture map may include the following steps: and acquiring a network training strategy corresponding to the initial modeling network, and taking the first training texture map as a reference texture map corresponding to the second training strategy when the network training strategy is the second training strategy.
It will be appreciated that the second training strategy may then be a strategy that is trained based on self-supervised learning. It will be appreciated that when the network training strategy is the second training strategy, the training view head image need not be associated with a paired texture map, but may be directly based on the first training texture map of the training view head image as a reference texture map, such that the second training texture map may be approximated to the first training texture map during the training process.
Specifically, performing iterative training on the initial modeling network based on the reference texture map and the second training texture map, the training rendering image and the training viewpoint head image, and taking the initial modeling network after the iterative training as the target modeling network, may include the following steps: acquiring a skin area mask corresponding to the training viewpoint head portrait image, calculating an image distance between the training rendering image and the training viewpoint head portrait image based on the skin area mask, and determining first reference information based on the image distance; further, a texture map distance between the reference texture map and the second training texture map is calculated, and second reference information is determined based on the texture map distance; furthermore, iterative training is carried out on the initial modeling network based on the first reference information and the second reference information, and the initial modeling network after the iterative training is used as a target modeling network.
It will be appreciated that the skin region mask may be used to indicate the skin region of the training viewpoint head portrait image. It is to be appreciated that the skin region mask may be determined based on a pre-trained head portrait segmentation network. For example, a value corresponding to a skin area is determined as 1, and a value corresponding to a non-skin area is determined as 0, thereby obtaining a skin area mask.
It will be appreciated that the image distance may be used to indicate the distance between the training viewpoint avatar image and the training rendered image. It will be appreciated that the image distance is essentially the image distance of the skin region in the training viewpoint head image from the skin region in the training rendered image, due to the image distance determined based on the skin region mask. The first reference information may be a loss value determined based on the training viewpoint avatar image and the training rendering image.
For example, the first reference information may be defined as a pixel level L1 distance between the training viewpoint head image and the training rendering image, and the determination manner of the first reference information may be represented by the following formula (formula 7):
$$\mathcal{L}_{img} = \sum_{e=1}^{E} M_e\,\bigl|\,I_e - R_e\,\bigr|$$

Equation 7

Wherein, $I$ represents the input training viewpoint head portrait image, $R$ represents the rendered training rendering image, and $\mathcal{L}_{img}$ represents the loss between the training viewpoint head portrait image and the training rendering image, i.e. the first reference information. $e$ denotes the pixel index and $E$ denotes the number of pixels; $I_e$ represents the $e$-th pixel in the training viewpoint head portrait image, and $R_e$ represents the $e$-th pixel in the training rendering image. $M_e$ represents the skin area mask described above.
The texture map distance may be used to indicate a distance between the reference texture map and the second training texture map. The second reference information may be a loss value determined based on the reference texture map and the second training texture map.
For example, the second reference information may be defined as an L1 distance between the reference texture map and the second training texture map, and then the determination manner of the second reference information may be represented by the following formula (formula 8):
$$\mathcal{L}_{tex} = \sum_{q=0}^{Q} \bigl|\,F_q - G_q\,\bigr|$$

Equation 8

Wherein, $F$ represents the second training texture map, $G$ represents the reference texture map, and $\mathcal{L}_{tex}$ represents the second reference information. $q$ is the pixel index, taking values from $0$ to $Q$, and $Q$ is the number of pixels of the texture map.
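For ease of understanding, Equations 7 and 8 translate directly into a masked and a plain L1 loss; the averaging over pixels (rather than summation) is an assumption about the normalization.

import torch

def first_reference_information(training_image, rendered_image, skin_mask):
    # Equation 7: skin-masked pixel-level L1 distance between the training
    # viewpoint head portrait image I and the training rendering image R
    return (skin_mask * (training_image - rendered_image).abs()).mean()

def second_reference_information(second_texture_map, reference_texture_map):
    # Equation 8: L1 distance between the second training texture map F
    # and the reference texture map G
    return (second_texture_map - reference_texture_map).abs().mean()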
It will be appreciated that in iteratively training the initial modeling network based on the first reference information and the second reference information, the sum of the first reference information and the second reference information may be made progressively smaller, i.e. equivalent to progressively smaller the image distance between the training view header image and the training rendered image, and the texture map distance between the reference texture map and the second training texture map.
It can be appreciated that, further, the embodiment of the application may also use adversarial training in the training process, so as to improve the network performance of the target modeling network obtained by training. That is, two discriminators are introduced in the training process, one for discriminating whether a head portrait image is a real image, and one for discriminating whether a texture map is a real texture map. In other words, the discriminators are trained to determine whether the input is real data (e.g., a training viewpoint head image, or a first training texture map obtained by performing image expansion processing on the training viewpoint head image) or false data (e.g., a training rendering image or a second training texture map generated based on the initial modeling network), while the initial modeling network is trained to generate a training rendering image capable of spoofing the discriminators.
Specifically, performing iterative training on the initial modeling network based on the first reference information and the second reference information, and taking the initial modeling network after the iterative training as the target modeling network, the method may include the following steps: inputting the training viewpoint head portrait image into a first discriminator for discrimination processing to obtain a first probability for indicating that the training viewpoint head portrait image is a real image, and inputting the training rendering image into the first discriminator for discrimination processing to obtain a second probability for indicating that the training rendering image is a real image; further, determining third reference information based on the first probability and the second probability; further, inputting the reference texture map to a second discriminator for discrimination processing to obtain a third probability for indicating that the reference texture map is a true texture map, and inputting the second training texture map to the second discriminator for discrimination processing to obtain a fourth probability for indicating that the second training texture map is a true texture map; further, fourth reference information is determined based on the third probability and the fourth probability; furthermore, the first discriminator and the second discriminator are subjected to iterative training based on the third reference information and the fourth reference information, and the initial modeling network is subjected to iterative training based on the first reference information, the second probability and the fourth probability, so that the target modeling network is obtained.
The first discriminator may be configured to determine whether the input avatar image is a real image, i.e., determine a probability that the input avatar image is a real image. It will be appreciated that the real image is used to indicate a view head image that is not generated based on the modeling network, but is directly acquired.
The second arbiter may be configured to determine whether the input texture map is a real texture map, i.e. determine the probability that the texture map is a real texture map. It will be appreciated that the real texture map is used to indicate texture maps that are not generated based on a modeling network, but are directly obtained, such as texture maps obtained by expanding training viewpoint head images, optimizing texture maps, and the like.
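For ease of understanding, each discriminator may be sketched as a small convolutional classifier that outputs the probability that its input (a head portrait image for the first discriminator, a texture map for the second) is real; the architecture below is an assumption and not part of the claimed implementation.

import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Assumed CNN discriminator; one instance judges head portrait images,
    a second instance judges texture maps."""
    def __init__(self, in_channels=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(256, 1)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return torch.sigmoid(self.classifier(h))   # probability that the input is real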
The objective function of the adversarial training with the first discriminator and the second discriminator can be expressed by the following formula (formula 9):
$$\min_{G}\;\max_{D_I,\,D_T}\;\mathbb{E}\bigl[\log D(x)\bigr] + \mathbb{E}\bigl[\log\bigl(1 - D(\hat{x})\bigr)\bigr]$$

Equation 9

Wherein, the minimization over the generator $G$ (the modeling network) and the maximization over the discriminators together represent the objective function. $D(\hat{x})$ represents the probability that the false data is real data, such as the probability that the training rendering image is a real image (i.e. the second probability), or the probability that the second training texture map is a real texture map (i.e. the fourth probability); $1 - D(\hat{x})$ may then express the probability that the false data is false data, i.e. the probability that the false data is not real data. $D(x)$ represents the probability that the real data is real data, for example, the probability that the training viewpoint head portrait image is a real image (i.e. the first probability), or the probability that the reference texture map is a real texture map (i.e. the third probability). $D_I$ denotes the first discriminator for performing head portrait image discrimination, and $D_T$ denotes the second discriminator for performing texture map discrimination. $\mathbb{E}[\cdot]$ denotes the expectation over $[\cdot]$.
It will be appreciated that performing iterative training on the first and second discriminators based on the third and fourth reference information may be determining discrimination loss information based on a sum of the third and fourth reference information, where the discrimination loss information may be the loss information for training the first and second discriminators. It can be understood that, as the first discriminator and the second discriminator are trained based on the discrimination loss information, the first probability in the third reference information (i.e., the probability that the training viewpoint head portrait image is a real image) may gradually increase, that is, the probability that the training viewpoint head portrait image is false data gradually decreases, while the second probability in the third reference information (i.e., the probability that the training rendering image is a real image) gradually decreases, that is, the probability that the training rendering image is false data gradually increases. Likewise, the third probability in the fourth reference information (i.e., the probability that the reference texture map is a real texture map) may gradually increase, that is, the probability that the reference texture map is false data gradually decreases, while the fourth probability in the fourth reference information (i.e., the probability that the second training texture map is a real texture map) gradually decreases, that is, the probability that the second training texture map is false data gradually increases, so that the discriminators can more accurately distinguish real data from false data.
It may be appreciated that performing iterative training on the initial modeling network based on the first reference information, the second probability, and the fourth probability to obtain the target modeling network may determine modeling network loss information based on a sum of the first reference information, the second probability, and the fourth probability, where the modeling network loss information may be loss information for training on the initial modeling network. It will be appreciated that as the initial modeling network is trained based on modeling network loss information, the first reference information and the second reference information may be progressively reduced, i.e., the distance between the training rendered image and the training viewpoint head portrait image is reduced, and the distance between the second training texture map and the reference texture map is reduced. In addition, the second probability (i.e., the probability that the training rendering image is a real image) can be gradually increased, that is, the probability that the training rendering image obtained based on the arbiter is false data is gradually decreased, and the fourth probability (the probability that the second training texture map is a real texture map) obtained based on the arbiter can be gradually increased, that is, the probability that the second training texture map is false data is gradually decreased, so as to achieve the purpose that the generated head image and the texture map can confuse the arbiter.
It will be appreciated that with the loss function defined above, the final loss function for training the first and second discriminators can be represented by the following equation (equation 10):
$$\mathcal{L}_{D} = \lambda_{1}\,\mathcal{L}_{adv}(R, I) + \lambda_{2}\,\mathcal{L}_{adv}(F, G)$$

Equation 10

Wherein, $\mathcal{L}_{D}$ represents the loss function for training the discriminators, and $\lambda_{1}$ and $\lambda_{2}$ are the corresponding weights. $\mathcal{L}_{adv}(R, I)$ is the loss value, generated by the above Equation 9, reflecting the probabilities obtained by inputting the training rendering image $R$ and the training viewpoint head portrait image $I$ to the first discriminator, and $\mathcal{L}_{adv}(F, G)$ is the loss value, generated by the above Equation 9, reflecting the probabilities obtained by inputting the second training texture map $F$ and the reference texture map $G$ to the second discriminator.
It will be appreciated that with the loss function defined above, the final loss function for training the initial modeling network can be represented by the following equation (equation 11):
$$\mathcal{L}_{G} = \lambda_{1}\,\mathcal{L}_{img} + \lambda_{2}\,\mathcal{L}_{tex} + \lambda_{3}\,\mathcal{L}_{adv}(R) + \lambda_{4}\,\mathcal{L}_{adv}(F)$$

Equation 11

Wherein, $\mathcal{L}_{G}$ represents the loss function for training the initial modeling network, and $\lambda_{1}$, $\lambda_{2}$, $\lambda_{3}$, $\lambda_{4}$ are the corresponding weights that balance the different loss terms. $\mathcal{L}_{img}$ and $\mathcal{L}_{tex}$ are the first reference information and the second reference information described above. $\mathcal{L}_{adv}(R)$ is the loss value, generated by the above Equation 9, reflecting the probability obtained by inputting the training rendering image $R$ into the first discriminator, and $\mathcal{L}_{adv}(F)$ is the loss value, generated by the above Equation 9, reflecting the probability obtained by inputting the second training texture map $F$ into the second discriminator. It will be appreciated that, as training proceeds, the initial modeling network is expected to generate head portrait images and texture maps that confuse the discriminators in order to counter them; that is, the initial modeling network is trained so that the probability, output by the first discriminator, that the generated training rendering image is false data becomes lower and lower (i.e., the probability that the first discriminator predicts the generated training rendering image to be a real image becomes higher and higher), and the probability, output by the second discriminator, that the generated second training texture map is false data becomes lower and lower (i.e., the probability that the second discriminator predicts the second training texture map to be a real texture map becomes higher and higher).
It will be appreciated that, during the training process, the aim is to solve the min-max problem $\min_{G}\max_{D}\mathcal{L}$ of Equation 9, so that the training purpose is achieved: the modeling network learns to generate a training rendering image and a second training texture map that the discriminators predict to be real data, while the discriminators learn to discriminate real data as real data and false data as false data with ever higher probability, thereby forming the adversarial game. It will be appreciated that the network components that need to be optimized in the training process can be trained in an end-to-end fashion, i.e., the initial modeling network can be trained end to end. In the self-supervised learning process, the training viewpoint head portrait images are not paired with corresponding optimized texture maps, and the loss terms related to the optimized texture map are then simply ignored during training.
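For ease of understanding, the discriminator loss of Equation 10 and the modeling-network loss of Equation 11 may be assembled as sketched below from the pieces defined earlier. The log-probability form follows the standard objective of Equation 9; the weights, the epsilon for numerical stability, and the function signatures are assumptions.

import torch

def discriminator_loss(d_img, d_tex, real_image, fake_image, real_tex, fake_tex, eps=1e-8):
    # Equation 10: push real data towards probability 1 and generated data towards 0
    loss_img = -(torch.log(d_img(real_image) + eps) +
                 torch.log(1.0 - d_img(fake_image.detach()) + eps)).mean()
    loss_tex = -(torch.log(d_tex(real_tex) + eps) +
                 torch.log(1.0 - d_tex(fake_tex.detach()) + eps)).mean()
    return loss_img + loss_tex

def modeling_network_loss(l_img, l_tex, d_img, d_tex, fake_image, fake_tex,
                          weights=(1.0, 1.0, 0.1, 0.1), eps=1e-8):
    # Equation 11: reconstruction terms (first/second reference information) plus
    # adversarial terms rewarding outputs the discriminators judge to be real
    adv_img = -torch.log(d_img(fake_image) + eps).mean()
    adv_tex = -torch.log(d_tex(fake_tex) + eps).mean()
    w1, w2, w3, w4 = weights
    return w1 * l_img + w2 * l_tex + w3 * adv_img + w4 * adv_tex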
It can be understood that after the modeling network loss information is obtained, the gradient can be calculated based on the modeling network loss information, and the embodiment of the application adopts the differential rendering method to render to obtain the rendered head portrait image, so that the gradient can be calculated, and the gradient is reversely propagated to each module needing parameter updating in the initial modeling network, such as the initial image encoder, the initial texture decoder and the initial illumination regressor, thereby completing the training of the initial modeling network.
Referring to fig. 7, fig. 7 is a schematic flow chart of a training process of an initial modeling network according to an embodiment of the present application. First, a training viewpoint avatar image 701, such as the image shown as 701a, may be acquired, and it may be appreciated that the training viewpoint avatar image 701 may display an image of the head portion of the training virtual object that is qualified for some three-dimensional models. The training three-dimensional face shape 702 can be obtained based on the training viewpoint head portrait image reconstruction, for example, the identity characteristic and the expression characteristic can be determined by a pre-trained three-dimensional face coefficient regressor, and the training three-dimensional face shape can be determined based on the identity characteristic and the expression characteristic based on calling the three-dimensional deformable head portrait model. Further, the training three-dimensional face grid 703 may be determined based on the training three-dimensional face shape 702, and then the head portrait transfer is performed on the training three-dimensional face grid 703 to obtain a training three-dimensional head portrait grid 704 which can be loaded into a virtual environment (such as a game environment) under a service scene, for example, a three-dimensional head portrait grid shown by 704a, specifically, an offset function may be obtained by fitting the training three-dimensional face grid with a key point pair in a template grid, so as to calculate and obtain an offset of each grid vertex in the training three-dimensional face grid, so as to obtain new position information (i.e., training position information) of each grid vertex in the training three-dimensional face grid, and further obtain the training three-dimensional head portrait grid. It will be appreciated that in determining the identity feature, the expression feature based on the pre-trained three-dimensional face coefficient regressor, the training face pose coefficients 705 may also be determined based on the pre-trained three-dimensional face coefficient regressor. It can be appreciated that after the training three-dimensional avatar mesh 704 is obtained, the training viewpoint avatar image may be subjected to image expansion processing by the training three-dimensional avatar mesh 704 to obtain a first training texture map 706, for example, a texture map shown as 706a, and the specific expansion manner is referred to the above description, which is not repeated herein.
Further, the training viewpoint head portrait image 701 and the first training texture map 706 are input to the initial modeling network 707. The initial modeling network 707 may include an initial image encoder 7071, an initial texture encoder 7074, an initial illumination regressor 7073, and an initial texture decoder 7077. Specifically, the training viewpoint head portrait image 701 is subjected to image coding processing by the initial image encoder 7071 to obtain training image features 7072, and the training image features 7072 are then input to the initial illumination regressor 7073 for illumination prediction to obtain training predicted illumination coefficients 708. Meanwhile, the first training texture map 706 is subjected to texture coding processing by the initial texture encoder 7074 to obtain training texture features 7075; the training image features 7072 and the training texture features 7075 are then subjected to feature stitching processing to obtain training stitching features 7076, and the training stitching features are further subjected to texture decoding processing by the initial texture decoder 7077 to obtain a second training texture map 709, such as the texture map shown as 709a.
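The forward pass just described can be sketched as follows, purely for illustration (a PyTorch-style implementation with toy layer sizes is assumed; the class names, channel counts and coefficient dimensions are hypothetical and do not describe the actual architecture of the embodiment). Feature stitching is realized here as channel-wise concatenation, which is one common way to combine encoder outputs:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, out_ch=32):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, out_ch, 3, stride=2, padding=1), nn.ReLU())
    def forward(self, x):
        return self.net(x)

class TextureDecoder(nn.Module):
    def __init__(self, in_ch=64):
        super().__init__()
        self.net = nn.Sequential(nn.ConvTranspose2d(in_ch, 3, 4, stride=2, padding=1), nn.Sigmoid())
    def forward(self, x):
        return self.net(x)

class IlluminationRegressor(nn.Module):
    def __init__(self, in_ch=32, n_coeff=27):    # e.g. 9 spherical-harmonics coefficients per RGB channel
        super().__init__()
        self.fc = nn.Linear(in_ch, n_coeff)
    def forward(self, feat):
        return self.fc(feat.mean(dim=(2, 3)))     # global average pooling, then regression

image_encoder, texture_encoder = Encoder(), Encoder()
texture_decoder, illum_regressor = TextureDecoder(), IlluminationRegressor()

avatar_image = torch.rand(1, 3, 256, 256)         # training viewpoint head portrait image (dummy)
coarse_texture = torch.rand(1, 3, 256, 256)       # first training texture map (dummy)

img_feat = image_encoder(avatar_image)            # training image features
tex_feat = texture_encoder(coarse_texture)        # training texture features
stitched = torch.cat([img_feat, tex_feat], dim=1) # feature stitching (channel concatenation)
fine_texture = texture_decoder(stitched)          # second training texture map
illum_coeff = illum_regressor(img_feat)           # training predicted illumination coefficients
```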
Further, the training three-dimensional head portrait grid 704, the training face pose coefficients 705 and the training predicted illumination coefficients 708 generated based on the initial modeling network, as well as the second training texture map 709, may be input to the differentiable renderer 710 to obtain a training rendered image 711, such as the image shown as 711a.
Further, after the training rendered image is obtained, loss information for updating the parameters of the initial modeling network may be determined. The embodiment of the application also introduces discriminators for adversarial training. Referring to fig. 8, fig. 8 is a schematic diagram illustrating an effect of a discriminator according to an embodiment of the present application. It will be appreciated that, as shown in fig. 8, a training rendered image 801 and a training viewpoint head portrait image 802 may be input to a first discriminator 803 to determine whether the input image is a real image 804. The second training texture map 805 and the reference texture map 806 may be input to a second discriminator 807 to determine whether the input texture map is a real texture map 808. How to calculate the modeling network loss information and the discriminator loss information can refer to the above description and will not be repeated here. In this way, the network parameters of the initial modeling network can be updated so that the initial modeling network can generate a refined texture map, thereby improving the display effect of the reconstructed three-dimensional model.
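Purely as an illustrative sketch of the adversarial part of this training (assuming a PyTorch-style implementation, a toy patch discriminator and standard non-saturating GAN losses; the concrete discriminator architecture and loss form used by the embodiment may differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(16, 1, 4, stride=2, padding=1))   # patch-level real/fake logits
    def forward(self, x):
        return self.net(x)

d_image, d_texture = Discriminator(), Discriminator()   # first and second discriminators

def d_loss(disc, real, fake):
    # Discriminator objective: push real samples toward "real", detached fake samples toward "fake".
    real_logits, fake_logits = disc(real), disc(fake.detach())
    return (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) +
            F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))

def g_loss(disc, fake):
    # Generator (modeling network) objective: make the discriminator judge the fake sample as real.
    fake_logits = disc(fake)
    return F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))

viewpoint_image = torch.rand(1, 3, 128, 128)                       # training viewpoint head portrait image
rendered_image = torch.rand(1, 3, 128, 128, requires_grad=True)    # training rendered image
reference_tex = torch.rand(1, 3, 128, 128)                         # reference texture map
fine_tex = torch.rand(1, 3, 128, 128, requires_grad=True)          # second training texture map

loss_d = d_loss(d_image, viewpoint_image, rendered_image) + d_loss(d_texture, reference_tex, fine_tex)
loss_g_adv = g_loss(d_image, rendered_image) + g_loss(d_texture, fine_tex)   # added to the modeling network loss
```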
S606, acquiring a target viewpoint head image corresponding to a head part of a target virtual object in the three-dimensional scene, and determining a target three-dimensional head image grid of the head part of the target virtual object in the three-dimensional scene based on the initial three-dimensional face shape when reconstructing the initial three-dimensional face shape of the obtained target viewpoint head image.
S607, performing image unfolding processing on the target viewpoint head portrait image based on the target three-dimensional head portrait grid to obtain a first target texture map corresponding to the target viewpoint head portrait image.
S608, performing texture coding processing on the first target texture map to obtain texture coding features corresponding to the first target texture map, performing feature stitching processing on the texture coding features and the image coding features when the image coding features of the target viewpoint head portrait image are obtained to obtain texture image stitching features for performing texture decoding, and performing texture decoding processing on the texture image stitching features to obtain a second target texture map corresponding to the first target texture map; the texture quality of the second target texture map is better than the texture quality of the first target texture map.
S609, obtaining a target rendering head image based on the target three-dimensional head image grid and the second target texture map, and generating a three-dimensional head model of the head part of the target virtual object under the three-dimensional scene based on the obtained target rendering head image.
It will be appreciated that the steps S606-S609 may refer to the descriptions related to the steps S101-S104, which are not described herein.
In the embodiment of the application, the target three-dimensional head portrait grid of the head part of the target virtual object in the three-dimensional scene can be quickly reconstructed based on the input head image (such as the target viewpoint head portrait image), then the first target texture map obtained based on the expansion of the target three-dimensional head portrait grid can be subjected to encoding and decoding processing to obtain the texture map with better texture quality (namely the second target texture map), and further the target rendering head portrait image is obtained based on the target three-dimensional head portrait grid and the second target texture map, so that the three-dimensional head model of the target virtual object is generated. Thereby, the three-dimensional head model of the virtual object can be automatically reconstructed based on the head portrait image of the virtual object. In addition, the embodiment of the application can generate the texture map with better texture quality, so that the model rendering effect of the three-dimensional head model obtained by rendering can be improved.
Further, the image data processing method provided by the application is described herein in connection with a business scenario. Referring to fig. 9, fig. 9 is a flowchart illustrating a method for processing image data according to an embodiment of the present application. The method may be performed by the computer device described above. The method may comprise at least the following steps.
S901, acquiring a target viewpoint head image corresponding to a head part of a target virtual object in a three-dimensional scene, and determining a target three-dimensional head image grid of the head part of the target virtual object in the three-dimensional scene based on the initial three-dimensional face shape when reconstructing the initial three-dimensional face shape of the obtained target viewpoint head image.
It will be appreciated that when the business scenario is a game scenario, the three-dimensional scene may be used to perform three-dimensional modeling of the game characters indicated by the virtual objects, that is, a virtual object in the game scenario may specifically be a game character. It will be appreciated that the target virtual object may be a virtual object in the game application whose three-dimensional head model has an unsatisfactory modeling effect; a target viewpoint head image of the head part of the target virtual object may then be obtained, so that the three-dimensional head model of the target virtual object is reconstructed based on the target viewpoint head image.
It will be appreciated that when the business scenario is a game scenario, the target three-dimensional avatar mesh may be a three-dimensional avatar mesh that can be used directly in the game environment. Specifically, an initial three-dimensional face grid can be determined based on the initial three-dimensional face shape, then a game template grid is obtained, and further head portrait transfer processing is performed on the initial three-dimensional face grid based on the game template grid, so that the target three-dimensional head portrait grid is obtained.
It is understood that when the business scenario is an avatar generation scene, the three-dimensional scene may be used to perform three-dimensional modeling of the avatar. The avatar is a figure virtually created by computer graphics technology, such as a virtual human figure, an anthropomorphic figure, an animal figure, etc., which is not limited herein; examples include a virtual anchor, a virtual singer, etc.
It can be understood that the target virtual object may be an avatar whose three-dimensional head model has an unsatisfactory modeling effect; a target viewpoint head image of the head part of the target virtual object can then be obtained, so that the three-dimensional head model of the target virtual object is reconstructed based on the target viewpoint head image and the modeling and rendering effect of the virtual object corresponding to the avatar is optimized.
It is understood that when the service scene is an avatar generation scene, the target three-dimensional head portrait mesh may be a three-dimensional head portrait mesh that can be directly used in an avatar application environment. Specifically, an initial three-dimensional face mesh may be determined based on the initial three-dimensional face shape, then an avatar template mesh may be obtained, and head portrait transfer processing may be performed on the initial three-dimensional face mesh based on the avatar template mesh to obtain the target three-dimensional head portrait mesh.
And S902, performing image unfolding processing on the target viewpoint head portrait image based on the target three-dimensional head portrait grid to obtain a first target texture map corresponding to the target viewpoint head portrait image.
S903, performing texture coding processing on the first target texture map to obtain texture coding features corresponding to the first target texture map, performing feature stitching processing on the texture coding features and the image coding features when the image coding features of the target viewpoint head portrait image are obtained to obtain texture image stitching features for performing texture decoding, and performing texture decoding processing on the texture image stitching features to obtain a second target texture map corresponding to the first target texture map; the texture quality of the second target texture map is better than the texture quality of the first target texture map.
S904, obtaining a target rendering head image based on the target three-dimensional head image grid and the second target texture map, and generating a three-dimensional head model of the head part of the target virtual object in the three-dimensional scene based on the obtained target rendering head image.
It may be appreciated that when the initial modeling network is iteratively trained to obtain the target modeling network, the acquired training viewpoint head portrait images may be determined according to the service scenario. For example, when the service scene is a game scene, head portrait images, at a certain viewpoint, of virtual objects in the game application whose modeling and rendering effects are qualified can be obtained as training viewpoint head portrait images. For another example, when the service scene is an avatar generation scene, head portrait images, at a certain viewpoint, of virtual objects in the avatar generation scene whose modeling and rendering effects are qualified can be obtained as training viewpoint head portrait images. In this way, the initial modeling network can be iteratively trained with the training viewpoint head portrait images to obtain the target modeling network, so that the target modeling network can produce finer texture maps and the rendering effect of the reconstructed three-dimensional head model is improved.
Further, when the reconstructed three-dimensional head model reaches a target condition, a new virtual object is constructed using the reconstructed three-dimensional head model and the three-dimensional models of the parts of the target virtual object other than the head part, and the model rendering effect of the three-dimensional head model of the new virtual object is better than that of the three-dimensional model of the head part of the original target virtual object. It is understood that the target condition may be a condition indicating that such a new virtual object may be constructed from the reconstructed three-dimensional head model and the three-dimensional models of the other parts of the target virtual object. For example, the target condition may be that the sharpness of the target rendering head portrait image is greater than a certain threshold, which is equivalent to requiring that the texture quality of the three-dimensional head model meets the requirement. For another example, the target condition may be that the similarity between the target rendering head portrait image and the target viewpoint head portrait image is greater than a certain threshold, which is equivalent to requiring that the structure of the three-dimensional head model is similar to that of the initial three-dimensional head model, so as to avoid an excessive difference between the shape of the reconstructed three-dimensional head model and the initial three-dimensional head model. Other target conditions may also be used, which will not be described in detail herein.
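By way of a hedged example only, such a target-condition check could look like the sketch below (a PyTorch-style implementation is assumed, with Laplacian variance as a sharpness proxy and PSNR as a similarity proxy; the embodiment does not prescribe these particular metrics, thresholds or function names):

```python
import torch
import torch.nn.functional as F

def sharpness(img):
    # Variance of the Laplacian response: a common proxy where higher means sharper.
    gray = img.mean(dim=1, keepdim=True)
    k = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]]).view(1, 1, 3, 3)
    return F.conv2d(gray, k).var().item()

def psnr(a, b):
    # Peak signal-to-noise ratio for images in [0, 1]; higher means more similar.
    mse = F.mse_loss(a, b)
    return (10 * torch.log10(1.0 / mse)).item()

def meets_target_condition(rendered, viewpoint, sharp_thr=1e-3, psnr_thr=20.0):
    return sharpness(rendered) > sharp_thr and psnr(rendered, viewpoint) > psnr_thr

rendered = torch.rand(1, 3, 256, 256)    # target rendering head portrait image (dummy)
viewpoint = torch.rand(1, 3, 256, 256)   # target viewpoint head portrait image (dummy)
print(meets_target_condition(rendered, viewpoint))
```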
Further, the reconstructed new virtual object may be loaded into the virtual environment of the service scene, that is, the reconstructed target three-dimensional head portrait mesh and the second target texture map are loaded into the virtual environment. For example, when the service scene is a game scene and the target virtual object is a three-dimensional model of a target role, the reconstructed new virtual object can replace the target virtual object and be loaded into the game environment as the target role, so that a more vivid game model is provided for the game player and the game experience is improved.
For another example, when the service scene is a scene generated by the avatar, the target virtual object may be a three-dimensional model of the target avatar, and then the new virtual object obtained by reconstruction may be used as the avatar to be loaded into the avatar application environment instead of the target virtual object, thereby providing a more realistic avatar for the user.
Referring to fig. 10, fig. 10 is a schematic structural diagram of an image data processing apparatus according to an embodiment of the present application. As shown in fig. 10, the image data processing apparatus 1 may be a computer program (including program code) running on a computer device, for example, the image data processing apparatus 1 is an application software; it will be appreciated that the image data processing apparatus 1 may be adapted to perform the respective steps of the image data processing method provided in the embodiments of the present application. As shown in fig. 10, the image data processing apparatus 1 may include: a mesh generation module 11, a first texture generation module 12, a second texture generation module 13, a rendering module 14;
The grid generating module 11 is configured to acquire a target viewpoint head image corresponding to a head part of a target virtual object located in a three-dimensional scene, and determine a target three-dimensional head image grid of the head part of the target virtual object in the three-dimensional scene based on the initial three-dimensional face shape when reconstructing the initial three-dimensional face shape of the acquired target viewpoint head image;
the first texture generation module 12 is configured to perform image expansion processing on the target viewpoint head image based on the target three-dimensional head image grid, so as to obtain a first target texture map corresponding to the target viewpoint head image;
the second texture generating module 13 is configured to perform texture encoding processing on the first target texture map to obtain texture encoding features corresponding to the first target texture map, perform feature stitching processing on the texture encoding features and the image encoding features when obtaining image encoding features of the target viewpoint head portrait image, obtain texture image stitching features for performing texture decoding, and perform texture decoding processing on the texture image stitching features to obtain a second target texture map corresponding to the first target texture map; the texture quality of the second target texture map is better than the texture quality of the first target texture map;
The rendering module 14 is configured to obtain a target rendering head image based on the target three-dimensional head mesh and the second target texture map, and generate a three-dimensional head model of the head part of the target virtual object in the three-dimensional scene based on the obtained target rendering head image.
Wherein the image encoding features are determined based on a target image encoder in a target modeling network, the texture encoding features are determined based on a target texture encoder in the target modeling network, the second target texture map is determined based on a target texture decoder in the target modeling network, and the target predicted illumination coefficient is determined based on a target illumination regressor in the target modeling network;
the image data processing apparatus 1 further includes a modeling network training module 15, and the modeling network training module 15 includes: a training mesh generation unit 151, a first training texture unit 152, a second training texture unit 153, a training rendering unit 154, a training unit 155;
a training grid generating unit 151, configured to obtain a training viewpoint head image corresponding to a head portion of a training virtual object located in a three-dimensional scene, and determine a training three-dimensional head image grid of the head portion of the training virtual object in the three-dimensional scene based on the training three-dimensional face shape when reconstructing the training three-dimensional face shape of the obtained training viewpoint head image;
The first training texture unit 152 is configured to perform image expansion processing on the training viewpoint head image based on the training three-dimensional head image grid, so as to obtain a first training texture map corresponding to the training viewpoint head image;
the second training texture unit 153 is configured to perform texture encoding processing on the first training texture map based on an initial texture encoder in the initial modeling network to obtain training texture features corresponding to the first training texture map, and perform feature stitching processing on the training texture features and the training image features when the training image features of the training viewpoint head image are obtained based on the initial image encoder in the initial modeling network to obtain training stitching features for performing texture decoding, and perform texture decoding processing on the training stitching features based on an initial texture decoder in the initial modeling network to obtain a second training texture map corresponding to the first training texture map;
a training rendering unit 154 for obtaining a training rendering image based on the training three-dimensional head portrait mesh and the second training texture map;
the training unit 155 is configured to obtain a reference texture map corresponding to the second training texture map, iteratively train the initial modeling network based on the reference texture map and the second training texture map, the training rendering image, and the training viewpoint head portrait image, and take the initial modeling network after the iterative training as the target modeling network.
Wherein, training unit 155 further includes: a reference texture fetch subunit 1551;
the reference texture obtaining subunit 1551 is configured to obtain a network training policy corresponding to the initial modeling network, extract a skin area of the training viewpoint head portrait image from the training viewpoint head portrait image when the network training policy is the first training policy, and calculate an average color of the skin area of the training viewpoint head portrait image;
the reference texture obtaining subunit 1551 is further configured to determine a training three-dimensional head portrait mesh of a head portion of the training virtual object in the three-dimensional scene based on the training viewpoint head portrait image, and perform image expansion processing on the training viewpoint head portrait image based on the training three-dimensional head portrait mesh, so as to obtain an initial texture map corresponding to the training viewpoint head portrait image;
the reference texture obtaining subunit 1551 is further configured to obtain a template texture map, add an average color to the template texture map, and perform a mixing process on the initial texture map and the template texture map added with the average color to obtain a mixed texture map;
the reference texture obtaining subunit 1551 is further configured to perform image optimization processing on the hybrid texture map to obtain an optimized texture map of the training viewpoint head portrait image, and use the optimized texture map as a reference texture map corresponding to the first training strategy.
Wherein, the reference texture obtaining subunit 1551 is further configured to: and acquiring a network training strategy corresponding to the initial modeling network, and taking the first training texture map as a reference texture map corresponding to the second training strategy when the network training strategy is the second training strategy.
Wherein, training unit 155 further includes: a reference information acquisition subunit 1552;
a reference information acquisition subunit 1552, configured to:
acquiring a skin area mask corresponding to the training viewpoint head portrait image, calculating an image distance between the training rendering image and the training viewpoint head portrait image based on the skin area mask, and determining first reference information based on the image distance;
calculating a texture map distance between the reference texture map and the second training texture map, and determining second reference information based on the texture map distance;
and performing iterative training on the initial modeling network based on the first reference information and the second reference information, and taking the initial modeling network after the iterative training as a target modeling network.
The reference information obtaining subunit 1552 is further configured to:
inputting the training viewpoint head portrait image into a first discriminator for discrimination processing to obtain a first probability for indicating that the training viewpoint head portrait image is a real image, and inputting the training rendering image into the first discriminator for discrimination processing to obtain a second probability for indicating that the training rendering image is a real image;
Determining third reference information based on the first probability and the second probability;
inputting the reference texture map to a second discriminator for discrimination processing to obtain a third probability for indicating that the reference texture map is a real texture map, and inputting the second training texture map to the second discriminator for discrimination processing to obtain a fourth probability for indicating that the second training texture map is a real texture map;
determining fourth reference information based on the third probability and the fourth probability;
and performing iterative training on the first discriminator and the second discriminator based on the third reference information and the fourth reference information, and performing iterative training on the initial modeling network based on the first reference information, the second probability and the fourth probability to obtain the target modeling network.
Wherein the image encoding feature is determined based on a target image encoder in the target modeling network, the texture encoding feature is determined based on a target texture encoder in the target modeling network, and the second target texture map is determined based on a target texture decoder in the target modeling network;
rendering module 14, further for:
performing illumination prediction processing on the image coding features based on a target illumination regressor in the target modeling network to obtain target prediction illumination coefficients;
And when the target face pose coefficient associated with the initial three-dimensional face shape is acquired, performing differential rendering processing on the target three-dimensional head portrait grid, the target face pose coefficient, the target predicted illumination coefficient and the second target texture map to obtain a target rendering head portrait image.
The second texture generating module 13 may include: a texture encoding unit 131;
the texture coding unit 131 is configured to convert the first target texture map into a frequency domain space based on a pixel value of each pixel in the first target texture map, obtain frequency domain information corresponding to the first target texture map, and determine the frequency domain information corresponding to the first target texture map as a texture coding feature corresponding to the first target texture map.
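As a short illustrative sketch of this conversion to the frequency domain (a PyTorch-style implementation with a two-dimensional FFT is assumed; the specific transform used by the embodiment is not limited to this choice):

```python
import torch

first_texture = torch.rand(1, 3, 256, 256)                             # first target texture map (pixel values)
freq = torch.fft.fft2(first_texture)                                    # frequency-domain representation
texture_coding_feature = torch.stack((freq.real, freq.imag), dim=-1)    # real/imaginary parts as the coding feature
print(texture_coding_feature.shape)                                     # (1, 3, 256, 256, 2)
```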
The second texture generating module 13 may include: a texture decoding unit 132;
the texture decoding unit 132 is configured to obtain a decoding bandwidth required for performing texture decoding, obtain color space information corresponding to texture image stitching features when converting the texture image stitching features from a frequency domain space to a color space based on a frequency range indicated by the decoding bandwidth, and determine a second target texture map based on the color space information.
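The bandwidth-limited conversion back to the colour space can be sketched as follows, for illustration only (a PyTorch-style implementation is assumed in which the decoding bandwidth is modelled as a low-pass mask over frequency bins; the mask construction and the bandwidth value are hypothetical):

```python
import torch

stitched_freq = torch.fft.fft2(torch.rand(1, 3, 256, 256))   # stand-in for the texture-image stitching features
H, W = stitched_freq.shape[-2:]
bandwidth = 64                                                 # assumed decoding bandwidth (in frequency bins)
fy = torch.fft.fftfreq(H).abs().view(-1, 1) * H
fx = torch.fft.fftfreq(W).abs().view(1, -1) * W
mask = ((fy <= bandwidth) & (fx <= bandwidth)).float()         # keep only frequencies inside the bandwidth
colour_space_info = torch.fft.ifft2(stitched_freq * mask).real # colour space information
second_texture = colour_space_info.clamp(0, 1)                 # second target texture map determined from it
```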
Wherein the grid generating module 11 comprises: an initial grid acquisition unit 111, a template grid acquisition unit 112, an offset calculation unit 113, a transfer unit 114;
an initial mesh acquisition unit 111 for determining an initial three-dimensional face mesh based on the initial three-dimensional face shape, acquiring N first key points for characterizing a head portion of the target virtual object from M first mesh vertices included in the initial three-dimensional face mesh; n is a positive integer less than M;
a template grid acquiring unit 112, configured to acquire a template grid, and search N second key points having the same key attribute as the N first key points in the template grid; a first key point corresponds to a second key point;
an offset calculating unit 113, configured to construct N key point pairs of the head part of the target virtual object from the N first key points and the N second key points, and determine, when an offset function associated with M first mesh vertices is obtained based on fitting the N key point pairs, offsets corresponding to the M first mesh vertices included in the initial three-dimensional face mesh based on the offset function;
and a transferring unit 114, configured to determine, based on the offset corresponding to each first grid vertex, target position information corresponding to each first grid vertex, determine, as second grid vertices, a spatial point indicated by the target position information corresponding to each first grid vertex, and determine, based on the M second grid vertices, a target three-dimensional head portrait grid.
The offset function is used for indicating weighted summation of K radial basis functions, and each radial basis function is associated with a corresponding function weight;
an offset calculation unit 113 for:
acquiring an ith first grid vertex from M first grid vertices, and determining initial position information of the ith first grid vertex; i is a positive integer less than or equal to M;
determining a function value of the ith first grid vertex for each radial basis function based on the initial position information of the ith first grid vertex and each radial basis function;
and carrying out weighted summation processing on the function value of each radial basis function and the function weight corresponding to each radial basis function based on the ith first grid vertex, and taking the sum value obtained after the weighted summation processing as the offset corresponding to the ith first grid vertex.
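For illustration, the weighted summation over radial basis functions can be sketched as follows (a PyTorch-style implementation with Gaussian radial basis functions centred at the key points is assumed; the kernel type, width and dimensions are hypothetical). The last line also illustrates the transfer unit's behaviour, since the target position of each vertex is its initial position plus the computed offset:

```python
import torch

def rbf_offsets(verts, centres, weights, sigma=0.1):
    """verts: (M, 3) initial vertex positions; centres: (K, 3) RBF centres; weights: (K, 3) function weights.
    Returns (M, 3) per-vertex offsets = sum_k w_k * phi(||v - c_k||)."""
    dist = torch.cdist(verts, centres)                 # (M, K) distances to each RBF centre
    phi = torch.exp(-(dist ** 2) / (2 * sigma ** 2))   # (M, K) radial basis function values
    return phi @ weights                               # weighted summation over the K basis functions

verts = torch.rand(500, 3)        # M first mesh vertices (dummy initial position information)
centres = torch.rand(20, 3)       # K RBF centres
weights = torch.randn(20, 3)      # function weights (fitted beforehand from the key point pairs)
offsets = rbf_offsets(verts, centres, weights)
new_positions = verts + offsets   # target position information = initial position + offset
```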
Wherein, the transfer unit 114 is used for:
acquiring an ith first grid vertex from M first grid vertices, and determining initial position information of the ith first grid vertex; i is a positive integer less than or equal to M;
and determining target position information corresponding to each first grid vertex based on the sum of the initial position information of the ith first grid vertex and the corresponding offset of the ith first grid vertex.
Wherein, the offset calculating unit 113 is used for:
performing alignment processing on the initial three-dimensional face grid and the template grid based on the position distances between the first key point and the second key point in each key point pair, and determining the position distances between the first key point and the second key point in each key point pair as target position distances when the sum of the position distances of N key point pairs is minimized;
and carrying out interpolation calculation based on the target position distance of each key point pair, determining K radial basis functions and function weights corresponding to each radial basis function, and determining offset functions associated with M first grid vertexes based on the K radial basis functions and the function weights corresponding to each radial basis function.
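As a companion sketch, fitting the function weights from the key point pairs can be illustrated as below (Gaussian radial basis functions centred at the key points and a regularised linear solve are assumed; the interpolation scheme used by the embodiment may differ). The resulting weights feed the offset evaluation shown above:

```python
import torch

def fit_rbf_weights(src_kp, dst_kp, sigma=0.1):
    """src_kp: (N, 3) key points on the initial face mesh; dst_kp: (N, 3) matching key points on the
    template mesh. Solves Phi @ W = (dst - src) so the fitted offsets interpolate the key-point pairs."""
    disp = dst_kp - src_kp                                   # target offsets at the key points
    dist = torch.cdist(src_kp, src_kp)                       # (N, N) pairwise distances
    phi = torch.exp(-(dist ** 2) / (2 * sigma ** 2))
    weights = torch.linalg.solve(phi + 1e-6 * torch.eye(len(src_kp)), disp)
    return weights                                           # (N, 3): one weight vector per basis function

src = torch.rand(20, 3)
dst = src + 0.05 * torch.randn(20, 3)
w = fit_rbf_weights(src, dst)
```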
Wherein M first mesh vertices in the initial three-dimensional face mesh are generated based on the identity features and first principal component analysis basis of the target viewpoint head image, the expression features and second principal component analysis basis of the target viewpoint head image, and the average face shape; the identity feature is used for indicating the role type of the target virtual object, and the expression feature is used for indicating the facial expression corresponding to the target viewpoint head image.
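The linear face model implied by this description can be sketched as follows, for illustration only (a PyTorch-style implementation with toy basis dimensions is assumed; the actual bases and coefficient sizes are not specified here):

```python
import torch

M = 5000                                  # number of first mesh vertices
mean_shape = torch.rand(M * 3)            # average face shape
id_basis = torch.randn(M * 3, 80)         # first principal component analysis basis
exp_basis = torch.randn(M * 3, 64)        # second principal component analysis basis
id_coeff = torch.randn(80)                # identity features (indicating the character type)
exp_coeff = torch.randn(64)               # expression features (indicating the facial expression)

# Vertices = mean shape + identity contribution + expression contribution.
vertices = (mean_shape + id_basis @ id_coeff + exp_basis @ exp_coeff).view(M, 3)
```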
Referring to fig. 11, fig. 11 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 11, the computer device 1000 may include: processor 1001, network interface 1004, and memory 1005, and in addition, the above-described computer device 1000 may further include: a user interface 1003, and at least one communication bus 1002. Wherein the communication bus 1002 is used to enable connected communication between these components. The user interface 1003 may include a Display (Display), a Keyboard (Keyboard), and the optional user interface 1003 may further include a standard wired interface, a wireless interface, among others. The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (non-volatile memory), such as at least one disk memory. The memory 1005 may also optionally be at least one storage device located remotely from the processor 1001. As shown in fig. 11, an operating system, a network communication module, a user interface module, and a device control application may be included in the memory 1005, which is one type of computer-readable storage medium.
In the computer device 1000 shown in FIG. 11, the network interface 1004 may provide network communication functions; while user interface 1003 is primarily used as an interface for providing input to a user; the processor 1001 may be configured to invoke the device control application stored in the memory 1005 to execute the description of the data processing method in any of the foregoing embodiments, which is not described herein. In addition, the description of the beneficial effects of the same method is omitted.
Furthermore, it should be noted here that: the embodiments of the present application further provide a computer readable storage medium, in which the aforementioned computer program executed by the data processing apparatus 1 is stored, and the computer program includes program instructions, when executed by a processor, can execute the description of the data processing method in the foregoing embodiments, and therefore, a detailed description will not be given here. In addition, the description of the beneficial effects of the same method is omitted. For technical details not disclosed in the embodiments of the computer-readable storage medium according to the present application, please refer to the description of the method embodiments of the present application.
The computer readable storage medium may be the data processing apparatus provided in any one of the foregoing embodiments or an internal storage unit of the computer device, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card (flash card) or the like, which are provided on the computer device. Further, the computer-readable storage medium may also include both internal storage units and external storage devices of the computer device. The computer-readable storage medium is used to store the computer program and other programs and data required by the computer device. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.
Furthermore, it should be noted here that: embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions to cause the computer device to perform the method provided by any of the corresponding embodiments described above. In addition, the description of the beneficial effects of the same method is omitted. For technical details not disclosed in the computer program product or the computer program embodiments related to the present application, please refer to the description of the method embodiments of the present application.
The terms first, second and the like in the description and in the claims and drawings of the embodiments of the present application are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the term "include" and any variations thereof is intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, article, or device that comprises a list of steps or elements is not limited to the list of steps or modules but may, in the alternative, include other steps or modules not listed or inherent to such process, method, apparatus, article, or device.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The foregoing disclosure is only illustrative of the preferred embodiments of the present application and is not intended to limit the scope of the claims herein, as the equivalent of the claims herein shall be construed to fall within the scope of the claims herein.

Claims (17)

1. A method of processing image data, the method comprising:
acquiring a target viewpoint head image corresponding to a head part of a target virtual object in a three-dimensional scene, and determining a target three-dimensional head image grid of the head part of the target virtual object in the three-dimensional scene based on an initial three-dimensional face shape of the reconstructed target viewpoint head image when the initial three-dimensional face shape of the target viewpoint head image is reconstructed;
Performing image unfolding processing on the target viewpoint head portrait image based on the target three-dimensional head portrait grid to obtain a first target texture map corresponding to the target viewpoint head portrait image;
performing texture coding processing on the first target texture map to obtain texture coding features corresponding to the first target texture map, performing feature stitching processing on the texture coding features and the image coding features when the image coding features of the target viewpoint head portrait image are obtained to obtain texture image stitching features for performing texture decoding, and performing texture decoding processing on the texture image stitching features to obtain a second target texture map corresponding to the first target texture map; the texture quality of the second target texture map is better than the texture quality of the first target texture map;
and obtaining a target rendering head image based on the target three-dimensional head image grid and the second target texture map, and generating a three-dimensional head model of the head part of the target virtual object under the three-dimensional scene based on the obtained target rendering head image.
2. The method of claim 1, wherein the image encoding features are determined based on a target image encoder in a target modeling network, the texture encoding features are determined based on a target texture encoder in the target modeling network, and the second target texture map is determined based on a target texture decoder in the target modeling network;
The method further comprises the steps of:
acquiring a training viewpoint head image corresponding to a head part of a training virtual object in a three-dimensional scene, and determining a training three-dimensional head image grid of the head part of the training virtual object in the three-dimensional scene based on the training three-dimensional face shape when reconstructing the training three-dimensional face shape of the obtained training viewpoint head image;
performing image unfolding processing on the training viewpoint head portrait image based on the training three-dimensional head portrait grid to obtain a first training texture map corresponding to the training viewpoint head portrait image;
performing texture coding processing on the first training texture map based on an initial texture encoder in an initial modeling network to obtain training texture features corresponding to the first training texture map, performing feature stitching processing on the training texture features and the training image features when training image features of the training viewpoint head image are obtained based on the initial image encoder in the initial modeling network to obtain training stitching features for performing texture decoding, and performing texture decoding processing on the training stitching features based on an initial texture decoder in the initial modeling network to obtain second training texture map corresponding to the first training texture map;
Obtaining a training rendering image based on the training three-dimensional head portrait grid and the second training texture map;
and acquiring a reference texture map corresponding to the second training texture map, performing iterative training on the initial modeling network based on the reference texture map, the second training texture map, the training rendering image and the training viewpoint head image, and taking the initial modeling network after iterative training as the target modeling network.
3. The method of claim 2, wherein the obtaining the reference texture map corresponding to the second training texture map comprises:
acquiring a network training strategy corresponding to the initial modeling network, extracting a skin area of the training viewpoint head portrait image from the training viewpoint head portrait image when the network training strategy is a first training strategy, and calculating the average color of the skin area of the training viewpoint head portrait image;
determining a training three-dimensional head portrait grid of the head part of the training virtual object in the three-dimensional scene based on the training viewpoint head portrait image, and performing image unfolding processing on the training viewpoint head portrait image based on the training three-dimensional head portrait grid to obtain an initial texture map corresponding to the training viewpoint head portrait image;
Obtaining a template texture map, adding the average color to the template texture map, and carrying out mixing treatment on the initial texture map and the template texture map added with the average color to obtain a mixed texture map;
and performing image optimization processing on the mixed texture map to obtain an optimized texture map of the training viewpoint head portrait image, and taking the optimized texture map as a reference texture map corresponding to the first training strategy.
4. The method of claim 2, wherein the obtaining the reference texture map corresponding to the second training texture map comprises:
and acquiring a network training strategy corresponding to the initial modeling network, and taking the first training texture map as a reference texture map corresponding to a second training strategy when the network training strategy is the second training strategy.
5. The method of claim 2, wherein iteratively training the initial modeling network based on the reference texture map and the second training texture map, the training rendering image, and the training viewpoint head image, and taking the iteratively trained initial modeling network as the target modeling network, comprises:
Acquiring a skin area mask corresponding to the training viewpoint head portrait image, calculating an image distance between the training rendering image and the training viewpoint head portrait image based on the skin area mask, and determining first reference information based on the image distance;
calculating a texture map distance between the reference texture map and the second training texture map, and determining second reference information based on the texture map distance;
and carrying out iterative training on the initial modeling network based on the first reference information and the second reference information, and taking the initial modeling network after iterative training as the target modeling network.
6. The method of claim 5, wherein iteratively training the initial modeling network based on the first reference information and the second reference information and using the iteratively trained initial modeling network as the target modeling network comprises:
inputting the training viewpoint head portrait image to a first discriminator for discrimination processing to obtain a first probability for indicating that the training viewpoint head portrait image is a real image, and inputting the training rendering image to the first discriminator for discrimination processing to obtain a second probability for indicating that the training rendering image is a real image;
Determining third reference information based on the first probability and the second probability;
inputting the reference texture map to a second discriminator for discrimination processing to obtain a third probability for indicating that the reference texture map is a real texture map, and inputting the second training texture map to the second discriminator for discrimination processing to obtain a fourth probability for indicating that the second training texture map is a real texture map;
determining fourth reference information based on the third probability and the fourth probability;
and performing iterative training on the first discriminator and the second discriminator based on the third reference information and the fourth reference information, and performing iterative training on the initial modeling network based on the first reference information, the second probability and the fourth probability to obtain the target modeling network.
7. The method of claim 1, wherein the image encoding features are determined based on a target image encoder in a target modeling network, the texture encoding features are determined based on a target texture encoder in the target modeling network, and the second target texture map is determined based on a target texture decoder in the target modeling network;
The obtaining a target rendering head portrait image based on the target three-dimensional head portrait grid and the second target texture map includes:
performing illumination prediction processing on the image coding features based on a target illumination regressor in the target modeling network to obtain target prediction illumination coefficients;
and when the target face pose coefficient associated with the initial three-dimensional face shape is acquired, performing differential rendering processing on the target three-dimensional head portrait grid, the target face pose coefficient, the target predicted illumination coefficient and the second target texture map to obtain a target rendering head portrait image.
8. The method according to claim 1, wherein the performing texture coding processing on the first target texture map to obtain texture coding features corresponding to the first target texture map includes:
and converting the first target texture map into a frequency domain space based on the pixel value of each pixel in the first target texture map to obtain frequency domain information corresponding to the first target texture map, and determining the frequency domain information corresponding to the first target texture map as texture coding features corresponding to the first target texture map.
9. The method of claim 1, wherein the texture image stitching feature is a feature in frequency domain space; and performing texture decoding processing on the texture image stitching features to obtain a second target texture map corresponding to the first target texture map, including:
and acquiring decoding bandwidth required for texture decoding, acquiring color space information corresponding to the texture image splicing features when the texture image splicing features are converted from a frequency domain space to a color space based on a frequency range indicated by the decoding bandwidth, and determining the second target texture map based on the color space information.
10. The method of claim 1, wherein the determining a target three-dimensional avatar mesh for the head portion of the target virtual object in the three-dimensional scene based on the initial three-dimensional face shape comprises:
determining an initial three-dimensional face mesh based on the initial three-dimensional face shape, and acquiring N first key points used for representing a head part of the target virtual object from M first mesh vertexes contained in the initial three-dimensional face mesh; n is a positive integer less than M;
Acquiring a template grid, and searching N second key points with the same key attribute as the N first key points in the template grid; a first key point corresponds to a second key point;
n key point pairs of the head part of the target virtual object are formed by the N first key points and the N second key points, and when an offset function associated with the M first grid vertices is obtained based on fitting of the N key point pairs, offset corresponding to the M first grid vertices contained in the initial three-dimensional face grid is determined based on the offset function;
and respectively determining target position information corresponding to each first grid vertex based on the offset corresponding to each first grid vertex, determining a space point indicated by the target position information corresponding to each first grid vertex as a second grid vertex, and determining the target three-dimensional head portrait grid based on M second grid vertices.
11. The method of claim 10, wherein the offset function is used to indicate a weighted summation of K radial basis functions, each radial basis function being associated with a corresponding function weight;
When obtaining an offset function associated with the M first grid vertices based on the N keypoint pairs, determining offsets corresponding to the M first grid vertices included in the initial three-dimensional face grid based on the offset function, including:
acquiring an ith first grid vertex from the M first grid vertices, and determining initial position information of the ith first grid vertex; i is a positive integer less than or equal to M;
determining a function value of the ith first grid vertex for each radial basis function based on the initial position information of the ith first grid vertex and each radial basis function;
and carrying out weighted summation processing on the function value of each radial basis function and the function weight corresponding to each radial basis function based on the ith first grid vertex, and taking the sum obtained after the weighted summation processing as the offset corresponding to the ith first grid vertex.
12. The method of claim 10, wherein the determining the target location information corresponding to each first mesh vertex based on the offset corresponding to each first mesh vertex, respectively, comprises:
Acquiring an ith first grid vertex from the M first grid vertices, and determining initial position information of the ith first grid vertex; i is a positive integer less than or equal to M;
and determining target position information corresponding to each first grid vertex based on the sum of the initial position information of the ith first grid vertex and the corresponding offset of the ith first grid vertex.
13. The method of claim 10, wherein the fitting the offset functions associated with the M first mesh vertices based on the N keypoint pairs comprises:
performing alignment processing on the initial three-dimensional face grid and the template grid based on the position distances between the first key point and the second key point in each key point pair, and determining the position distances between the first key point and the second key point in each key point pair as target position distances when the sum of the position distances of the N key point pairs is minimized;
and carrying out interpolation calculation based on the target position distance of each key point pair, determining K radial basis functions and function weights corresponding to each radial basis function, and determining offset functions associated with the M first grid vertexes based on the K radial basis functions and the function weights corresponding to each radial basis function.
14. The method of claim 10, wherein the M first mesh vertices in the initial three-dimensional face mesh are generated based on an identity feature and a first principal component analysis basis of the target viewpoint avatar image, an expression feature and a second principal component analysis basis of the target viewpoint avatar image, and an average face shape; the identity feature is used for indicating the role type of the target virtual object, and the expression feature is used for indicating the facial expression corresponding to the target viewpoint head portrait image.
15. An image data processing apparatus, characterized in that the apparatus comprises:
the grid generation module is used for acquiring a target viewpoint head image corresponding to a head part of a target virtual object in a three-dimensional scene, and determining a target three-dimensional head image grid of the head part of the target virtual object in the three-dimensional scene based on an initial three-dimensional face shape when the initial three-dimensional face shape of the target viewpoint head image is obtained through reconstruction;
the first texture generation module is used for performing image unfolding processing on the target viewpoint head portrait image based on the target three-dimensional head portrait grid to obtain a first target texture map corresponding to the target viewpoint head portrait image;
The second texture generation module is used for carrying out texture coding processing on the first target texture map to obtain texture coding features corresponding to the first target texture map, carrying out feature stitching processing on the texture coding features and the image coding features when the image coding features of the target viewpoint head portrait image are obtained to obtain texture image stitching features for texture decoding, and carrying out texture decoding processing on the texture image stitching features to obtain a second target texture map corresponding to the first target texture map; the texture quality of the second target texture map is better than the texture quality of the first target texture map;
and the rendering module is used for obtaining a target rendering head image based on the target three-dimensional head image grid and the second target texture mapping, and generating a three-dimensional head model of the head part of the target virtual object under the three-dimensional scene based on the obtained target rendering head image.
16. A computer device, comprising: a processor and a memory;
the processor is connected to the memory, wherein the memory is configured to store a computer program, and the processor is configured to invoke the computer program to cause the computer device to perform the method of any of claims 1-14.
17. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program adapted to be loaded and executed by a processor to cause a computer device having the processor to perform the method of any of claims 1-14.
CN202310350889.5A 2023-04-04 2023-04-04 Image data processing method, device, equipment and medium Active CN116109798B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310350889.5A CN116109798B (en) 2023-04-04 2023-04-04 Image data processing method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310350889.5A CN116109798B (en) 2023-04-04 2023-04-04 Image data processing method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN116109798A CN116109798A (en) 2023-05-12
CN116109798B true CN116109798B (en) 2023-06-09

Family

ID=86260042

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310350889.5A Active CN116109798B (en) 2023-04-04 2023-04-04 Image data processing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116109798B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116310046B (en) * 2023-05-16 2023-08-22 腾讯科技(深圳)有限公司 Image processing method, device, computer and storage medium
CN116310659B (en) * 2023-05-17 2023-08-08 中数元宇数字科技(上海)有限公司 Training data set generation method and device
CN116385619B (en) * 2023-05-26 2024-04-30 腾讯科技(深圳)有限公司 Object model rendering method, device, computer equipment and storage medium
CN116433852B (en) * 2023-06-15 2023-09-12 腾讯科技(深圳)有限公司 Data processing method, device, equipment and storage medium
CN116452703B (en) * 2023-06-15 2023-10-27 深圳兔展智能科技有限公司 User head portrait generation method, device, computer equipment and storage medium
CN116542846B (en) * 2023-07-05 2024-04-26 深圳兔展智能科技有限公司 User account icon generation method and device, computer equipment and storage medium
CN116778065B (en) * 2023-08-21 2024-01-02 腾讯科技(深圳)有限公司 Image processing method, device, computer and storage medium

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101131735A (en) * 2006-08-21 2008-02-27 原相科技股份有限公司 Lattice type pattern designing and decoding method and device thereof
KR20100090457A (en) * 2009-02-06 2010-08-16 연세대학교 산학협력단 Non-intrusive 3d face data acquisition system and method thereof
CN103153148A (en) * 2010-10-11 2013-06-12 宝洁公司 Cleaning head for a target surface
CN107705341A (en) * 2016-08-08 2018-02-16 创奇思科研有限公司 The method and its device of user's expression head portrait generation
CN108898650A (en) * 2018-06-15 2018-11-27 Oppo广东移动通信有限公司 Humanoid material creation method and relevant apparatus
CN111586408A (en) * 2019-02-19 2020-08-25 诺基亚技术有限公司 Quantization parameter derivation for cross-channel residual coding and decoding
CN109816615A (en) * 2019-03-06 2019-05-28 腾讯科技(深圳)有限公司 Image processing method, device, equipment and storage medium
CN114600165A (en) * 2019-09-17 2022-06-07 波士顿偏振测定公司 System and method for surface modeling using polarization cues
CN112669447A (en) * 2020-12-30 2021-04-16 网易(杭州)网络有限公司 Model head portrait creating method and device, electronic equipment and storage medium
CN115345980A (en) * 2022-10-18 2022-11-15 北京百度网讯科技有限公司 Generation method and device of personalized texture map

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Web-based Three-dimensional Virtual Avatar Generation and Control Methods; Feng Jingyi; China Master's Theses Full-text Database, Information Science and Technology (No. 5); I138-1331 *

Also Published As

Publication number Publication date
CN116109798A (en) 2023-05-12

Similar Documents

Publication Publication Date Title
CN116109798B (en) Image data processing method, device, equipment and medium
CN109255831B (en) Single-view face three-dimensional reconstruction and texture generation method based on multi-task learning
CN109815893B (en) Color face image illumination domain normalization method based on cyclic generation countermeasure network
CN111488865B (en) Image optimization method and device, computer storage medium and electronic equipment
CN111784821B (en) Three-dimensional model generation method and device, computer equipment and storage medium
CN113838176B (en) Model training method, three-dimensional face image generation method and three-dimensional face image generation equipment
WO2022205760A1 (en) Three-dimensional human body reconstruction method and apparatus, and device and storage medium
CN111598998A (en) Three-dimensional virtual model reconstruction method and device, computer equipment and storage medium
CN110796593A (en) Image processing method, device, medium and electronic equipment based on artificial intelligence
CN112102480B (en) Image data processing method, apparatus, device and medium
CN112950769A (en) Three-dimensional human body reconstruction method, device, equipment and storage medium
CN112581370A (en) Training and reconstruction method of super-resolution reconstruction model of face image
CN115115805A (en) Training method, device and equipment for three-dimensional reconstruction model and storage medium
CN115984447A (en) Image rendering method, device, equipment and medium
KR20230028253A (en) Face image processing method, face image processing model training method, device, device, storage medium and program product
CN116740261A (en) Image reconstruction method and device and training method and device of image reconstruction model
CN113808277A (en) Image processing method and related device
CN116704084B (en) Training method of facial animation generation network, facial animation generation method and device
WO2024103890A1 (en) Model construction method and apparatus, reconstruction method and apparatus, and electronic device and non-volatile readable storage medium
CN110751026B (en) Video processing method and related device
WO2022252372A1 (en) Image processing method, apparatus and device, and computer-readable storage medium
CN112785494B (en) Three-dimensional model construction method and device, electronic equipment and storage medium
CN114998514A (en) Virtual role generation method and equipment
WO2021197230A1 (en) Three-dimensional head model constructing method, device, system, and storage medium
CN117011449A (en) Reconstruction method and device of three-dimensional face model, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40086739
Country of ref document: HK