CN114549728A - Training method of image processing model, image processing method, device and medium


Info

Publication number
CN114549728A
Authority
CN
China
Prior art keywords
map
image processing
processing model
image
texture
Prior art date
Legal status
Pending
Application number
CN202210304129.6A
Other languages
Chinese (zh)
Inventor
王迪
赵晨
杨少雄
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210304129.6A
Publication of CN114549728A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G06T 15/04 Texture mapping
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods


Abstract

The disclosure provides a training method for an image processing model, an image processing method, an image processing device, and a medium, relating to the field of computer technology and in particular to artificial intelligence fields such as computer vision, deep learning, and augmented reality. The specific implementation scheme is as follows: a sample face image is input into a first image processing model to obtain texture coefficients of the sample face image; an initial texture map is generated based on the texture coefficients and a texture base; a target region map extracted from the sample face image is input into a second image processing model to obtain a target object offset feature map of the sample face image; a final texture map is obtained based on the initial texture map and the target object offset feature map; a rendering map of the sample face image is generated based on the final texture map; a loss function is constructed based on the rendering map and a label rendering map; and parameters of the first image processing model and the second image processing model are respectively adjusted based on the loss function. According to this technical scheme, the similarity between the reconstruction result and the original face can be improved.

Description

Training method of image processing model, image processing method, device and medium
Technical Field
The present disclosure relates to the field of computer technology, and more particularly to the field of artificial intelligence techniques such as computer vision, deep learning, augmented reality, and the like.
Background
With the rapid development of related fields such as computer vision and computer technology, three-dimensional face reconstruction technology is continuously being updated. In three-dimensional face reconstruction, in addition to reconstructing the face shape, a corresponding face texture map needs to be generated, and how to improve the similarity between the reconstruction result and the original face is an urgent problem to be studied and solved.
Disclosure of Invention
The disclosure provides a training method of an image processing model, an image processing method, an image processing device and a medium.
According to a first aspect of the present disclosure, there is provided a training method of an image processing model, including:
inputting the sample face image into a first image processing model, and acquiring texture coefficients of the sample face image output by the first image processing model;
generating an initial texture map of the sample face map based on the texture coefficients and the texture base;
inputting a target area image extracted from the sample face image into a second image processing model, and acquiring a target object offset characteristic image of the sample face image output by the second image processing model;
obtaining a final texture map based on the initial texture map and the target object offset characteristic map;
generating a rendering map of the sample face map based on the final texture map;
constructing a loss function based on the rendering graph and the label rendering graph of the sample face graph;
parameters of the first image processing model and the second image processing model are adjusted based on the loss function, respectively.
According to a second aspect of the present disclosure, there is provided an image processing method including:
inputting the face image to be processed into a first image processing model, and acquiring texture coefficients of the face image to be processed output by the first image processing model;
generating an initial texture map of the face map to be processed based on the texture coefficients and the texture base;
inputting a target area image extracted from the face image to be processed into a second image processing model, and acquiring a target object offset characteristic image of the face image to be processed output by the second image processing model;
obtaining a final texture map based on the initial texture map and the target object offset characteristic map;
the first image processing model and the second image processing model are obtained by adopting the training method of the image processing model provided by the first aspect.
According to a third aspect of the present disclosure, there is provided a model training apparatus comprising:
the first acquisition module is used for inputting the sample face image into the first image processing model and acquiring the texture coefficient of the sample face image output by the first image processing model;
a first generation module for generating an initial texture map of the sample face map based on the texture coefficients and the texture base;
the second acquisition module is used for inputting the target area image extracted from the sample face image into a second image processing model and acquiring a target object offset characteristic image of the sample face image output by the second image processing model;
the second generation module is used for obtaining a final texture map based on the initial texture map and the target object offset characteristic map;
a third generation module for generating a rendering of the sample face map based on the final texture map;
a construction module for constructing a loss function based on the rendering map and the label rendering map of the sample face map;
and the training module is used for respectively adjusting the parameters of the first image processing model and the second image processing model based on the loss function.
According to a fourth aspect of the present disclosure, there is provided an image processing apparatus comprising:
the fourth acquisition module is used for inputting the face image to be processed into the first image processing model and acquiring the texture coefficient of the face image to be processed output by the first image processing model;
the fifth generation module is used for generating an initial texture map of the face map to be processed based on the texture coefficients and the texture base;
the fifth acquisition module is used for inputting the target area image extracted from the face image to be processed into the second image processing model and acquiring the target object offset characteristic image of the face image to be processed output by the second image processing model;
a sixth generating module, configured to obtain a final texture map based on the initial texture map and the target object offset feature map;
the first image processing model and the second image processing model are obtained by adopting the training method of the image processing model provided by the first aspect.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first and second aspects.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method provided by the first and second aspects.
According to a seventh aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method provided by the first and second aspects above.
According to the embodiment of the disclosure, the similarity between the reconstruction result and the original face can be improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of a distributed cluster processing scenario according to an embodiment of the present disclosure;
FIG. 2 is a first flowchart illustrating a method for training an image processing model according to an embodiment of the present disclosure;
FIG. 3 is a second flowchart illustrating a method for training an image processing model according to an embodiment of the present disclosure;
FIG. 4 is a framework diagram of model training according to an embodiment of the present disclosure;
FIG. 5 is a first flowchart illustrating an image processing method according to an embodiment of the disclosure;
FIG. 6 is a second flowchart illustrating an image processing method according to an embodiment of the disclosure;
FIG. 7 is a schematic view of a scenario of model training provided by an embodiment of the present disclosure;
fig. 8 is a scene schematic diagram of image processing provided by an embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of an image processing model training apparatus according to an embodiment of the present disclosure;
fig. 10 is a schematic configuration diagram of an image processing apparatus according to an embodiment of the present disclosure;
fig. 11 is a block diagram of an electronic device for implementing a training method of an image processing model or an image processing method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The terms "first," "second," and "third," etc. in the description and claims of the present disclosure and the above-described figures are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprises" and "comprising," as well as any variations thereof, are intended to cover a non-exclusive inclusion, such as a list of steps or elements. A method, system, article, or apparatus is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not explicitly listed or inherent to such process, system, article, or apparatus.
Fig. 1 is a schematic diagram of a distributed cluster processing scenario according to an embodiment of the present disclosure. The distributed cluster system is one example of a cluster system; model training is exemplarily described as being performed by the distributed cluster system, but the present disclosure is not limited to model training on a single machine or on multiple machines, and the accuracy of model training can be further improved by distributed processing. As shown in Fig. 1, the distributed cluster system includes a plurality of nodes (e.g., server cluster 101, server 102, server cluster 103, server 104, server 105); the server 105 may further be connected to electronic devices, such as a mobile phone 1051 and a notebook computer 1052, and one or more model training tasks may be performed jointly by the plurality of nodes and the connected electronic devices. Optionally, if the plurality of nodes in the distributed cluster system adopt a data-parallel model training mode, the plurality of nodes may execute the model training task based on the same training mode to better train the model; if the plurality of nodes adopt a model-parallel training mode, the plurality of nodes may execute the model training task based on different training modes to better train the model. Optionally, after each round of model training is completed, data exchange (e.g., data synchronization) may be performed between the plurality of nodes.
An embodiment of the present disclosure provides a training method for an image processing model. Fig. 2 is a schematic flowchart of the training method for an image processing model according to an embodiment of the present disclosure. The method may be applied to a training apparatus for an image processing model, which may be deployed, for example, on a terminal, a server, or another processing device in a single-machine, multi-machine, or cluster system, to implement model training and the like. The terminal may be a User Equipment (UE), a mobile device, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the method may also be implemented by a processor calling computer-readable instructions stored in a memory. In some possible implementations, the method may be applied to any node or electronic device (mobile phone, desktop, etc.) in the cluster system shown in Fig. 1. As shown in Fig. 2, the training method of the image processing model includes:
s201, inputting the sample face image into a first image processing model, and acquiring a texture coefficient of the sample face image output by the first image processing model;
s202, generating an initial texture map of the sample face map based on the texture coefficients and the texture base;
s203, inputting a target area image extracted from the sample face image into a second image processing model, and acquiring a target object offset characteristic image of the sample face image output by the second image processing model;
s204, obtaining a final texture map based on the initial texture map and the target object offset characteristic map;
s205, generating a rendering map of the sample face map based on the final texture map;
s206, constructing a loss function based on the rendering graph and the label rendering graph of the sample face graph;
and S207, respectively adjusting parameters of the first image processing model and the second image processing model based on the loss function.
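For concreteness, one training iteration covering S201 to S207 is sketched below in PyTorch-style Python. It is a minimal sketch only: the model objects, the texture basis tensor, the renderer, the face recognition backbone used for the loss, the L1 distance, and the warping of the whole texture map are illustrative assumptions that the description above does not fix.

    import torch
    import torch.nn.functional as F

    def train_step(first_model, second_model, renderer, face_recognizer,
                   sample_face, target_region, label_rendering,
                   texture_basis, optimizer):
        # S201: texture coefficients of the sample face image
        tex_coeffs = first_model(sample_face)                          # (B, K)

        # S202: initial texture map as a linear combination with the texture base
        init_texture = torch.einsum('bk,kchw->bchw', tex_coeffs, texture_basis)

        # S203: target object offset feature map from the extracted region map
        offset_map = second_model(target_region)                       # (B, 2, H, W)

        # S204: combine the initial texture map and the offsets; for brevity the
        # whole texture is warped here with grid_sample (the eyebrow-specific
        # variant is sketched later in this description)
        B = init_texture.shape[0]
        identity = F.affine_grid(torch.eye(2, 3).repeat(B, 1, 1),
                                 init_texture.shape, align_corners=False)
        grid = identity + offset_map.permute(0, 2, 3, 1)
        final_texture = F.grid_sample(init_texture, grid, align_corners=False)

        # S205: rendering map of the sample face map from the final texture map
        rendering = renderer(final_texture)

        # S206: loss between feature maps of the rendering and the label rendering
        with torch.no_grad():
            target_feat = face_recognizer(label_rendering)
        loss = F.l1_loss(face_recognizer(rendering), target_feat)

        # S207: adjust the parameters of both models
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()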
Here, the sample face image is the sample image used for model training.
The sample face image may be a face image of a target body, where the target body may be a human or an animal. The sample face image may be a face image acquired online (for example, a face image of the target body acquired through a web crawler technology), a face image acquired offline, a face image of the target body acquired in real time, or a face image synthesized by an operator, which is not limited in this disclosure.
Wherein the target area map is a portion of the sample face map. The target area map comprises an area map of the target object. Here, the target object includes at least one of eyebrows, eyes, a nose, a mouth, ears, and lips.
Wherein the first image processing model comprises at least one network structure comprising a Convolutional Neural Network (CNN).
Wherein the second image processing model comprises at least one network structure comprising CNNs.
The network layer number of the first image processing model is larger than that of the second image processing model.
The texture substrate may be a single reconstruction-style texture substrate or a plurality of reconstruction-style texture substrates. Reconstruction styles herein include, but are not limited to, real person styles, cartoon styles, funny styles, and the like, to which this disclosure is not limited.
The texture substrate may be an open-source texture substrate, or may be a private texture substrate obtained after multiple rounds of extraction and update of the texture substrate, and the source of the texture substrate is not limited in the present disclosure.
In some embodiments, the target region map comprises an eyebrow region map, and extracting the eyebrow region map from the sample face map comprises: obtaining the key points of the two eyebrows of the sample face map through an open-source algorithm, and cropping out the circumscribed quadrilateral region to obtain the eyebrow region map. Here, the open-source algorithm is a face key point detection algorithm.
In some embodiments, the target region map comprises an eyebrow region map, the extracting the eyebrow region map from the sample face map comprising: and inputting the sample face image into the trained detection model to obtain an eyebrow region image output by the detection model.
In some embodiments, to improve the accuracy of the first image processing model, the texture base style is the same style as the label rendering style.
In three-dimensional facial reconstruction, eyebrows are generated through texture map reconstruction, so the accuracy of the extracted eyebrow shape affects the accuracy of the reconstructed texture map. In the related art, the eyebrow hair boundaries in the texture image are blurred, the generalization performance is poor, and the similarity between the reconstructed eyebrow shape and the original eyebrow shape is low. In some embodiments, inputting the target region map extracted from the sample face map into the second image processing model and obtaining the target object offset feature map of the sample face map output by the second image processing model includes: inputting the eyebrow region map extracted from the sample face map into the second image processing model, and acquiring the eyebrow shape offset feature map of the sample face map output by the second image processing model. In this way, at least the eyebrows in the texture map can be made clearer, the transferability and generalization of texture maps with different reconstruction styles are improved, and the identification degree of the texture map is improved, thereby improving the similarity between a reconstruction result such as a rendering map and the original face.
In some embodiments, inputting the eyebrow region map extracted from the sample face map into the second image processing model, and obtaining the eyebrow shape offset feature map of the sample face map output by the second image processing model, includes:
inputting the eyebrow region map into the second image processing model, reducing the dimension to n x m through fully convolutional layers, and then increasing the dimension to a preset p x q to obtain the eyebrow shape offset feature map, where n, m, p, and q are integers. For example, n is 2 and m is 4; p is 64 and q is 128. Illustratively, the texture base uses a fixed eyebrow position with an eyebrow-region rectangle of size 64 x 128; this region of the generated texture image, together with the predicted eyebrow offset feature map, is substituted into a normalization function, so that the eyebrow shape in this region of the texture image automatically reproduces the eyebrow of the sample face map, and iterative training is performed.
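One possible reading of this step is sketched below: the low-resolution n x m output (2 x 4 in the example) is upsampled to the preset p x q size (64 x 128) and applied, via grid_sample, to the fixed eyebrow rectangle of the texture map. The rectangle location, the bilinear upsampling, and all names are assumptions made only for illustration.

    import torch
    import torch.nn.functional as F

    def apply_eyebrow_offsets(texture, raw_offsets, top=100, left=200,
                              p=64, q=128):
        # texture: (B, C, H, W) generated texture map
        # raw_offsets: (B, 2, n, m) eyebrow shape offset prediction
        # upsample the low-resolution offsets (e.g. 2 x 4) to the preset p x q
        offsets = F.interpolate(raw_offsets, size=(p, q), mode='bilinear',
                                align_corners=False)                 # (B, 2, p, q)

        # the texture base uses a fixed eyebrow position; cut that rectangle out
        patch = texture[:, :, top:top + p, left:left + q]            # (B, C, p, q)

        # identity sampling grid in [-1, 1] plus the predicted offsets; feeding
        # this into grid_sample plays the role of the normalization function
        B = patch.shape[0]
        identity = F.affine_grid(torch.eye(2, 3).repeat(B, 1, 1), patch.shape,
                                 align_corners=False)
        warped = F.grid_sample(patch, identity + offsets.permute(0, 2, 3, 1),
                               align_corners=False)

        # write the warped eyebrow region back into the texture map
        out = texture.clone()
        out[:, :, top:top + p, left:left + q] = warped
        return out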
It is understood that the target area map may include other area maps such as a mouth area map, a nose area map, and an eye area map, in addition to the eyebrow area map. The target object offset feature map may include, in addition to the eyebrow offset feature map, other shape offset feature maps such as a mouth offset feature map, a nose offset feature map, and an eye offset feature map, which are not described in detail herein.
According to this technical scheme, when the first image processing model and the second image processing model are trained, a target region map extracted from the sample face map is input into the second image processing model, and a target object offset feature map of the sample face map output by the second image processing model is obtained; a final texture map is obtained based on the initial texture map and the target object offset feature map; a rendering map of the sample face map is generated based on the final texture map; a loss function is constructed based on the rendering map and the label rendering map of the sample face map; and parameters of the first image processing model and the second image processing model are adjusted respectively based on the loss function. In this way, at least the target object in the texture map can be made clearer, the transferability and generalization of texture maps with different reconstruction styles are improved, and the identification degree of the texture map is improved, so that the similarity between a reconstruction result such as a rendering map and the original face is improved.
In some embodiments, constructing a penalty function based on the rendering graph and the tag rendering graph of the sample face graph comprises:
determining a first feature map of the rendering map and a second feature map of the tag rendering map;
and constructing a loss function based on the first feature map of the rendering map and the second feature map of the label rendering map.
In some embodiments, determining the first feature map of the rendering map and the second feature map of the tag rendering map comprises: and inputting the rendering map and the label rendering map into a pre-trained face recognition model to obtain a first feature map of the rendering map and a second feature map of the label rendering map output by the face recognition model.
The first feature map is a map of facial features extracted from the rendering map; for example, the first feature map is an eye feature map, a nose feature map, or a mouth feature map. Similarly, the second feature map is a map of facial features extracted from the label rendering map; for example, the second feature map is an eye feature map, a nose feature map, or a mouth feature map.
The face regions to which the first feature map and the second feature map are directed are the same, that is, when the first feature map is an eye feature map, the second feature map is also an eye feature map; when the first characteristic diagram is a mouth characteristic diagram, the second characteristic diagram is also a mouth characteristic diagram.
In the embodiment of the present disclosure, how the face recognition model is specifically trained is not limited. All models capable of recognizing facial features in an image can be used as face recognition models.
It is to be appreciated that determining the first feature map of the rendering map and the second feature map of the tag rendering map is not limited to one implementation in which the rendering map and the tag rendering map are input to a facial recognition model. For example, facial feature extraction can be performed on the rendering map and the tag rendering map respectively through a feature extraction algorithm to obtain a first feature map of the rendering map and a second feature map of the tag rendering map.
Therefore, a loss function is constructed based on the first feature map of the rendering map and the second feature map of the tag rendering map, which is beneficial to improving the identification degree of the texture image output by the first image processing model and the accuracy of the target object offset feature map output by the second image processing model, so that the target object in the texture image is clearer and more accurate, and the similarity between the reconstruction result and the original face is improved.
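A hedged sketch of this loss construction follows. The description only states that a pre-trained face recognition model produces the two feature maps; the ResNet-style layer names, the freezing of the recognizer, and the L1 distance below are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FeatureMapLoss(nn.Module):
        # Compares intermediate feature maps of a frozen face recognition
        # backbone for the rendering map (first feature map) and the label
        # rendering map (second feature map).
        def __init__(self, backbone, layers=('layer2', 'layer3')):
            super().__init__()
            self.backbone = backbone.eval()
            for param in self.backbone.parameters():
                param.requires_grad_(False)       # the recognizer is not trained
            self.layers = layers
            self._feats = {}
            for name in layers:
                getattr(self.backbone, name).register_forward_hook(
                    self._make_hook(name))

        def _make_hook(self, name):
            def hook(_module, _inputs, output):
                self._feats[name] = output
            return hook

        def _extract(self, image):
            self._feats = {}
            self.backbone(image)
            return [self._feats[name] for name in self.layers]

        def forward(self, rendering, label_rendering):
            first_feats = self._extract(rendering)             # first feature maps
            with torch.no_grad():
                second_feats = self._extract(label_rendering)  # second feature maps
            return sum(F.l1_loss(a, b) for a, b in zip(first_feats, second_feats))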
The embodiment of the present disclosure provides a training method for an image processing model, and fig. 3 is a schematic flowchart of the training method for an image processing model according to the embodiment of the present disclosure, which may be applied to a training apparatus for an image processing model, for example, the apparatus may be deployed in a situation where a terminal or a server or other processing devices in a single-machine, multi-machine or cluster system execute, and may implement model training and the like. The terminal may be a user device, a mobile device, a personal digital assistant, a handheld device, a computing device, a vehicle-mounted device, a wearable device, and the like. In some possible implementations, the method may also be implemented by a processor calling computer readable instructions stored in a memory. In some possible implementations, the method may be applied to any node or electronic device (mobile phone or desktop, etc.) in the cluster system shown in fig. 1. As shown in fig. 3, the training method of the image processing model includes:
s301, inputting the sample face image into a first image processing model, and acquiring a texture coefficient of the sample face image output by the first image processing model;
s302, generating an initial texture map of the sample face map based on the texture coefficients and the texture base;
s303, acquiring a shape coefficient of a sample face image output by the first image processing model;
s304, generating a shape model of the sample face image based on the shape coefficient and the shape base;
s305, inputting a target area image extracted from the sample face image into a second image processing model, and acquiring a target object offset characteristic image of the sample face image output by the second image processing model;
s306, obtaining a final texture map based on the initial texture map and the target object offset characteristic map;
s307, generating a rendering graph of the sample face graph based on the final texture graph and the shape model;
s308, constructing a loss function based on the rendering graph and the label rendering graph of the sample face graph;
s309, parameters of the first image processing model and the second image processing model are adjusted respectively based on the loss function.
S304 is executed after S303, but S303 may be executed after S302, before S302, or simultaneously with S302.
In the disclosed embodiments, the dimensions of the shape base are not limited. It will be appreciated that the higher the dimension of the shape base, the higher the fineness of the shape model.
The shape base may be a single reconstruction style shape base or a plurality of reconstruction style shape bases. Reconstruction styles herein include, but are not limited to, real person styles, cartoon styles, funny styles, and the like, to which this disclosure is not limited.
The shape substrate may be an open-source shape substrate, or may be a private shape substrate obtained after extraction and update of multiple rounds of shape substrates.
In some embodiments, to improve the accuracy of the first and second image processing models, the style of the shape base is the same as the style of the texture base.
In other embodiments, the same shape base corresponds to texture bases of a plurality of styles.
Therefore, when the first image processing model and the second image processing model are trained, the texture coefficients and the texture base, the shape coefficients and the shape base, and the target object offset feature map are all taken into account. This weakens the problem of insufficient or incomplete sources of texture bases and shape bases, improves the transferability and generalization of the texture base and the shape base, and can improve the identification degree of the texture image output by the first image processing model as well as the identification degree of the shape model; it can also improve the accuracy of the target object offset feature map output by the second image processing model, thereby further improving the identification degree of the rendering map and the similarity between the reconstruction result and the original face.
In some embodiments, generating a rendering of the sample face map based on the final texture map and the shape model comprises:
and inputting the final texture map and the shape model into a renderer to obtain a rendering map of the sample face map output by the renderer.
Here, the renderer may be a differentiable renderer.
It will be appreciated that determining the rendering map is not limited to one implementation in which the final texture map and shape model are input to the renderer. For example, the final texture map and the shape model may be rendered by a rendering algorithm to obtain a rendering map.
Therefore, the rendering graph can be determined quickly, the time for obtaining the rendering graph is saved, and the speed for training the first image processing model and the second image processing model is improved.
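The description does not name a particular renderer; as one hedged example, a differentiable mesh renderer such as PyTorch3D could turn the shape model and the final texture map into the rendering map. The UV layout, camera, lighting, and image size below are illustrative assumptions, not part of the original text.

    import torch
    from pytorch3d.structures import Meshes
    from pytorch3d.renderer import (FoVPerspectiveCameras, RasterizationSettings,
                                    MeshRasterizer, MeshRenderer, SoftPhongShader,
                                    PointLights, TexturesUV)

    def render_face(verts, faces, verts_uvs, faces_uvs, texture_map,
                    image_size=256, device='cpu'):
        # verts: (V, 3) shape model vertices; faces: (F, 3) triangle indices
        # texture_map: (H, W, 3) final texture map with values in [0, 1]
        textures = TexturesUV(maps=texture_map.unsqueeze(0),
                              faces_uvs=[faces_uvs], verts_uvs=[verts_uvs])
        mesh = Meshes(verts=[verts], faces=[faces], textures=textures).to(device)

        cameras = FoVPerspectiveCameras(device=device)
        raster_settings = RasterizationSettings(image_size=image_size)
        renderer = MeshRenderer(
            rasterizer=MeshRasterizer(cameras=cameras,
                                      raster_settings=raster_settings),
            shader=SoftPhongShader(device=device, cameras=cameras,
                                   lights=PointLights(device=device)))
        return renderer(mesh)            # (1, image_size, image_size, 4)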
In some embodiments, adjusting parameters of the first image processing model and the second image processing model based on the loss function, respectively, comprises:
determining loss values of the first feature map and the second feature map corresponding to the sample face map based on a loss function;
and adjusting parameters of each layer of network of the first image processing model and parameters of each layer of network of the second image processing model based on the loss value until the loss value is reduced to a first preset range.
Here, the first image processing model includes a multi-layer network, and parameters of at least one or more layers of the multi-layer network need to be adjusted when the first image processing model is trained.
Here, the first predetermined range is [0, x], where x is a positive number. The value of x can be set or adjusted according to the accuracy requirement or the speed requirement.
Therefore, parameters of each layer of network of the first image processing model and the second image processing model are adjusted based on the loss value of the single sample face image, the training time of the first image processing model and the second image processing model can be saved, the training speed of the first image processing model and the second image processing model is improved, and the trained first image processing model and the trained second image processing model can better adapt to the same type of reconstruction style as the sample face image.
In some embodiments, adjusting parameters of the first image processing model and the second image processing model based on the loss function, respectively, comprises:
determining loss values of a first feature map and a second feature map respectively corresponding to the plurality of sample face maps based on a loss function;
determining a total loss value of the plurality of sample face maps based on the loss values respectively corresponding to the plurality of sample face maps;
and adjusting the parameters of each layer of network of the first image processing model and the parameters of each layer of network of the second image processing model based on the total loss value until the total loss value is reduced to a second preset range.
Here, the first image processing model and the second image processing model include a multilayer network, and the number of network layers of the second image processing model is smaller than that of the first image processing model. When training the first image processing model and the second image processing model, parameters of at least one or more layers of the first image processing model and the second image processing model need to be adjusted.
Here, the second predetermined range is [0, y], where y is a positive number. The value of y can be set or adjusted according to the accuracy requirement or the speed requirement.
Therefore, the parameters of each layer of network of the first image processing model and the second image processing model are adjusted based on the total loss value of the plurality of sample face maps, which can improve the generalization and transferability of the first image processing model and the second image processing model, so that the trained first image processing model and second image processing model can be applied to various reconstruction styles.
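A sketch of this multi-sample variant is given below: the per-sample losses are accumulated into a total loss and both models are updated until the total falls into the second preset range [0, y]. The loop structure, the gradient accumulation, and the threshold value are assumptions.

    def train_until_threshold(sample_batches, loss_fn, optimizer,
                              y=0.05, max_rounds=10000):
        # loss_fn(batch) returns the loss of one sample face map built from its
        # first and second feature maps (see the sketches above)
        total = float('inf')
        for _ in range(max_rounds):
            optimizer.zero_grad()
            total = 0.0
            for batch in sample_batches:
                loss = loss_fn(batch)      # loss value of one sample face map
                loss.backward()            # gradients accumulate over the samples
                total += loss.item()
            optimizer.step()               # adjust both models' parameters
            if total <= y:                 # total loss within [0, y]
                break
        return total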
Fig. 4 is a schematic diagram of an architecture for model training. As shown in Fig. 4, a face map is input into the first image processing model to obtain texture coefficients; a texture map is generated from the texture base and the texture coefficients; a target region map extracted from the sample face map is input into the second image processing model, and a target object offset feature map of the sample face map output by the second image processing model is obtained; the initial texture map and the target object offset feature map are substituted into a normalization function, such as a grid_sample function, to obtain the final texture map; the final texture map is input into a renderer to obtain a rendering map; a loss function is constructed based on the rendering map and the label rendering map of the sample face map; and parameters of the first image processing model and the second image processing model are adjusted according to the loss function.
It should be understood that the architecture diagram shown in fig. 4 is merely illustrative, and that various obvious changes and/or substitutions may be made by those skilled in the art based on the example of fig. 4, and still fall within the scope of the disclosure of the embodiments of the disclosure.
Based on the first image processing model and the second image processing model obtained by training the training method of the image processing model, the embodiment of the disclosure provides an image processing method, the image processing method is applied to electronic equipment, the electronic equipment comprises but is not limited to a computer, a mobile phone or a tablet computer, and the like, and the disclosure does not limit the type of the electronic equipment. As shown in fig. 5, the image processing method includes:
s501, inputting the face image to be processed into a first image processing model, and acquiring a texture coefficient of the face image to be processed output by the first image processing model;
s502, generating an initial texture map of the face map to be processed based on the texture coefficient and the texture base;
s503, inputting the target area image extracted from the face image to be processed into a second image processing model, and acquiring a target object offset characteristic image of the face image to be processed output by the second image processing model;
and S504, obtaining a final texture map based on the initial texture map and the target object offset characteristic map.
Here, the face image to be processed is a face image that needs to be processed.
In some embodiments, generating an initial texture map of the face map to be processed based on the texture coefficients and the texture base includes: an initial texture map is generated based on a linear summation of the texture coefficients and the texture basis.
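As a small sketch of this linear summation, each texture in the base is weighted by its coefficient and the weighted textures are summed; whether an average texture is additionally added is not stated above and is shown only as an optional assumption.

    import torch

    def initial_texture(tex_coeffs, texture_basis, mean_texture=None):
        # tex_coeffs: (K,); texture_basis: (K, C, H, W)
        # mean_texture: optional (C, H, W) average texture (an assumption)
        tex = torch.einsum('k,kchw->chw', tex_coeffs, texture_basis)
        return tex if mean_texture is None else tex + mean_texture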
According to the technical scheme, the final texture map is obtained based on the initial texture map and the target object offset characteristic map, the target object in the texture image can be clearer and more accurate and is consistent with the shape of the target object of the original face, and therefore the similarity between a reconstruction result such as a rendering map and the original face is improved.
Based on the first image processing model and the second image processing model obtained by training the training method of the image processing model, the embodiment of the disclosure provides an image processing method, the image processing method is applied to electronic equipment, the electronic equipment comprises but is not limited to a computer, a mobile phone or a tablet computer, and the like, and the disclosure does not limit the type of the electronic equipment. As shown in fig. 6, the image processing method includes:
s601, inputting the face image to be processed into a first image processing model, and acquiring a texture coefficient of the face image to be processed output by the first image processing model;
s602, generating an initial texture map of the face map to be processed based on the texture coefficient and the texture base;
s603, acquiring a shape coefficient of the face image to be processed output by the first image processing model;
s604, generating a shape model of the face image to be processed based on the shape coefficient and the shape base;
s605, inputting a target area graph extracted from the face graph to be processed into a second image processing model, and acquiring a target object offset characteristic graph of the face graph to be processed output by the second image processing model;
s606, obtaining a final texture map based on the initial texture map and the target object offset characteristic map;
and S607, obtaining a rendering map of the face image to be processed based on the final texture map and the shape model.
S604 is executed after S603, but S603 may be executed after S602, before S602, or simultaneously with S602.
In some embodiments, inputting the target area map extracted from the face map to be processed into the second image processing model, and acquiring the target object offset feature map of the face map to be processed output by the second image processing model, includes: and inputting the eyebrow region image extracted from the face image to be processed into a second image processing model, and acquiring the eyebrow shape offset characteristic image of the face image to be processed output by the second image processing model. Therefore, eyebrows in the texture image can be clearer and more accurate and are consistent with eyebrows of the original face, and the similarity between a reconstruction result such as a rendering image and the original face is improved.
According to the technical scheme, the target object in the texture image is clearer and more accurate and is consistent with the shape of the original facial target object, the identification degree of the texture image of the to-be-processed facial image can be improved, the identification degree of the shape model of the to-be-processed facial image can also be improved, the identification degree of the rendering image is further improved, and the similarity between the reconstruction result such as the rendering image and the original face is improved.
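Putting the pieces together, the inference flow S601 to S607 could look like the sketch below, reusing the hypothetical helpers from the earlier sketches (initial_texture, apply_eyebrow_offsets, render_face). Here the first image processing model is assumed to output both texture and shape coefficients, the shape base is assumed to be a (K, V, 3) tensor added to a mean shape, and all other names are illustrative.

    import torch

    @torch.no_grad()
    def reconstruct(face_image, eyebrow_region, first_model, second_model,
                    texture_basis, shape_basis, mean_shape,
                    faces, verts_uvs, faces_uvs):
        # S601 / S603: texture and shape coefficients from the first model
        tex_coeffs, shape_coeffs = first_model(face_image)

        # S602: initial texture map (see the linear-summation sketch above)
        init_tex = initial_texture(tex_coeffs, texture_basis)           # (C, H, W)

        # S604: shape model from the shape coefficients and the shape base
        verts = mean_shape + torch.einsum('k,kvc->vc', shape_coeffs, shape_basis)

        # S605 / S606: eyebrow offsets warp the fixed eyebrow rectangle
        offsets = second_model(eyebrow_region)                          # (1, 2, n, m)
        final_tex = apply_eyebrow_offsets(init_tex.unsqueeze(0), offsets)[0]

        # S607: rendering map from the final texture map and the shape model
        rendering = render_face(verts, faces, verts_uvs, faces_uvs,
                                final_tex.permute(1, 2, 0))
        return final_tex, rendering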
An embodiment of the present disclosure further provides a schematic scenario of model training. As shown in Fig. 7, an electronic device, such as a cloud server, mines a sample face map and a label face map that conform to a training task from a plurality of data sources according to the training task sent by a terminal. Here, the training tasks transmitted by different terminals may be training tasks for different reconstruction styles, and training tasks of different reconstruction styles may require different sample face maps and label face maps. The electronic device inputs the sample face map into the first image processing model to obtain texture coefficients of the sample face map; generates an initial texture map of the sample face map based on the texture coefficients and the texture base; inputs an eyebrow region map extracted from the sample face map into the second image processing model and acquires an eyebrow shape offset feature map of the sample face map; obtains a final texture map based on the initial texture map and the eyebrow shape offset feature map; generates a rendering map of the sample face map based on the final texture map; constructs a loss function based on the rendering map and the label rendering map of the sample face map; adjusts parameters of the first image processing model and the second image processing model respectively based on the loss function; and returns, to the terminal, the trained first image processing model and second image processing model corresponding to the reconstruction style.
As shown in Fig. 8, an electronic device, such as a cloud server, receives a face image to be processed sent by a terminal, determines a texture map and/or a rendering map according to the face image to be processed and the reconstruction style sent by the terminal, and returns the texture map and/or rendering map of the corresponding reconstruction style to the terminal.
The number of the terminals and the electronic devices is not limited in the present disclosure, and a plurality of terminals and a plurality of electronic devices may be included in practical applications.
It should be understood that the scene diagrams shown in fig. 7 and 8 are only illustrative and not restrictive, and those skilled in the art may make various obvious changes and/or substitutions based on the examples of fig. 7 and 8, and the obtained technical solution still belongs to the disclosure scope of the embodiments of the present disclosure.
It should be noted that the training method of the image processing model and the image processing method of the present disclosure are not directed to a head model of a specific user and cannot reflect the personal information of any specific user.
An embodiment of the present disclosure provides a training apparatus for an image processing model, as shown in fig. 9, the training apparatus for an image processing model may include:
a first obtaining module 910, configured to input the sample face map into a first image processing model, and obtain texture coefficients of the sample face map output by the first image processing model;
a first generating module 920, configured to generate an initial texture map of the sample face map based on the texture coefficients and the texture base;
a second obtaining module 930, configured to input the target region map extracted from the sample face map into the second image processing model, and obtain a target object offset feature map of the sample face map output by the second image processing model;
a second generating module 940, configured to obtain a final texture map based on the initial texture map and the target object offset feature map;
a third generating module 950 for generating a rendering of the sample face map based on the final texture map;
a construction module 960 for constructing a loss function based on the rendering map and the label rendering map of the sample face map;
a training module 970 for adjusting parameters of the first image processing model and the second image processing model respectively based on the loss function.
In some embodiments, the second obtaining module 930 is specifically configured to:
and inputting the eyebrow region image extracted from the sample face image into a second image processing model, and acquiring the eyebrow shape shift characteristic image of the sample face image output by the second image processing model.
In some embodiments, the building block 960 is specifically configured to:
determining a first feature map of the rendering map and a second feature map of the tag rendering map;
and constructing a loss function based on the first feature map of the rendering map and the second feature map of the label rendering map.
In some embodiments, the training device for the image processing model may further include:
the third acquisition module is used for acquiring the shape coefficient of the sample face image output by the first image processing model;
a fourth generation module to generate a shape model of the sample face map based on the shape coefficient and the shape base.
In some embodiments, the fourth generating module is specifically configured to:
and inputting the final texture map and the shape model into a renderer to obtain a rendering map of the sample face map output by the renderer.
In some embodiments, the training module 970 is specifically configured to:
determining loss values of the first feature map and the second feature map corresponding to the sample face map based on a loss function;
and adjusting parameters of each layer of network of the first image processing model and parameters of each layer of network of the second image processing model based on the loss value until the loss value is reduced to a first preset range.
In some embodiments, the training module 970 is specifically configured to:
determining loss values of a first feature map and a second feature map respectively corresponding to the plurality of sample face maps based on a loss function;
determining a total loss value of the plurality of sample face maps based on the loss values respectively corresponding to the plurality of sample face maps;
and adjusting parameters of each layer of network of the first image processing model and parameters of each layer of network of the second image processing model based on the total loss value until the total loss value is reduced to a second preset range.
It should be understood by those skilled in the art that the functions of the processing modules in the training apparatus for image processing models according to the embodiments of the present disclosure may be understood by referring to the description related to the aforementioned method for training image processing models, and the processing modules in the training apparatus for image processing models according to the embodiments of the present disclosure may be implemented by analog circuits that implement the functions of the embodiments of the present disclosure, or by running software that performs the functions of the embodiments of the present disclosure on electronic devices.
The training apparatus for the image processing model disclosed by the embodiment of the disclosure can make the target object in the texture image clearer, improve the transferability and generalization of the target object across texture images with different reconstruction styles, and improve the identification degree of the texture image, thereby improving the similarity between the reconstruction result and the original face.
An embodiment of the present disclosure provides an image processing apparatus, as shown in fig. 10, which may include:
a fourth obtaining module 1010, configured to input the face image to be processed into the first image processing model, and obtain a texture coefficient of the face image to be processed output by the first image processing model;
a fifth generating module 1020, configured to generate an initial texture map of the face map to be processed based on the texture coefficients and the texture base;
a fifth obtaining module 1030, configured to input the target region map extracted from the face map to be processed into the second image processing model, and obtain a target object offset feature map of the face map to be processed output by the second image processing model;
a sixth generating module 1040, configured to obtain a final texture map based on the initial texture map and the target object offset feature map;
the first image processing model and the second image processing model are obtained according to the training method of the image processing model.
In some embodiments, the fifth obtaining module 1030 is specifically configured to:
and inputting the eyebrow region image extracted from the face image to be processed into a second image processing model, and acquiring the eyebrow shape offset characteristic image of the face image to be processed output by the second image processing model.
In some embodiments, the image processing apparatus may further include: the sixth acquisition module is used for acquiring the shape coefficient of the face image to be processed output by the first image processing model; a seventh generating module, configured to generate a shape model of the face map to be processed based on the shape coefficient and the shape base; and the eighth generation module is used for obtaining a rendering map of the face image to be processed based on the final texture map and the shape model.
It should be understood by those skilled in the art that the functions of the processing modules in the image processing apparatus according to the embodiments of the present disclosure may be understood by referring to the description of the foregoing image processing method, and the processing modules in the image processing apparatus according to the embodiments of the present disclosure may be implemented by analog circuits that implement the functions described in the embodiments of the present disclosure, or by running software that performs the functions described in the embodiments of the present disclosure on electronic devices.
The image processing apparatus of the embodiment of the disclosure can make the target object in the texture image clearer, more accurate, and consistent with the shape of the target object in the original face; it can improve not only the identification degree of the texture image of the face image to be processed but also the identification degree of the shape model of the face image to be processed, thereby further improving the identification degree of the rendering map and improving the similarity between the reconstruction result and the original face.
In the technical scheme of the disclosure, the acquisition, storage, application and the like of the personal information of the related user all accord with the regulations of related laws and regulations, and do not violate the good customs of the public order.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 11 shows a schematic block diagram of an example electronic device 1100 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
As shown in Fig. 11, the device 1100 includes a computing unit 1101, which can perform various appropriate actions and processes in accordance with a computer program stored in a Read-Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data necessary for the operation of the device 1100 may also be stored. The computing unit 1101, the ROM 1102, and the RAM 1103 are connected to each other by a bus 1104. An Input/Output (I/O) interface 1105 is also connected to the bus 1104.
A number of components in device 1100 connect to I/O interface 1105, including: an input unit 1106 such as a keyboard, a mouse, and the like; an output unit 1107 such as various types of displays, speakers, and the like; a storage unit 1108 such as a magnetic disk, optical disk, or the like; and a communication unit 1109 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 1109 allows the device 1100 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 1101 can be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing Unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable Processor, controller, microcontroller, and the like. The calculation unit 1101 performs the respective methods and processes described above, such as a training method of an image processing model or an image processing method. For example, in some embodiments, the training method of the image processing model or the image processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1108. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 1100 via ROM1102 and/or communication unit 1109. When the computer program is loaded into the RAM 1103 and executed by the computing unit 1101, one or more steps of the training method of the image processing model or the image processing method described above may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured by any other suitable means (e.g., by means of firmware) to perform a training method of an image processing model or an image processing method.
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash memory, an optical fiber, a Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a Display device (e.g., a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server that incorporates a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (23)

1. A method of training an image processing model, comprising:
inputting a sample face map into a first image processing model, and acquiring texture coefficients of the sample face map output by the first image processing model;
generating an initial texture map of the sample face map based on the texture coefficients and a texture base;
inputting a target region map extracted from the sample face map into a second image processing model, and acquiring a target object offset feature map of the sample face map output by the second image processing model;
obtaining a final texture map based on the initial texture map and the target object offset feature map;
generating a rendering map of the sample face map based on the final texture map;
constructing a loss function based on the rendering map and a label rendering map of the sample face map;
adjusting parameters of the first image processing model and the second image processing model, respectively, based on the loss function.
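For illustration only, the following is a minimal PyTorch-style sketch of the training step recited in claim 1. Every identifier here (coeff_net, offset_net, texture_mean, texture_basis, render, and the L1 loss) is an assumption introduced for readability; the claim itself does not prescribe a framework, a network architecture, or a particular loss function.

```python
# Illustrative sketch of one training step per claim 1 (PyTorch-style).
# All names and tensor shapes are assumptions, not elements of the claim.
import torch
import torch.nn.functional as F

def train_step(coeff_net,        # first image processing model -> texture coefficients
               offset_net,       # second image processing model -> offset feature map
               texture_mean,     # mean texture of the texture base, shape (3, H, W)
               texture_basis,    # texture base, shape (K, 3, H, W)
               render,           # differentiable renderer: texture map -> rendered image
               optimizer,        # optimizer over the parameters of both models
               sample_face,      # (B, 3, h, w) sample face map
               target_region,    # (B, 3, h', w') target region map (e.g. eyebrow crop)
               label_rendering): # (B, 3, h, w) label rendering map
    coeffs = coeff_net(sample_face)                          # texture coefficients, (B, K)
    # initial texture map = mean texture + coefficient-weighted texture base
    init_tex = texture_mean + torch.einsum('bk,kchw->bchw', coeffs, texture_basis)
    offset_map = offset_net(target_region)                   # target object offset feature map, (B, 3, H, W)
    final_tex = init_tex + offset_map                        # final texture map
    rendering = render(final_tex)                            # rendering map of the sample face map
    loss = F.l1_loss(rendering, label_rendering)             # loss vs. label rendering map (placeholder choice)
    optimizer.zero_grad()
    loss.backward()                                          # requires a differentiable renderer
    optimizer.step()                                         # adjusts parameters of both models
    return loss.item()
```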
2. The method according to claim 1, wherein the inputting a target region map extracted from the sample face map into a second image processing model, and acquiring a target object offset feature map of the sample face map output by the second image processing model, comprises:
inputting an eyebrow region map extracted from the sample face map into the second image processing model, and acquiring an eyebrow offset feature map of the sample face map output by the second image processing model.
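As a purely hypothetical illustration of how the eyebrow region map of claim 2 might be obtained, the sketch below crops the eyebrow area from the sample face map using precomputed landmark coordinates; the landmark source, the margin, and the function name are assumptions, since the claim does not specify the extraction procedure.

```python
# Hypothetical eyebrow-region cropping helper; landmark layout and margin are assumptions.
def crop_eyebrow_region(face, eyebrow_landmarks, margin=8):
    """face: (3, H, W) tensor; eyebrow_landmarks: (N, 2) pixel coordinates (x, y)."""
    xs, ys = eyebrow_landmarks[:, 0], eyebrow_landmarks[:, 1]
    x0 = max(int(xs.min()) - margin, 0)
    y0 = max(int(ys.min()) - margin, 0)
    x1 = min(int(xs.max()) + margin, face.shape[2])
    y1 = min(int(ys.max()) + margin, face.shape[1])
    return face[:, y0:y1, x0:x1]   # eyebrow region map fed to the second image processing model
```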
3. The method of claim 1, wherein the constructing a loss function based on the rendering map and a label rendering map of the sample face map comprises:
determining a first feature map of the rendering map and a second feature map of the label rendering map;
and constructing the loss function based on the first feature map of the rendering map and the second feature map of the label rendering map.
4. The method of claim 3, wherein the adjusting parameters of the first image processing model and the second image processing model, respectively, based on the loss function comprises:
determining, based on the loss function, a loss value between the first feature map and the second feature map corresponding to the sample face map;
and adjusting parameters of each network layer of the first image processing model and parameters of each network layer of the second image processing model based on the loss value until the loss value falls within a first preset range.
5. The method of claim 3, wherein the adjusting parameters of the first image processing model and the second image processing model, respectively, based on the loss function comprises:
determining, based on the loss function, loss values between the first feature map and the second feature map respectively corresponding to a plurality of sample face maps;
determining a total loss value of the plurality of sample face maps based on the loss values respectively corresponding to the plurality of sample face maps;
and adjusting parameters of each network layer of the first image processing model and parameters of each network layer of the second image processing model based on the total loss value until the total loss value falls within a second preset range.
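The loss of claims 3-5 is built from feature maps of the rendering map and the label rendering map, with claim 5 aggregating per-sample losses into a total. Below is a sketch under the assumption that a frozen feature extractor and an L1 distance are used; the claims fix neither choice.

```python
# Sketch of a feature-map loss (claims 3-4) and a batch total loss (claim 5).
# The feature extractor, the L1 distance, and the summation are assumptions.
import torch
import torch.nn.functional as F

def feature_loss(feature_extractor, rendering, label_rendering):
    first_feature_map = feature_extractor(rendering)          # first feature map
    second_feature_map = feature_extractor(label_rendering)   # second feature map
    return F.l1_loss(first_feature_map, second_feature_map)

def total_loss(feature_extractor, renderings, label_renderings):
    # per-sample loss values, then their sum as the total loss value (claim 5)
    losses = [feature_loss(feature_extractor, r.unsqueeze(0), l.unsqueeze(0))
              for r, l in zip(renderings, label_renderings)]
    return torch.stack(losses).sum()
```

Training would then repeat the parameter update until the loss value (or total loss value) falls within the preset range.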
6. The method of claim 1, further comprising:
acquiring a shape coefficient of the sample face map output by the first image processing model;
generating a shape model of the sample face map based on the shape coefficient and a shape base.
7. The method of claim 6, wherein the generating a rendering map of the sample face map based on the final texture map comprises:
inputting the final texture map and the shape model into a renderer to obtain a rendering map of the sample face map output by the renderer.
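Claims 6-7 add a 3DMM-style shape branch: a shape coefficient weights a shape base to form the shape model, and the renderer consumes the shape model together with the final texture map. The tensor layouts and the renderer interface below are assumptions made for the sketch:

```python
# Sketch of shape-model construction (claim 6) and rendering (claim 7).
# mean_shape/shape_basis layouts and the renderer's keyword interface are assumptions.
import torch

def build_shape_model(shape_coeff, mean_shape, shape_basis):
    """shape_coeff: (B, K); mean_shape: (V, 3) vertices; shape_basis: (K, V, 3)."""
    # shape model = mean shape + coefficient-weighted shape base
    return mean_shape + torch.einsum('bk,kvc->bvc', shape_coeff, shape_basis)

def render_sample_face(renderer, final_texture_map, shape_model):
    # e.g. a differentiable rasterizer that textures the mesh and outputs the rendering map
    return renderer(vertices=shape_model, texture=final_texture_map)
```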
8. An image processing method comprising:
inputting a face map to be processed into a first image processing model, and acquiring texture coefficients of the face map to be processed output by the first image processing model;
generating an initial texture map of the face map to be processed based on the texture coefficients and a texture base;
inputting a target region map extracted from the face map to be processed into a second image processing model, and acquiring a target object offset feature map of the face map to be processed output by the second image processing model;
obtaining a final texture map based on the initial texture map and the target object offset feature map;
wherein the first image processing model and the second image processing model are obtained by using the training method of an image processing model according to any one of claims 1 to 7.
9. The method according to claim 8, wherein the inputting a target region map extracted from the face map to be processed into a second image processing model, and acquiring a target object offset feature map of the face map to be processed output by the second image processing model comprises:
inputting an eyebrow region map extracted from the face map to be processed into the second image processing model, and acquiring an eyebrow offset feature map of the face map to be processed output by the second image processing model.
10. The method of claim 8, further comprising:
acquiring a shape coefficient of the face map to be processed output by the first image processing model;
generating a shape model of the face map to be processed based on the shape coefficient and a shape base;
and obtaining a rendering map of the face map to be processed based on the final texture map and the shape model.
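Putting claims 8-10 together, inference runs the two trained models once, with no loss computation or parameter update. The sketch below assumes the first model outputs both texture coefficients and a shape coefficient, and reuses the hypothetical names from the training sketch above:

```python
# Illustrative inference pass per claims 8-10 (all names are assumptions).
import torch

@torch.no_grad()
def process_face(coeff_net, offset_net, renderer,
                 texture_mean, texture_basis, mean_shape, shape_basis,
                 face_to_process, eyebrow_region):
    tex_coeffs, shape_coeff = coeff_net(face_to_process)       # first model: texture + shape coefficients
    init_tex = texture_mean + torch.einsum('bk,kchw->bchw', tex_coeffs, texture_basis)
    offset_map = offset_net(eyebrow_region)                    # second model: eyebrow offset feature map
    final_tex = init_tex + offset_map                          # final texture map
    shape_model = mean_shape + torch.einsum('bk,kvc->bvc', shape_coeff, shape_basis)
    return renderer(vertices=shape_model, texture=final_tex)   # rendering map of the face map to be processed
```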
11. A training apparatus for an image processing model, comprising:
a first acquisition module for inputting a sample face map into a first image processing model and acquiring texture coefficients of the sample face map output by the first image processing model;
a first generation module for generating an initial texture map of the sample face map based on the texture coefficients and a texture base;
a second obtaining module, configured to input a target region map extracted from the sample face map into a second image processing model, and obtain a target object offset feature map of the sample face map output by the second image processing model;
a second generation module, configured to obtain a final texture map based on the initial texture map and the target object offset feature map;
a third generation module for generating a rendering map of the sample face map based on the final texture map;
a construction module for constructing a loss function based on the rendering map and a label rendering map of the sample face map;
a training module for adjusting parameters of the first image processing model and the second image processing model based on the loss function, respectively.
12. The apparatus of claim 11, wherein the second obtaining module is configured to:
input an eyebrow region map extracted from the sample face map into the second image processing model, and acquire an eyebrow offset feature map of the sample face map output by the second image processing model.
13. The apparatus of claim 11, wherein the construction module is configured to:
determine a first feature map of the rendering map and a second feature map of the label rendering map;
and construct the loss function based on the first feature map of the rendering map and the second feature map of the label rendering map.
14. The apparatus of claim 13, wherein the training module is configured to:
determine, based on the loss function, a loss value between the first feature map and the second feature map corresponding to the sample face map;
and adjust parameters of each network layer of the first image processing model and parameters of each network layer of the second image processing model based on the loss value until the loss value falls within a first preset range.
15. The apparatus of claim 13, wherein the training module is configured to:
determine, based on the loss function, loss values between the first feature map and the second feature map respectively corresponding to a plurality of sample face maps;
determine a total loss value of the plurality of sample face maps based on the loss values respectively corresponding to the plurality of sample face maps;
and adjust parameters of each network layer of the first image processing model and parameters of each network layer of the second image processing model based on the total loss value until the total loss value falls within a second preset range.
16. The apparatus of claim 11, further comprising:
a third obtaining module, configured to obtain a shape coefficient of the sample face map output by the first image processing model;
a fourth generation module, configured to generate a shape model of the sample face map based on the shape coefficient and a shape base.
17. The apparatus of claim 16, wherein the third generation module is configured to:
input the final texture map and the shape model into a renderer to obtain a rendering map of the sample face map output by the renderer.
18. An image processing apparatus comprising:
a fourth acquisition module for inputting a face map to be processed into a first image processing model and acquiring texture coefficients of the face map to be processed output by the first image processing model;
a fifth generating module, configured to generate an initial texture map of the face map to be processed based on the texture coefficients and a texture base;
a fifth obtaining module, configured to input a target region map extracted from the face map to be processed into a second image processing model, and obtain a target object offset feature map of the face map to be processed output by the second image processing model;
a sixth generating module, configured to obtain a final texture map based on the initial texture map and the target object offset feature map;
wherein the first image processing model and the second image processing model are obtained by using the training method of an image processing model according to any one of claims 1 to 7.
19. The apparatus of claim 18, wherein the fifth obtaining module is configured to:
input an eyebrow region map extracted from the face map to be processed into the second image processing model, and acquire an eyebrow offset feature map of the face map to be processed output by the second image processing model.
20. The apparatus of claim 18, further comprising:
a sixth obtaining module, configured to obtain a shape coefficient of the face map to be processed output by the first image processing model;
a seventh generating module, configured to generate a shape model of the face map to be processed based on the shape coefficient and a shape base;
and an eighth generation module for obtaining a rendering map of the face map to be processed based on the final texture map and the shape model.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.
22. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-10.
23. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-10.
CN202210304129.6A 2022-03-25 2022-03-25 Training method of image processing model, image processing method, device and medium Pending CN114549728A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210304129.6A CN114549728A (en) 2022-03-25 2022-03-25 Training method of image processing model, image processing method, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210304129.6A CN114549728A (en) 2022-03-25 2022-03-25 Training method of image processing model, image processing method, device and medium

Publications (1)

Publication Number Publication Date
CN114549728A true CN114549728A (en) 2022-05-27

Family

ID=81665853

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210304129.6A Pending CN114549728A (en) 2022-03-25 2022-03-25 Training method of image processing model, image processing method, device and medium

Country Status (1)

Country Link
CN (1) CN114549728A (en)


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190122404A1 (en) * 2016-03-02 2019-04-25 Holition Limited Locating and augmenting object features in images
CN109377544A (en) * 2018-11-30 2019-02-22 腾讯科技(深圳)有限公司 A kind of face three-dimensional image generating method, device and readable medium
US20200184678A1 (en) * 2018-12-11 2020-06-11 Seiko Epson Corporation Methods and systems for training an object detection algorithm using synthetic images
CN110288614A (en) * 2019-06-24 2019-09-27 睿魔智能科技(杭州)有限公司 Image processing method, device, equipment and storage medium
CN112258619A (en) * 2020-12-22 2021-01-22 北京沃东天骏信息技术有限公司 Image processing method and device
CN112562069A (en) * 2020-12-24 2021-03-26 北京百度网讯科技有限公司 Three-dimensional model construction method, device, equipment and storage medium
CN113159300A (en) * 2021-05-15 2021-07-23 南京逸智网络空间技术创新研究院有限公司 Image detection neural network model, training method thereof and image detection method
CN113269862A (en) * 2021-05-31 2021-08-17 中国科学院自动化研究所 Scene-adaptive fine three-dimensional face reconstruction method, system and electronic equipment
CN113591969A (en) * 2021-07-28 2021-11-02 北京百度网讯科技有限公司 Face similarity evaluation method, device, equipment and storage medium
CN113963110A (en) * 2021-10-11 2022-01-21 北京百度网讯科技有限公司 Texture map generation method and device, electronic equipment and storage medium
CN114092616A (en) * 2021-10-25 2022-02-25 北京百度网讯科技有限公司 Rendering method, rendering device, electronic equipment and storage medium
CN114092673A (en) * 2021-11-23 2022-02-25 北京百度网讯科技有限公司 Image processing method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郝韵 (HAO Yun): "基于生成对抗网络的视频监控图像增强方法研究" [Research on video surveillance image enhancement methods based on generative adversarial networks], 公安海警学院学报, no. 06, 31 December 2019 (2019-12-31) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115147526A (en) * 2022-06-30 2022-10-04 北京百度网讯科技有限公司 Method and device for training clothing generation model and method and device for generating clothing image
CN115147526B (en) * 2022-06-30 2023-09-26 北京百度网讯科技有限公司 Training of clothing generation model and method and device for generating clothing image
CN115331077A (en) * 2022-08-22 2022-11-11 北京百度网讯科技有限公司 Training method of feature extraction model, target classification method, device and equipment
CN115331077B (en) * 2022-08-22 2024-04-26 北京百度网讯科技有限公司 Training method of feature extraction model, target classification method, device and equipment

Similar Documents

Publication Publication Date Title
WO2020199693A1 (en) Large-pose face recognition method and apparatus, and device
CN113643412B (en) Virtual image generation method and device, electronic equipment and storage medium
CN114842123B (en) Three-dimensional face reconstruction model training and three-dimensional face image generation method and device
CN112785674A (en) Texture map generation method, rendering method, device, equipment and storage medium
CN113963110B (en) Texture map generation method and device, electronic equipment and storage medium
CN114549728A (en) Training method of image processing model, image processing method, device and medium
CN113420719A (en) Method and device for generating motion capture data, electronic equipment and storage medium
CN114549710A (en) Virtual image generation method and device, electronic equipment and storage medium
CN113407850B (en) Method and device for determining and acquiring virtual image and electronic equipment
CN115147265B (en) Avatar generation method, apparatus, electronic device, and storage medium
CN115049799A (en) Method and device for generating 3D model and virtual image
CN113870399B (en) Expression driving method and device, electronic equipment and storage medium
CN113393371A (en) Image processing method and device and electronic equipment
CN112580666A (en) Image feature extraction method, training method, device, electronic equipment and medium
CN114792355A (en) Virtual image generation method and device, electronic equipment and storage medium
CN116432012A (en) Method, electronic device and computer program product for training a model
CN113052962B (en) Model training method, information output method, device, equipment and storage medium
CN116524162A (en) Three-dimensional virtual image migration method, model updating method and related equipment
CN113781653B (en) Object model generation method and device, electronic equipment and storage medium
CN114092616B (en) Rendering method, rendering device, electronic equipment and storage medium
CN115147306A (en) Image processing method, image processing device, electronic equipment and storage medium
CN114581586A (en) Method and device for generating model substrate, electronic equipment and storage medium
CN114549785A (en) Method and device for generating model substrate, electronic equipment and storage medium
CN115641481A (en) Method and device for training image processing model and image processing
CN113344213A (en) Knowledge distillation method, knowledge distillation device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination