CN115909430A - Image processing method, device, storage medium and computer program product - Google Patents

Image processing method, device, storage medium and computer program product

Info

Publication number
CN115909430A
CN115909430A · Application CN202111152280.4A
Authority
CN
China
Prior art keywords
face
modulation
image
parameter
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111152280.4A
Other languages
Chinese (zh)
Inventor
王锐
陈健
程培
俞刚
高常鑫
桑农
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202111152280.4A priority Critical patent/CN115909430A/en
Publication of CN115909430A publication Critical patent/CN115909430A/en
Pending legal-status Critical Current


Landscapes

  • Image Processing (AREA)

Abstract

The embodiments of the present application provide an image processing method, an image processing apparatus, a storage medium, and a computer program product. The method includes: obtaining a control parameter corresponding to a target face attribute, where the control parameter includes a parameter adjustment amount used to adjust, among the multiple dimensions included in a modulation parameter in an image processing network, the dimensions corresponding to the target face attribute; determining a second modulation parameter according to the control parameter and a first modulation parameter corresponding to a first face image; and calling the image processing network to generate a second face image using the second modulation parameter, the second face image being the first face image with the target face attribute adjusted. This improves the accuracy and efficiency of local attribute editing of face images and improves the editing effect.

Description

Image processing method, device, storage medium and computer program product
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method, an image processing apparatus, a storage medium, and a computer program product.
Background
With the progress of science and technology such as the Internet and computers, the need to edit pictures is growing, including editing local attributes of face images. Editing a local attribute of a face picture means modifying the attribute of a target local region while keeping the identity and the other attributes of the generated picture unchanged. Current methods that edit face attributes with a generative adversarial network trained from scratch produce pictures of low resolution, limited by the capacity of the generator and the difficulty of training. In addition, because local attributes are so varied, it is very difficult to collect a large amount of labeled training data for each attribute. Existing methods based on a pre-trained generative network directly modify the latent variable of the whole picture, so that unrelated regions of the face change considerably when a local attribute is modified; the accuracy of local attribute editing is therefore low and the effect is poor. How to edit the local attributes of a face image efficiently and accurately has thus become an urgent problem.
Disclosure of Invention
The embodiments of the present application provide an image processing method, an image processing apparatus, a storage medium, and a computer program product, which can improve the accuracy and efficiency of local attribute editing of face images and improve the editing effect.
In a first aspect, an embodiment of the present application provides an image processing method, where the method includes:
obtaining a control parameter corresponding to a target face attribute, where the control parameter includes a parameter adjustment amount, and the parameter adjustment amount is used to adjust, among the multiple dimensions included in a modulation parameter in an image processing network, the dimensions corresponding to the target face attribute;
And determining a second modulation parameter according to the first modulation parameter corresponding to the first face image and the control parameter.
And calling the image processing network to generate a second face image by using the second modulation parameter, wherein the second face image is an image obtained by adjusting the target face attribute of the first face image.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including:
an acquisition module, configured to obtain a control parameter corresponding to a target face attribute, where the control parameter includes a parameter adjustment amount used to adjust, among the multiple dimensions included in a modulation parameter in an image processing network, the dimensions corresponding to the target face attribute;
And the determining module is used for determining a second modulation parameter according to the first modulation parameter corresponding to the first face image and the control parameter.
And the generating module is used for calling the image processing network to generate a second face image by using the second modulation parameter, wherein the second face image is an image obtained by adjusting the target face attribute of the first face image.
In a third aspect, an embodiment of the present application provides a computer device, where the computer device includes a processor, a network interface, and a storage device, where the processor, the network interface, and the storage device are connected to each other, where the network interface is controlled by the processor to transmit and receive data, and the storage device is used to store a computer program, and the computer program includes program instructions, and the processor is configured to call the program instructions to execute the image processing method according to the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, in which a computer program is stored, where the computer program includes program instructions, and the program instructions are executed by a processor to execute the image processing method according to the first aspect.
In a fifth aspect, the present application discloses a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the image processing method according to the first aspect.
According to the embodiments of the present application, a control parameter corresponding to a target face attribute can be obtained, the control parameter including a parameter adjustment amount used to adjust, among the multiple dimensions included in a modulation parameter in an image processing network, the dimensions corresponding to the target face attribute; a second modulation parameter is determined according to the control parameter and a first modulation parameter corresponding to a first face image; and the image processing network is then called to generate a second face image using the second modulation parameter. The second face image is the first face image with the target face attribute adjusted while the other face attributes remain unchanged, which improves the accuracy and efficiency of local attribute editing of face images and improves the editing effect.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating an operation of an image processing network according to an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of another image processing method provided in an embodiment of the present application;
fig. 4 is a schematic diagram illustrating an effect of editing local attributes of a face image according to an embodiment of the present application;
FIG. 5 is a schematic flowchart of another image processing method provided in the embodiments of the present application;
fig. 6a is a schematic diagram of an application effect of editing local attributes of a face image according to an embodiment of the present application;
fig. 6b is a schematic diagram illustrating comparison of application effects of editing local attributes of a face image according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
An Artificial Intelligence (AI) technology is a comprehensive subject, and relates to a wide range of fields, namely a hardware technology and a software technology. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Computer Vision (CV) is the science of how to make machines "see": using cameras and computers in place of human eyes to identify, track, and measure targets, and further performing graphics processing so that the computer produces an image better suited to human observation or to transmission to an instrument for detection. As a scientific discipline, computer vision studies theories and techniques that attempt to build artificial intelligence systems capable of extracting information from images or multidimensional data. Computer vision technology generally includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, automatic driving, and intelligent transportation, and also includes common biometric technologies such as face recognition and fingerprint recognition.
With the research and progress of artificial intelligence technology, AI has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, autonomous driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, the Internet of Vehicles, and intelligent transportation.
The computer device described in the embodiments of the present application may be a server or a server cluster, or may be a terminal device. The server may specifically be a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, web services, cloud communication, middleware services, domain name services, security services, CDNs, and big data and artificial intelligence platforms. The terminal device may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted smart terminal, and the like.
The scheme provided by the embodiment of the application relates to the computer vision technology of artificial intelligence and the like, and is specifically explained by the following embodiment:
fig. 1 is a schematic flow chart of an image processing method according to an embodiment of the present application. The image processing method comprises the following steps:
101. the method comprises the steps of obtaining control parameters corresponding to target face attributes, wherein the control parameters comprise parameter adjustment quantities, and the parameter adjustment quantities are used for adjusting dimensions corresponding to the target face attributes in a plurality of dimensions included by modulation parameters in an image processing network.
The target face attribute may be any face attribute of a face region in the face image. For example, the face image may be divided into multiple face regions, such as the eyes, the mouth, and the hair, and a face attribute is a characteristic of such a region. For the "eyes" region, the corresponding face attributes may include whether the eyes are open or closed, the size of the eyes, the color of the eyeballs, and so on. For the "mouth" region, the corresponding face attributes may include whether the mouth is open or closed, the thickness of the lips, the presence or absence of a beard, and the mouth expression. For the "hair" region, the corresponding face attributes may include the color of the hair, its straightness, its density, the presence or absence of bangs, and so on. For example, the target face attribute may be the open/closed attribute of the "eyes" region; by modifying this attribute, the eyes in the generated face image can be opened or closed.
The image processing network provided by the present application may specifically be a pre-trained generative adversarial network; for example, the StyleGAN2 generative adversarial network may be used to generate high-definition images. The modulation parameter controls the image processing network to generate the corresponding image and may specifically be a multi-dimensional feature vector. Each face attribute is associated with certain dimensions of the modulation parameter, so the embodiments of the present application can control the modification of a face attribute through the few related dimensions of the modulation parameter corresponding to the image. For example, for the open/closed attribute of the eyes, the modification can be implemented by adjusting the values of the few related dimensions of the modulation parameter, that is, editing the open/closed attribute of the eyes in the face image from its current state (closed or open) to the opposite state.
Specifically, a corresponding control parameter can be set for each face attribute, establishing a correspondence between face attributes and control parameters; a face attribute can then be modified with its control parameter, thereby implementing local editing of the face region in the face image. For the target face attribute, the computer device may obtain the corresponding control parameter according to this correspondence, where the control parameter includes a parameter adjustment amount used to adjust, among the multiple dimensions included in a modulation parameter in the image processing network, the dimensions corresponding to the target face attribute.
102. And determining a second modulation parameter according to a first modulation parameter corresponding to the first face image and the control parameter.
Specifically, the first modulation parameter is the modulation parameter used to control the image processing network to generate the first face image. When a target face attribute of the generated first face image needs to be modified, the parameter adjustment amount included in the control parameter may be added to or subtracted from the first modulation parameter to obtain the second modulation parameter; adding or subtracting the adjustment moves the attribute in opposite directions. For example, if the target face attribute is the open/closed attribute of the eye region, adding the parameter adjustment amount may edit the eyes to be open while subtracting it may edit them to be closed, or vice versa, depending on the direction of the adjustment.
In some possible embodiments, the parameter adjustment amount has the same number of dimensions as the first modulation parameter, and its dimensions other than those corresponding to the target face attribute are all 0.
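The adjustment described in steps 101 and 102 can be sketched numerically. The following is a minimal illustration only, not part of the claimed method: it assumes a toy 8-dimensional modulation parameter in which dimensions 2 and 3 are the (hypothetical) dimensions tied to the target face attribute.

```python
import numpy as np

# First modulation parameter for the first face image (toy values).
first_modulation = np.full(8, 0.1)

# Parameter adjustment amount: same number of dimensions as the
# modulation parameter, zero everywhere except the (hypothetical)
# dimensions tied to the target face attribute.
adjustment = np.zeros(8)
adjustment[[2, 3]] = [0.5, -0.3]

# Second modulation parameter: add (or subtract) the adjustment.
second_modulation = first_modulation + adjustment
```

Only dimensions 2 and 3 differ between the two vectors, which is why a second face image generated from the second modulation parameter would change only in the target attribute.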
103. And calling the image processing network to generate a second face image by using the second modulation parameter, wherein the second face image is an image obtained by adjusting the target face attribute of the first face image.
The image processing network may include multiple generation layers, and control information is conveyed through the modulation parameter: each generation layer corresponds to one modulation component of the modulation parameter, and each modulation component covers a certain number of the modulation parameter's dimensions. For example, if the modulation parameter has 9000 dimensions and the image processing network includes 3 generation layers, the modulation component of the 1st generation layer may cover 4000 of the 9000 dimensions, that of the 2nd generation layer 3000 dimensions, and that of the 3rd generation layer 2000 dimensions. As shown in fig. 2, the principle of the image processing network may include the following: a noise vector Z sampled from Gaussian noise is first converted into the modulation parameter W, which the affine transformation layer of each generation layer then converts into that layer's modulation component S_0, ..., S_{l-1}, S_l, .... Each generation layer processes its modulation component together with a feature map; the feature maps in fig. 2 include the constant tensor feature map Const, ..., F_{l-1}, F_l, F_{l+1}, .... The 1st generation layer corresponds to the constant tensor feature map Const and the modulation component S_0, where Const is a randomly initialized feature map whose dimensions match those of S_0; the (l-1)-th layer corresponds to the feature map F_{l-1} and the modulation component S_{l-1}; the l-th layer corresponds to the feature map F_l and the modulation component S_l; and so on. Each generation layer performs dot-product processing on the feature map produced by the previous layer and its own modulation component to obtain the feature map passed to the next layer, and the feature map produced by the last generation layer is the image generated by the image processing network. For example, the l-th generation layer takes the feature map F_l produced by the (l-1)-th layer, dot-multiplies it with the modulation component S_l to obtain the feature map F_{l+1}, and inputs it to the (l+1)-th layer. The dimensions of each generation layer's modulation component are the same as those of its corresponding feature map.
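As a rough illustration of this layer-wise modulation, the sketch below replaces the real generation layers with element-wise multiplications and the learned affine transformations with fixed random matrices; all sizes, seeds, and values are illustrative assumptions, not figures from the patent.

```python
import numpy as np

def affine(w, dim, seed):
    # Stand-in for a generation layer's affine transformation, which
    # maps the modulation parameter W to that layer's component S.
    A = np.random.default_rng(seed).normal(size=(dim, w.size))
    return A @ w

def generate(w, num_layers=3, dim=4):
    feat = np.ones(dim)               # stand-in for the Const feature map
    for i in range(num_layers):
        s = affine(w, dim, seed=i)    # modulation component S_i for layer i
        feat = feat * s               # element-wise modulation of the feature map
    return feat                       # last layer's feature map = generated "image"

w1 = np.zeros(5)
w1[2] = 1.0                           # perturb one hypothetical attribute dimension
out = generate(w1)
```

Because each layer multiplies the running feature map by its own modulation component, changing a few dimensions of W propagates through every layer's affine transformation, which is the mechanism the control parameters exploit.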
Specifically, the computer device may call an affine transformation layer of the image processing network to determine a modulation component corresponding to each generation layer of the image processing network from the second modulation parameter, and may generate the second face image by using the modulation component corresponding to each generation layer and the feature map.
It can be understood that, compared with the first modulation parameter, only the dimensions of the second modulation parameter corresponding to the target face attribute are changed. The second face image generated from the second modulation parameter therefore modifies only the target face attribute, while the other face attributes of the corresponding face region and the face attributes of all other face regions remain unchanged. For example, if the target face attribute is the opening and closing of the eyes, only the open/closed attribute of the eyes in the second face image is modified (e.g., the eyes change from closed to open); the other face attributes of the eye region (such as size and eye color) remain unchanged, as do the face attributes of the other face regions (such as the mouth and the hair).
In some possible embodiments, the target face attribute may include multiple face attributes; that is, multiple face attributes can be modified in one batch while the other face attributes remain unchanged. Specifically, the computer device may obtain the parameter adjustment amount corresponding to each face attribute included in the target face attribute, compute a total parameter adjustment amount from these per-attribute amounts, add the total to (or subtract it from) the first modulation parameter corresponding to the first face image to obtain a third modulation parameter, and then call the image processing network to generate the second face image using the third modulation parameter. For example, if the target face attribute includes the opening and closing of the eyes and the color of the hair, the computer device may obtain the parameter adjustment amount for the eye open/close attribute and the parameter adjustment amount for the hair color, perform vector addition on the two to obtain the total parameter adjustment amount for modifying both attributes, add the total to (or subtract it from) the first modulation parameter to obtain the third modulation parameter, and call the image processing network to generate the second face image using the third modulation parameter. The second face image is then the first face image with both the eye open/close attribute and the hair color adjusted. Batch editing of multiple local attributes of a face image is thereby achieved, improving editing efficiency.
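The batch edit just described amounts to a vector sum of per-attribute adjustment amounts. A minimal sketch, with hypothetical dimension assignments for the two attributes:

```python
import numpy as np

eye_delta = np.array([0.0, 0.4, 0.0, 0.0])    # hypothetical: eye open/close dims
hair_delta = np.array([0.0, 0.0, -0.2, 0.0])  # hypothetical: hair-color dims

# Total parameter adjustment amount: vector addition of the per-attribute amounts.
total_adjustment = eye_delta + hair_delta

first_modulation = np.ones(4)
# Third modulation parameter: both attributes edited in a single pass.
third_modulation = first_modulation + total_adjustment
```

Because the per-attribute adjustments are non-zero on disjoint dimensions, their sum edits both attributes without interfering with each other or with the remaining dimensions.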
In some possible implementations, the generated second face image may be used as a sample image for another image processing task; for example, the second face image and the first face image may be used as a sample pair to train another image processing network to implement a corresponding image processing function. That is, the embodiments of the present application can generate a large number of high-quality sample images whose local attributes are accurately edited, for example, high-definition face pictures at the 1024 x 1024 pixel level. The generated sample images can serve other image processing tasks, such as the various face attribute editing functions on the product side: expression editing, hairstyle editing, and various fun interactive features.
In the embodiments of the present application, the computer device can obtain the control parameter corresponding to a target face attribute, the control parameter including a parameter adjustment amount used to adjust, among the multiple dimensions included in a modulation parameter in an image processing network, the dimensions corresponding to the target face attribute; determine a second modulation parameter according to the control parameter and the first modulation parameter corresponding to a first face image; and call the image processing network to generate a second face image using the second modulation parameter. The second face image is the first face image with the target face attribute adjusted and the other face attributes unchanged, which improves the accuracy of local attribute editing of face images and improves the editing effect.
Fig. 3 is a schematic flow chart of another image processing method according to an embodiment of the present application. The image processing method comprises the following steps:
301. the method comprises the steps of obtaining a positive sample image and a negative sample image related to a target face attribute, wherein the target face attribute is any one of a plurality of face attributes of a target face area, and the target face area is any one face area.
The target face area may be any one of a plurality of divided face areas, for example, the target face area may be eyes, mouth, hair, or the like, the target face attribute is any one of a plurality of face attributes of the target face area, each face area generally has a plurality of face attributes, and certainly may only have one face attribute, which is not limited in the embodiment of the present application.
Specifically, the computer device may obtain, from the sample image set, a positive sample image and a negative sample image related to the attribute of the target face, where the positive sample image refers to a sample image corresponding to one state of the attribute of the target face, and the negative sample image refers to a sample image corresponding to another state of the attribute of the target face, for example, the positive sample image may be an image with eyes open, and the negative sample image may be an image with eyes closed.
302. And acquiring the correlation between the target face area and each dimension included by the modulation parameters in the image processing network.
Specifically, since the modulation parameter is used to control the generation of the image, different dimensions of the modulation parameter correspond to the generation of different face attributes of the image, and in order to determine the correlation between the face region and the dimensions of the modulation parameter, the correlation between the target face region and each dimension included in the modulation parameter in the image processing network may be calculated.
In some possible implementations, the computer device may obtain a sample image set, the sample image set including a plurality of sample images, obtain a semantic segmentation map for each face region of each sample image, and determine a correlation between each face region and each dimension included in the modulation parameters in the image processing network using the semantic segmentation map for each face region of each sample image.
In some possible embodiments, the computer device may use the semantic segmentation map of each face region of each sample image as an image gradient and back-propagate it to the modulation parameter in the image processing network, obtaining the cumulative gradient of each face region on each dimension of the modulation parameter, and then determine the correlation between each face region and each dimension of the modulation parameter from the absolute value of the cumulative gradient.
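This gradient-accumulation step can be illustrated with a toy linear stand-in for the generator (img = A @ w): back-propagating a region's mask as the upstream image gradient yields A.T @ mask as the gradient on the modulation parameter. Everything here, the sizes, the linear model, and the mask, is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 6))   # toy linear "generator": 6-dim w -> 4-pixel image

def cumulative_gradient(mask):
    # Back-propagating `mask` as the image gradient through img = A @ w
    # gives d(mask . img)/dw = A.T @ mask, independent of w.
    return A.T @ mask

mouth_mask = np.array([0.0, 1.0, 1.0, 0.0])   # hypothetical region mask
abs_grad = np.abs(cumulative_gradient(mouth_mask))
```

Dimensions of w with large `abs_grad` are the ones whose changes most affect the masked region, which is exactly the correlation signal the method accumulates over sample images.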
In some possible embodiments, the face may be divided into 15 face regions. Let |g_k^i| denote the absolute value of the cumulative gradient that face region i produces on the k-th dimension of the modulation parameter. The correlation R_k^i between the k-th dimension and face region i may then be determined as the ratio of the absolute cumulative gradient of face region i on the k-th dimension to the sum of the absolute cumulative gradients of all face regions on the k-th dimension:

R_k^i = |g_k^i| / Σ_j |g_k^j|

where the sum runs over all 15 face regions j.
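The correlation described in the preceding paragraph is a per-dimension normalization of the absolute cumulative gradients over the face regions. A minimal sketch with 2 regions and 2 dimensions (toy numbers, not values from the patent):

```python
import numpy as np

def region_dim_correlation(abs_grad):
    # abs_grad[i, k] = |cumulative gradient| of face region i on dimension k.
    # Returns R[i, k] = abs_grad[i, k] / sum over all regions j of abs_grad[j, k].
    return abs_grad / abs_grad.sum(axis=0, keepdims=True)

g = np.array([[3.0, 1.0],
              [1.0, 3.0]])
R = region_dim_correlation(g)
```

Each column of R sums to 1, so the correlations distribute each dimension's influence across the face regions.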
In some possible embodiments, the computer device may call a face parsing network to divide each of the multiple sample images into multiple face regions, and call the face parsing network to process each face region to obtain its semantic segmentation map. The semantic segmentation map contains the position information of the corresponding face region in the sample image; it is essentially a mask map of the region, annotated with the region's position. For example, the semantic segmentation map of the mouth is annotated with the position coordinates of the mouth in the face image.
303. And determining a control parameter corresponding to the target face attribute according to the modulation parameter corresponding to the positive sample image, the modulation parameter corresponding to the negative sample image and the correlation.
Specifically, the computer device may determine the first adjustment amount according to the modulation parameter corresponding to the positive sample image and the modulation parameter corresponding to the negative sample image, where the first adjustment amount is a coarse direction vector. If this coarse direction vector were used directly, it could modify other face regions, so it cannot be applied as-is. The computer device may therefore update the first adjustment amount according to the degree of correlation to obtain a second adjustment amount, thereby keeping only the modifications to the dimensions related to the target face attribute, and take the second adjustment amount as the parameter adjustment amount included in the control parameter corresponding to the target face attribute.
In some possible embodiments, the computer device may obtain a first average parameter of the modulation parameters corresponding to the positive sample images, and determine a first class-center parameter S1 of the modulation parameters corresponding to the positive sample images according to the first average parameter; obtain a second average parameter of the modulation parameters corresponding to the negative sample images, and determine a second class-center parameter S2 of the modulation parameters corresponding to the negative sample images according to the second average parameter; and determine the first adjustment amount according to the first class-center parameter S1 and the second class-center parameter S2, where the first adjustment amount may be the feature vector obtained by S1 - S2.
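The class-center difference S1 - S2 can be sketched directly; a minimal NumPy illustration assuming each row of the input arrays is the modulation parameter vector of one positive or negative sample image (the function name and toy values are illustrative only):

```python
import numpy as np

def coarse_direction(pos_params, neg_params):
    """First adjustment amount (coarse direction vector) as the difference
    S1 - S2 between the class-center modulation parameters of the positive
    and negative sample images."""
    s1 = pos_params.mean(axis=0)  # first class-center parameter S1 (mean over positives)
    s2 = neg_params.mean(axis=0)  # second class-center parameter S2 (mean over negatives)
    return s1 - s2

# Toy example: 2 positive and 2 negative samples in a 3-dimensional modulation space.
pos = np.array([[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]])
neg = np.array([[0.0, 1.0, 0.0], [0.0, 3.0, 0.0]])
d1 = coarse_direction(pos, neg)
```

Dimensions on which the two classes do not differ on average (here the second and third) contribute nothing to the coarse direction, which is why only class-separating dimensions survive.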
In some possible embodiments, the specific manner in which the computer device updates the first adjustment amount according to the degree of correlation may include:
the computer device may compare the correlation between the target face region and each dimension included in the modulation parameters in the image processing network with a correlation threshold (for example, 0.2), determine, from among the multiple dimensions included in the modulation parameters, a first target dimension whose corresponding correlation is less than or equal to the correlation threshold, and set the value of the first target dimension in the first adjustment amount to zero to obtain the second adjustment amount. In this way, only changes in the dimensions of the modulation parameters whose correlation is sufficiently large are retained, which not only ensures a good editing effect but also avoids modifying other non-relevant face attributes and face regions.
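This thresholded clipping of the coarse direction vector reduces to a single masked assignment; a minimal NumPy sketch with illustrative values (the 0.2 threshold is the one mentioned in the text):

```python
import numpy as np

def fine_direction(coarse, correlation, threshold=0.2):
    """Second adjustment amount: zero every dimension of the coarse direction
    vector whose correlation with the target face region is less than or
    equal to the threshold, keeping only attribute-relevant dimensions."""
    return np.where(correlation > threshold, coarse, 0.0)

d1 = np.array([2.0, -1.5, 0.7, 3.0])       # coarse direction vector
corr = np.array([0.5, 0.1, 0.2, 0.9])      # correlation of each dimension with the target region
d2 = fine_direction(d1, corr)              # dims with corr <= 0.2 are zeroed
```

Note that a correlation exactly equal to the threshold is zeroed, matching the "less than or equal to" condition in the text.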
In some possible embodiments, the modulation parameters of the image processing network have a hierarchical decoupling property, that is, a given face attribute is basically controlled only by the modulation component corresponding to a certain generation layer. The specific manner in which the computer device updates the first adjustment amount according to the correlation may include:
the computer device can acquire a first generation layer corresponding to the target human face attribute in a plurality of generation layers included in the image processing network; determining a second target dimension included by a modulation component corresponding to the first generation layer in the modulation parameters of the image processing network, and determining a third target dimension of which the corresponding correlation degree is less than or equal to a correlation degree threshold value from other dimensions except the second target dimension included by the modulation parameters; and setting the value of the third target dimension in the first adjustment quantity to zero to obtain a second adjustment quantity, namely, keeping all modifications of the relevant dimension of the first generation layer corresponding to the target face attribute in the coarse direction vector, and only keeping the change of the dimension with the enough correlation degree for the dimension corresponding to the other generation layers.
In some possible embodiments, the computer device may determine a correspondence between each face attribute and each generation layer according to a correlation between each face region and each dimension in the modulation parameter, for example, may determine all fourth target dimensions of each face region and the modulation parameter, where the correlation is greater than or equal to the correlation threshold, obtain a modulation component corresponding to each fourth target dimension, and use a generation layer corresponding to a modulation component with the largest number of fourth target dimensions as a generation layer corresponding to each face attribute included in the face region, so as to accurately establish a correspondence between a face attribute and a generation layer.
304. And acquiring control parameters corresponding to the target face attributes, wherein the control parameters comprise parameter adjustment quantity and target modulation component.
The parameter adjustment quantity is used for adjusting the dimensionality corresponding to the target face attribute in multiple dimensionalities included by the modulation parameters in the image processing network; the target modulation component is used for updating the modulation component corresponding to a second generation layer of the image processing network in the modulation parameters, and the second generation layer is a generation layer related to the first generation layer corresponding to the target human face attribute in the image processing network.
305. Updating a modulation component corresponding to a second generation layer in the first modulation parameter corresponding to the first face image to the target modulation component to obtain an updated first modulation parameter, wherein the second generation layer is a generation layer related to the first generation layer corresponding to the target face attribute in the image processing network.
Wherein the second generation layer may be the generation layer immediately preceding the first generation layer, or may be located before the first generation layer without being adjacent to it. For example, the first generation layer may specifically be the generation layer that edits the target face attribute, and the second generation layer may specifically be the generation layer that indicates the editing region of the target face attribute; that is, the feature map obtained through processing by the second generation layer may indicate to the first generation layer the image region in which the target face attribute is edited. For example, if the target face attribute is the mouth, the feature map obtained through processing by the second generation layer indicates position information of the region where the mouth is located in the face image, such as coordinates of mouth feature points and contour information of the mouth.
Specifically, in order to further optimize the effect of local attribute editing, the modulation component and the feature map of the first generation layer corresponding to the target face attribute may be modified simultaneously, the modulation component of the first generation layer is modified by the parameter adjustment amount in the control parameter, and the modulation component of the previous generation layer is replaced by the target modulation component in the control parameter, so as to change the feature map generated by the previous generation layer and ensure the attribute editing effect corresponding to the first generation layer.
As shown in fig. 4, sub-graph (a) is an image generation effect 401 without any modification, assuming that the moustache attribute of the mouth corresponds to the l-th generation layer of the image processing network. Sub-graph (b) shows the result of modifying only the modulation component S_l corresponding to the l-th layer, that is, adding the parameter adjustment amount ΔS_l and performing dot-product processing between (S_l + ΔS_l) and the corresponding feature map F_l to generate the image 402. A moustache is added in the image 402, but it is not complete: the left half of the moustache is missing. Analysis shows that the incomplete addition of the moustache is mainly due to the feature map F_l provided by the previous generation layer (i.e., the (l-1)-th layer) being inaccurate; the labeling of the mouth regions 411 and 412 is incomplete, marking only the position of the right half of the mouth, so the l-th layer adds the moustache only for the right half of the mouth. For this reason, the feature map F_l can be modified at the same time. In sub-graph (c), the modulation component S_l of the l-th layer and the feature map F_l generated by the (l-1)-th layer are modified simultaneously; the modified feature maps 421 and 422 accurately and completely mark the position of the mouth, so that the local attribute editing effect 403 of adding a complete moustache can be achieved.
306. And increasing or decreasing the parameter adjustment quantity included in the control parameter to the updated first modulation parameter to obtain a second modulation parameter.
Specifically, the computer device may update a modulation component corresponding to a second generation layer in a first modulation parameter corresponding to the first face image to be a target modulation component, that is, replace a modulation component of a previous generation layer to change a feature map generated by the previous generation layer, to obtain an updated first modulation parameter, and then increase or decrease a parameter adjustment amount included in the control parameter for the updated first modulation parameter, to obtain a second modulation parameter, where the second modulation parameter includes both a modification of the modulation component of the generation layer corresponding to the target face attribute and a modification of the feature map corresponding to the generation layer, so as to further optimize an editing effect of the local attribute.
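The two-step update of the modulation parameter (replace the second generation layer's component, then apply the signed adjustment) can be sketched as follows; a minimal NumPy illustration in which the flat parameter layout, slice mapping, and toy values are assumptions for the example, not the patent's actual data layout:

```python
import numpy as np

def second_modulation(first_params, layer_slices, second_layer, target_component, delta):
    """Second modulation parameter: replace the modulation component of the
    second generation layer with the target modulation component, then add
    the parameter adjustment amount to the updated first modulation parameter."""
    params = first_params.copy()
    sl = layer_slices[second_layer]
    params[sl] = target_component   # swap in the target component for the related layer
    return params + delta           # apply the (signed) parameter adjustment amount

# Toy layout: two generation layers of two dimensions each.
first = np.array([1.0, 1.0, 5.0, 5.0])
slices = {0: slice(0, 2), 1: slice(2, 4)}
target = np.array([9.0, 9.0])               # target modulation component for layer 0
dS = np.array([0.0, 0.0, 1.0, -1.0])        # adjustment acting on layer 1's dimensions
second = second_modulation(first, slices, 0, target, dS)
```

Decreasing the attribute instead of increasing it corresponds to passing a negated adjustment amount.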
In some possible embodiments, the computer device may obtain a first feature value of the target face region and a second feature value of a face region other than the target face region by using a modulation component and a feature map corresponding to the second generation layer in the modulation parameters; and adjusting the modulation component corresponding to the second generation layer according to the first characteristic value and the second characteristic value until the first characteristic value and the second characteristic value meet a preset optimization condition, so as to obtain a target modulation component.
The preset optimization conditions may be: the feature value of the target face region is maximized, while the feature value of the non-target face region is constrained to be 0, which can be formally expressed as:
L = -||F ⊙ M_r|| + ||F ⊙ (1 - M_r)||

where F is the feature map, M_r is the mask of the target face region, and ⊙ denotes element-wise multiplication; this loss function is optimized by back-propagation to obtain the optimal target modulation component. Taking fig. 4 as an example, the optimization goal is to set all the feature values of the mouth region to 1 and all the feature values of other non-mouth regions to 0, so as to accurately and completely mark the position of the mouth region for the l-th generation layer to perform the local attribute editing operation of adding or removing the moustache.
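The loss itself is straightforward to evaluate; a minimal NumPy sketch assuming the L1 norm (the text does not specify which norm is intended) with illustrative feature maps:

```python
import numpy as np

def local_edit_loss(feature_map, region_mask):
    """L = -||F ⊙ M_r|| + ||F ⊙ (1 - M_r)||: minimizing L pushes feature
    values inside the target region up and values outside toward zero.
    The L1 norm is used here as an assumption."""
    inside = np.abs(feature_map * region_mask).sum()
    outside = np.abs(feature_map * (1.0 - region_mask)).sum()
    return -inside + outside

mask = np.array([[1.0, 0.0], [1.0, 0.0]])  # target region = left column
good = np.array([[1.0, 0.0], [1.0, 0.0]])  # ideal: 1 inside the region, 0 outside
bad = np.array([[0.0, 1.0], [0.0, 1.0]])   # worst case: 0 inside, 1 outside
```

In the patent's setting this loss is minimized by back-propagation with respect to the second generation layer's modulation component; the sketch only evaluates it.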
307. And calling the image processing network to generate a second face image by using the second modulation parameter, wherein the second face image is an image obtained by adjusting the target face attribute of the first face image.
Specifically, the computer device may determine, starting from a first generation layer of the image processing network, a feature map corresponding to a next generation layer by using the modulation component and the feature map corresponding to each generation layer; determining a target characteristic diagram according to the modulation component corresponding to the last generation layer and the characteristic diagram; and determining a second face image according to the target feature map.
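The layer-by-layer generation described above can be caricatured as modulate-then-transform; a minimal NumPy sketch in which the per-layer channel-mixing matrix is a stand-in for the real generator's convolutions and upsampling (all names, shapes, and values are illustrative assumptions):

```python
import numpy as np

def generate(feature0, components, weights):
    """Starting from the first generation layer, modulate each layer's
    incoming feature map with that layer's modulation component (channel-wise
    dot-product processing), then apply a simplified per-layer transform to
    obtain the feature map for the next layer; the last layer's output is
    the target feature map from which the second face image is decoded."""
    f = feature0
    for s, w in zip(components, weights):
        f = f * s[:, None, None]              # modulate channels with the component
        f = np.tensordot(w, f, axes=(1, 0))   # stand-in for the layer's convolution
    return f

rng = np.random.default_rng(0)
f0 = rng.normal(size=(4, 8, 8))                        # initial feature map (C, H, W)
comps = [rng.normal(size=4), rng.normal(size=4)]       # one modulation component per layer
mats = [rng.normal(size=(4, 4)), rng.normal(size=(3, 4))]  # per-layer channel transforms
out = generate(f0, comps, mats)
```

The second modulation parameter enters this loop simply by supplying the edited per-layer components in place of the originals.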
In some possible embodiments, the computer device may train the generation countermeasure network to obtain an image processing network, which may specifically include: acquiring a sample image set, wherein the sample image set comprises a plurality of sample images; aligning the plurality of sample images by using the positions of the designated face areas (such as eyes) to obtain a plurality of sample images with aligned face positions; and training the initialized generation countermeasure network by using the plurality of sample images after the face positions are aligned to obtain an image processing network.
In the embodiment of the application, the computer equipment can acquire a positive sample image and a negative sample image related to the target face attribute, acquire the correlation between the target face area and each dimension included by the modulation parameters in the image processing network, and determine the control parameter corresponding to the target face attribute according to the modulation parameter corresponding to the positive sample image, the modulation parameter corresponding to the negative sample image and the correlation; when the target face attribute is edited, control parameters corresponding to the target face attribute can be obtained, the control parameters comprise a parameter adjustment amount and a target modulation component, a modulation component corresponding to a second generation layer in a first modulation parameter corresponding to a first face image is updated to be a target modulation component, the updated first modulation parameter is obtained, the second generation layer is a generation layer related to a first generation layer corresponding to the target face attribute in an image processing network, the updated first modulation parameter is increased or decreased by the parameter adjustment amount included in the control parameters, and the second modulation parameter is obtained.
In some possible implementations, as shown in fig. 5, the image processing method of the embodiment of the present application may include the following steps:
501. acquiring high-definition face training data;
502. training StyleGAN2 to generate an antagonistic network;
503. calculating the correlation between each dimension of the modulation parameters and the face area;
504. subtracting the positive and negative sample characteristic vectors to obtain a coarse direction vector;
505. cutting the coarse direction vector according to the correlation to obtain a fine direction vector, and then determining a control unit;
506. the optimization results in the modulation parameters that produce the control unit that is best suited for editing.
Wherein, the coarse direction vector is the first adjustment amount, the fine direction vector is the second adjustment amount, the control unit is the control parameter, and the modulation parameter obtained by the optimization is the target modulation component. For a different face attribute, only steps 503 to 506 need to be repeated to search for the control unit corresponding to the new face attribute, without collecting data and training the network again.
In some feasible embodiments, the image processing method according to the embodiment of the application can implement editing of multiple local attributes of multiple face regions. As shown in fig. 6a, the face regions may include the mouth, eyes, eyebrows, hair color, hairstyle, and the like; the face attributes corresponding to the mouth may include smile, moustache, lips, and the like; the face attributes corresponding to the eyes may include closed eyes, eye bags, squinting, and the like; the face attributes corresponding to the eyebrows may include raised eyebrows, thick eyebrows, willow-leaf eyebrows, and the like; the face attributes corresponding to hair color may include blond hair, silver hair, and black hair; and the face attributes corresponding to hairstyle may include baldness, bangs, and slicked-back hair. Flexible editing of multiple face attributes of multiple face regions can thus be implemented, fully meeting the requirements of other image processing tasks for various types of high-definition training samples and finally satisfying users' rich face local attribute editing requirements.
In some feasible embodiments, fig. 6b shows the comparison between the image processing method provided by the present application and other local face attribute editing methods, which mainly include inversion (Inversion), InterFaceGAN/W, InterFaceGAN/S, and StyleSpace. It can be seen that the generated picture obtained by the embodiment of the present application is more natural, the accuracy of local attribute editing is higher, and the degree of decoupling is higher.
Fig. 7 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present disclosure. The device comprises:
an obtaining module 701, configured to obtain a control parameter corresponding to a target face attribute, where the control parameter includes a parameter adjustment amount, and the parameter adjustment amount is used to adjust a dimension corresponding to the target face attribute in multiple dimensions included in a modulation parameter in an image processing network.
A determining module 702, configured to determine a second modulation parameter according to a first modulation parameter corresponding to the first face image and the control parameter.
A generating module 703, configured to invoke the image processing network to generate a second face image by using the second modulation parameter, where the second face image is an image obtained by adjusting the target face attribute of the first face image.
Optionally, the obtaining module 701 is further configured to obtain a positive sample image and a negative sample image related to a target face attribute, where the target face attribute is any one of multiple face attributes of a target face area, and the target face area is any one face area.
The obtaining module 701 is further configured to obtain a correlation between the target face region and each dimension included in the modulation parameters in the image processing network.
The determining module 702 is further configured to determine a control parameter corresponding to the target face attribute according to the modulation parameter corresponding to the positive sample image, the modulation parameter corresponding to the negative sample image, and the correlation.
Optionally, the determining module 702 is specifically configured to:
and determining a first adjustment quantity according to the modulation parameter corresponding to the positive sample image and the modulation parameter corresponding to the negative sample image.
And updating the first adjustment quantity according to the correlation degree to obtain a second adjustment quantity.
And taking the second adjustment quantity as a parameter adjustment quantity included in the control parameters corresponding to the target face attributes.
Optionally, the determining module 702 is specifically configured to:
and acquiring a first average parameter of the modulation parameter corresponding to the positive sample image, and determining a first type of central parameter of the modulation parameter corresponding to the positive sample image according to the first average parameter.
And acquiring a second average parameter of the modulation parameter corresponding to the negative sample image, and determining a second class center parameter of the modulation parameter corresponding to the negative sample image according to the second average parameter.
And determining a first adjustment amount according to the first class central parameters and the second class central parameters.
Optionally, the determining module 702 is specifically configured to:
and comparing the correlation between the target face area and each dimension included by the modulation parameters in the image processing network with a correlation threshold.
And determining a first target dimension of which the corresponding correlation degree is less than or equal to the correlation degree threshold value from a plurality of dimensions included in the modulation parameters.
And setting the value of the first target dimension in the first adjustment quantity to zero to obtain a second adjustment quantity.
Optionally, the determining module 702 is specifically configured to:
and acquiring a first generation layer corresponding to the target human face attribute in a plurality of generation layers included in the image processing network.
Determining a second target dimension included in a modulation component of the modulation parameters of the image processing network corresponding to the first generated layer.
And determining a third target dimension of which the corresponding correlation degree is less than or equal to a correlation degree threshold value from the other dimensions except the second target dimension included in the modulation parameters.
And setting the value of the third target dimension in the first adjustment quantity to zero to obtain a second adjustment quantity.
Optionally, the obtaining module 701 is further configured to obtain a sample image set, where the sample image set includes a plurality of sample images.
The obtaining module 701 is further configured to obtain a semantic segmentation map of each face region of each sample image in the multiple sample images.
The determining module 702 is further configured to determine a correlation between each face region of each sample image and each dimension included in the modulation parameters in the image processing network, by using the semantic segmentation map of each face region.
Optionally, the determining module 702 is specifically configured to:
and taking the semantic segmentation graph of each face region of each sample image as an image gradient, and reversely propagating modulation parameters in an image processing network to obtain the cumulative gradient of each face region in each dimension included by the modulation parameters.
And determining the correlation degree between each face area and each dimension included by the modulation parameters in the image processing network according to the absolute value of the accumulated gradient.
Optionally, the obtaining module 701 is specifically configured to:
and calling a face analysis network to divide each sample image in the plurality of sample images into a plurality of face areas.
And calling the face analysis network to process each face area in the plurality of face areas to obtain a semantic segmentation map of each face area, wherein the semantic segmentation map comprises position information of the corresponding face area in a sample image.
Optionally, the determining module 702 is specifically configured to:
and increasing or decreasing the parameter adjustment amount included in the control parameter to the first modulation parameter corresponding to the first human face image to obtain a second modulation parameter.
Optionally, the control parameter further includes a target modulation component, and the determining module 702 is specifically configured to:
updating a modulation component corresponding to a second generation layer in the first modulation parameter corresponding to the first face image to the target modulation component to obtain an updated first modulation parameter, wherein the second generation layer is a generation layer related to the first generation layer corresponding to the target face attribute in the image processing network.
And increasing or decreasing the parameter adjustment amount included in the control parameter to the updated first modulation parameter to obtain a second modulation parameter.
Optionally, the obtaining module 701 is further configured to obtain a first feature value of the target face region and a second feature value of another face region except the target face region by using the modulation component and the feature map corresponding to the second generation layer in the modulation parameter.
The obtaining module 701 is further configured to adjust the modulation component corresponding to the second generation layer according to the first eigenvalue and the second eigenvalue until the first eigenvalue and the second eigenvalue meet a preset optimization condition, so as to obtain a target modulation component.
Optionally, the generating module 703 is specifically configured to:
and calling the image processing network to determine a modulation component corresponding to each generation layer of the image processing network from the second modulation parameter.
And generating a second face image by using the modulation component corresponding to each generation layer and the feature map.
Optionally, the generating module 703 is specifically configured to:
and determining a feature map corresponding to the next generation layer by using the modulation component corresponding to each generation layer and the feature map from the first generation layer of the image processing network.
And determining a target characteristic diagram according to the modulation component corresponding to the last generation layer and the characteristic diagram.
And determining a second face image according to the target feature map.
Optionally, the obtaining module 701 is further configured to obtain a sample image set, where the sample image set includes a plurality of sample images.
The obtaining module 701 is further configured to perform alignment processing on the multiple sample images by using the position of the designated face area, so as to obtain multiple sample images with aligned face positions.
The obtaining module 701 is further configured to train the initialized generation countermeasure network with the multiple sample images after the face positions are aligned, so as to obtain an image processing network.
It should be noted that the functions of the functional modules of the image processing apparatus in the embodiment of the present application may be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process may refer to the related description of the foregoing method embodiment, which is not described herein again.
Fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application, where the computer device according to the embodiment of the present application includes a power supply module and the like, and includes a processor 801, a storage device 802, and a network interface 803. The processor 801, the storage 802, and the network interface 803 may interact with each other.
The storage device 802 may include a volatile memory (volatile memory), such as a random-access memory (RAM); the storage device 802 may also include a non-volatile memory (non-volatile memory), such as a flash memory (flash memory), a solid-state drive (SSD), or the like; the storage means 802 may also comprise a combination of memories of the kind described above.
The processor 801 may be a Central Processing Unit (CPU) 801. In one embodiment, the processor 801 may also be a Graphics Processing Unit (GPU) 801. The processor 801 may also be a combination of a CPU and a GPU. In one embodiment, the storage device 802 is used to store program instructions. The processor 801 may invoke the program instructions to perform the following operations:
the method comprises the steps of obtaining control parameters corresponding to target face attributes, wherein the control parameters comprise parameter adjustment quantities, and the parameter adjustment quantities are used for adjusting dimensions corresponding to the target face attributes in a plurality of dimensions included by modulation parameters in an image processing network.
And determining a second modulation parameter according to the first modulation parameter corresponding to the first face image and the control parameter.
And calling the image processing network to generate a second face image by using the second modulation parameter, wherein the second face image is an image obtained by adjusting the target face attribute of the first face image.
Optionally, the processor 801 is further configured to:
the method comprises the steps of obtaining a positive sample image and a negative sample image related to a target face attribute, wherein the target face attribute is any one of a plurality of face attributes of a target face area, and the target face area is any one face area.
And obtaining the correlation between the target face area and each dimension included by the modulation parameters in the image processing network.
And determining a control parameter corresponding to the target face attribute according to the modulation parameter corresponding to the positive sample image, the modulation parameter corresponding to the negative sample image and the correlation.
Optionally, the processor 801 is specifically configured to:
and determining a first adjustment quantity according to the modulation parameter corresponding to the positive sample image and the modulation parameter corresponding to the negative sample image.
And updating the first adjustment quantity according to the correlation degree to obtain a second adjustment quantity.
And taking the second adjustment quantity as a parameter adjustment quantity included in the control parameters corresponding to the target face attribute.
Optionally, the processor 801 is specifically configured to:
and acquiring a first average parameter of the modulation parameter corresponding to the positive sample image, and determining a first type of central parameter of the modulation parameter corresponding to the positive sample image according to the first average parameter.
And acquiring a second average parameter of the modulation parameter corresponding to the negative sample image, and determining a second class central parameter of the modulation parameter corresponding to the negative sample image according to the second average parameter.
And determining a first adjustment quantity according to the first class central parameters and the second class central parameters.
Optionally, the processor 801 is specifically configured to:
and comparing the correlation between the target face area and each dimension included by the modulation parameters in the image processing network with a correlation threshold.
And determining a first target dimension of which the corresponding correlation degree is less than or equal to the correlation degree threshold value from a plurality of dimensions included in the modulation parameters.
And setting the value of the first target dimension in the first adjustment quantity to zero to obtain a second adjustment quantity.
Optionally, the processor 801 is specifically configured to:
Acquiring, from a plurality of generation layers included in the image processing network, a first generation layer corresponding to the target face attribute.
Determining a second target dimension included in the modulation component corresponding to the first generation layer in the modulation parameters of the image processing network.
Determining, from the dimensions other than the second target dimension included in the modulation parameters, a third target dimension whose correlation is less than or equal to a correlation threshold.
Setting the value of the third target dimension in the first adjustment amount to zero to obtain the second adjustment amount.
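This layer-aware variant can be sketched the same way: dimensions belonging to the attribute's generation layer are exempt from thresholding, and only the remaining low-correlation dimensions are zeroed. The flat indexing of layer dimensions is a simplifying assumption.

```python
import numpy as np

def mask_outside_layer(adjustment, correlation, threshold, layer_dims):
    """Keep every dimension of the attribute's first generation layer
    ('second target dimensions'); among the remaining dimensions, zero
    those whose correlation is at or below the threshold ('third target
    dimensions')."""
    masked = adjustment.copy()
    outside = np.ones(adjustment.shape[0], dtype=bool)
    outside[np.asarray(layer_dims)] = False  # layer dims are never masked
    masked[outside & (correlation <= threshold)] = 0.0
    return masked
```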
Optionally, the processor 801 is further configured to:
a sample image set is obtained, the sample image set comprising a plurality of sample images.
And acquiring a semantic segmentation map of each face region of each sample image in the plurality of sample images.
And determining the correlation between each face region and each dimension included in the modulation parameters in the image processing network by using the semantic segmentation map of each face region of each sample image.
Optionally, the processor 801 is specifically configured to:
Back-propagating through the image processing network, with the semantic segmentation map of each face region of each sample image taken as the image gradient, to obtain the accumulated gradient of each face region over each dimension included in the modulation parameters.
Determining the correlation between each face region and each dimension included in the modulation parameters in the image processing network according to the absolute value of the accumulated gradient.
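The gradient-accumulation idea can be illustrated with a toy linear "generator" where pixels = weights @ style. Feeding a region's binary segmentation mask in as the output gradient makes the gradient reaching the style vector equal to mask @ weights; accumulating its absolute value over the sample set scores each dimension. The real network is nonlinear and layered, so this is only a sketch with hypothetical names.

```python
import numpy as np

def region_correlation(weights, masks):
    """Toy linear generator: pixels = weights @ style.  With a region's
    binary mask used as the incoming image gradient, the gradient at the
    style vector is mask @ weights; summing over samples and taking the
    absolute value yields a per-dimension correlation score."""
    acc = np.zeros(weights.shape[1])
    for mask in masks:             # one segmentation mask per sample image
        acc += mask @ weights      # back-propagated gradient for this sample
    return np.abs(acc)
```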
Optionally, the processor 801 is specifically configured to:
and calling a face analysis network to divide each sample image in the plurality of sample images into a plurality of face areas.
And calling the face analysis network to process each face area in the plurality of face areas to obtain a semantic segmentation map of each face area, wherein the semantic segmentation map comprises position information of the corresponding face area in a sample image.
Optionally, the processor 801 is specifically configured to:
Adding the parameter adjustment amount included in the control parameters to, or subtracting it from, the first modulation parameter corresponding to the first face image to obtain the second modulation parameter.
Optionally, the control parameter further includes a target modulation component, and the processor 801 is specifically configured to:
Updating the modulation component corresponding to a second generation layer in the first modulation parameter corresponding to the first face image to the target modulation component to obtain an updated first modulation parameter, wherein the second generation layer is a generation layer related, in the image processing network, to the first generation layer corresponding to the target face attribute.
Adding the parameter adjustment amount included in the control parameters to, or subtracting it from, the updated first modulation parameter to obtain the second modulation parameter.
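Applying the adjustment reduces to vector arithmetic on the modulation parameter. A minimal sketch, with a signed `strength` standing in for the "increase or decrease" choice (names are illustrative):

```python
import numpy as np

def edit_modulation(first_param, adjustment, strength=1.0):
    """Second modulation parameter: add the parameter adjustment amount to
    the (possibly component-updated) first modulation parameter; a
    negative strength subtracts it, reversing the attribute edit."""
    return first_param + strength * adjustment
```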
Optionally, the processor 801 is further configured to:
Acquiring a first feature value of the target face area and a second feature value of the face areas other than the target face area by using the modulation component and the feature map corresponding to the second generation layer in the modulation parameters.
Adjusting the modulation component corresponding to the second generation layer according to the first feature value and the second feature value until the first feature value and the second feature value meet a preset optimization condition, so as to obtain the target modulation component.
Optionally, the processor 801 is specifically configured to:
and calling the image processing network to determine a modulation component corresponding to each generation layer of the image processing network from the second modulation parameter.
And generating a second face image by using the modulation component corresponding to each generation layer and the feature map.
Optionally, the processor 801 is specifically configured to:
Starting from the first generation layer of the image processing network, determining the feature map corresponding to the next generation layer by using the modulation component and the feature map corresponding to each generation layer.
Determining a target feature map according to the modulation component and the feature map corresponding to the last generation layer.
Determining the second face image according to the target feature map.
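The layer-by-layer pass above can be sketched as a simple loop. Elementwise scaling stands in for the real modulated convolution of each generation layer; all names are illustrative.

```python
import numpy as np

def generate(initial_feature_map, components):
    """Walk the generation layers in order: each layer's modulation
    component modulates the current feature map to produce the input of
    the next layer; the last layer's output is the target feature map
    from which the second face image is decoded."""
    feature_map = initial_feature_map
    for component in components:     # one modulation component per layer
        feature_map = feature_map * component
    return feature_map               # target feature map
```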
Optionally, the processor 801 is further configured to:
a sample image set is obtained, the sample image set comprising a plurality of sample images.
Aligning the plurality of sample images according to the position of a designated face area to obtain a plurality of sample images with aligned face positions.
Training an initialized generative adversarial network by using the plurality of sample images after the face positions are aligned, to obtain the image processing network.
In a specific implementation, the processor 801, the storage device 802, and the network interface 803 described in this embodiment of the present application may perform the implementations described in the embodiments of the image processing method provided in FIG. 1, FIG. 3, and FIG. 5 of this application, and may also perform the implementations described in the embodiment of the image processing apparatus provided in FIG. 7 of this application, which are not repeated here.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program instructing relevant hardware. The program includes one or more instructions that may be stored in a computer storage medium, and when the program is executed, the processes of the embodiments of the methods described above may be performed. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps performed in the embodiments of the methods described above.
The above disclosure describes only some embodiments of the present application and certainly should not be taken as limiting its scope; therefore, equivalent changes made according to the claims of the present application still fall within the scope of the present application.

Claims (19)

1. An image processing method, characterized in that the method comprises:
acquiring control parameters corresponding to a target face attribute, wherein the control parameters include a parameter adjustment amount, and the parameter adjustment amount is used for adjusting the dimension corresponding to the target face attribute among a plurality of dimensions included in the modulation parameters in an image processing network;
determining a second modulation parameter according to a first modulation parameter corresponding to the first face image and the control parameter;
and calling the image processing network to generate a second face image by using the second modulation parameter, wherein the second face image is an image obtained by adjusting the target face attribute of the first face image.
2. The method according to claim 1, wherein before the obtaining of the control parameter corresponding to the target face attribute, the method further comprises:
acquiring a positive sample image and a negative sample image related to a target face attribute, wherein the target face attribute is any one of a plurality of face attributes of a target face area, and the target face area is any one face area;
obtaining the correlation between the target face area and each dimension included in the modulation parameters in the image processing network;
and determining a control parameter corresponding to the target face attribute according to the modulation parameter corresponding to the positive sample image, the modulation parameter corresponding to the negative sample image and the correlation.
3. The method according to claim 2, wherein the determining the control parameter corresponding to the target face attribute according to the modulation parameter corresponding to the positive sample image, the modulation parameter corresponding to the negative sample image, and the correlation comprises:
determining a first adjustment amount according to the modulation parameter corresponding to the positive sample image and the modulation parameter corresponding to the negative sample image;
updating the first adjustment amount according to the correlation to obtain a second adjustment amount;
and taking the second adjustment amount as the parameter adjustment amount included in the control parameters corresponding to the target face attribute.
4. The method of claim 3, wherein determining the first adjustment amount according to the modulation parameter corresponding to the positive sample image and the modulation parameter corresponding to the negative sample image comprises:
acquiring a first average parameter of the modulation parameters corresponding to the positive sample image, and determining a first-class center parameter of the modulation parameters corresponding to the positive sample image according to the first average parameter;
acquiring a second average parameter of the modulation parameters corresponding to the negative sample image, and determining a second-class center parameter of the modulation parameters corresponding to the negative sample image according to the second average parameter;
and determining the first adjustment amount according to the first-class center parameter and the second-class center parameter.
5. The method according to claim 3 or 4, wherein the updating the first adjustment amount according to the correlation to obtain a second adjustment amount comprises:
comparing the correlation between the target face area and each dimension included in the modulation parameters in the image processing network with a correlation threshold;
determining, from the plurality of dimensions included in the modulation parameters, a first target dimension whose correlation is less than or equal to the correlation threshold;
and setting the value of the first target dimension in the first adjustment amount to zero to obtain the second adjustment amount.
6. The method according to claim 3 or 4, wherein the updating the first adjustment amount according to the correlation to obtain a second adjustment amount comprises:
acquiring, from a plurality of generation layers included in the image processing network, a first generation layer corresponding to the target face attribute;
determining a second target dimension included in a modulation component corresponding to the first generation layer in the modulation parameters of the image processing network;
determining, from the dimensions other than the second target dimension included in the modulation parameters, a third target dimension whose correlation is less than or equal to a correlation threshold;
and setting the value of the third target dimension in the first adjustment amount to zero to obtain the second adjustment amount.
7. The method according to any one of claims 2 to 4, further comprising:
obtaining a sample image set, wherein the sample image set comprises a plurality of sample images;
obtaining a semantic segmentation map of each face region of each sample image in the plurality of sample images;
and determining the correlation between each face region and each dimension included in the modulation parameters in the image processing network by using the semantic segmentation map of each face region of each sample image.
8. The method of claim 7, wherein determining the correlation between each face region and each dimension included in the modulation parameters in the image processing network by using the semantic segmentation map of each face region of each sample image comprises:
back-propagating through the image processing network, with the semantic segmentation map of each face region of each sample image taken as the image gradient, to obtain an accumulated gradient of each face region over each dimension included in the modulation parameters;
and determining the correlation between each face region and each dimension included in the modulation parameters in the image processing network according to the absolute value of the accumulated gradient.
9. The method of claim 7, wherein obtaining the semantic segmentation map for each face region of each sample image of the plurality of sample images comprises:
calling a face analysis network to divide each sample image in the plurality of sample images into a plurality of face areas;
and calling the face analysis network to process each face area in the plurality of face areas to obtain a semantic segmentation map of each face area, wherein the semantic segmentation map comprises position information of the corresponding face area in a sample image.
10. The method according to any one of claims 1 to 4, wherein the determining a second modulation parameter from the first modulation parameter and the control parameter corresponding to the first face image comprises:
adding the parameter adjustment amount included in the control parameters to, or subtracting it from, the first modulation parameter corresponding to the first face image to obtain the second modulation parameter.
11. The method according to any one of claims 1 to 4, wherein the control parameters further include a target modulation component, and the determining a second modulation parameter according to a first modulation parameter corresponding to the first face image and the control parameters comprises:
updating a modulation component corresponding to a second generation layer in the first modulation parameter corresponding to the first face image to the target modulation component to obtain an updated first modulation parameter, wherein the second generation layer is a generation layer related, in the image processing network, to the first generation layer corresponding to the target face attribute;
and adding the parameter adjustment amount included in the control parameters to, or subtracting it from, the updated first modulation parameter to obtain the second modulation parameter.
12. The method according to claim 11, wherein before determining the second modulation parameter according to the first modulation parameter corresponding to the first face image and the control parameter, the method further comprises:
acquiring a first feature value of the target face area and a second feature value of the face areas other than the target face area by using the modulation component and the feature map corresponding to the second generation layer in the modulation parameters;
and adjusting the modulation component corresponding to the second generation layer according to the first feature value and the second feature value until the first feature value and the second feature value meet a preset optimization condition, so as to obtain the target modulation component.
13. The method according to any one of claims 1 to 4, wherein the invoking the image processing network to generate a second face image using the second modulation parameter comprises:
calling the image processing network to determine a modulation component corresponding to each generation layer of the image processing network from the second modulation parameter;
and generating a second face image by using the modulation component corresponding to each generation layer and the feature map.
14. The method according to claim 13, wherein the generating a second face image by using the modulation component corresponding to each generation layer and the feature map comprises:
starting from the first generation layer of the image processing network, determining the feature map corresponding to the next generation layer by using the modulation component and the feature map corresponding to each generation layer;
determining a target feature map according to the modulation component and the feature map corresponding to the last generation layer;
and determining a second face image according to the target feature map.
15. The method according to any one of claims 1 to 4, further comprising:
obtaining a sample image set, wherein the sample image set comprises a plurality of sample images;
aligning the plurality of sample images by using the position of the designated face area to obtain a plurality of sample images with aligned face positions;
and training an initialized generative adversarial network by using the plurality of sample images after the face positions are aligned, to obtain the image processing network.
16. An image processing apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire control parameters corresponding to a target face attribute, wherein the control parameters include a parameter adjustment amount, and the parameter adjustment amount is used for adjusting the dimension corresponding to the target face attribute among a plurality of dimensions included in the modulation parameters in an image processing network;
the determining module is used for determining a second modulation parameter according to a first modulation parameter corresponding to the first face image and the control parameter;
and the generating module is used for calling the image processing network to generate a second face image by using the second modulation parameter, wherein the second face image is an image obtained by adjusting the target face attribute of the first face image.
17. A computer device, comprising a processor, a network interface, and a storage device connected with each other, wherein the network interface is controlled by the processor to transmit and receive data, the storage device is configured to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to execute the image processing method according to any one of claims 1 to 15.
18. A computer-readable storage medium storing a computer program, wherein the computer program comprises program instructions that, when executed by a processor, perform the image processing method according to any one of claims 1 to 15.
19. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the image processing method according to any one of claims 1 to 15.
CN202111152280.4A 2021-09-29 2021-09-29 Image processing method, device, storage medium and computer program product Pending CN115909430A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111152280.4A CN115909430A (en) 2021-09-29 2021-09-29 Image processing method, device, storage medium and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111152280.4A CN115909430A (en) 2021-09-29 2021-09-29 Image processing method, device, storage medium and computer program product

Publications (1)

Publication Number Publication Date
CN115909430A true CN115909430A (en) 2023-04-04

Family

ID=86492054

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111152280.4A Pending CN115909430A (en) 2021-09-29 2021-09-29 Image processing method, device, storage medium and computer program product

Country Status (1)

Country Link
CN (1) CN115909430A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116704084A (en) * 2023-08-01 2023-09-05 苏州浪潮智能科技有限公司 Training method of facial animation generation network, facial animation generation method and device
CN116704084B (en) * 2023-08-01 2023-11-03 苏州浪潮智能科技有限公司 Training method of facial animation generation network, facial animation generation method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40084581

Country of ref document: HK