WO2023179075A1 - Image processing method and apparatus, and electronic device, storage medium and program product - Google Patents

Info

Publication number
WO2023179075A1
WO2023179075A1 (PCT/CN2022/134943)
Authority
WO
WIPO (PCT)
Prior art keywords
attribute
face
dimensional vector
dimensional
type
Prior art date
Application number
PCT/CN2022/134943
Other languages
French (fr)
Chinese (zh)
Inventor
林纯泽
王权
钱晨
Original Assignee
上海商汤智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司 filed Critical 上海商汤智能科技有限公司
Publication of WO2023179075A1 publication Critical patent/WO2023179075A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/04 Context-preserving transformations, e.g. by using an importance map

Definitions

  • the present disclosure relates to the field of computer technology, and in particular, to an image processing method and device, electronic equipment, storage media and program products.
  • Face attribute editing refers to manipulating and changing face attributes in face images.
  • facial attribute editing is no longer limited to face deformation, but can edit any facial attribute, such as adding glasses, adding beards, changing eye color, changing facial expressions, etc.
  • The current related technology cannot edit a specific face attribute without affecting other face attributes. For example, if the user wants to add glasses, although the related technology can add glasses, it may also cause facial deformation.
  • the present disclosure proposes an image processing technical solution.
  • An image processing method, including: acquiring a face image to be processed; encoding the face image to obtain a first latent variable of the face image; in response to a setting operation for the attribute editing degree of a face attribute, editing the first latent variable according to the set attribute editing degree and the attribute editing direction corresponding to the face attribute to obtain an edited second latent variable, where the attribute editing direction represents the direction of enhancement or weakening of the face attribute, different face attributes correspond to different attribute editing directions, and the attribute editing degree represents the degree of enhancement or weakening of the face attribute; and decoding the second latent variable to obtain a target face image, where the display effects of the face attribute in the target face image and in the face image are different.
  • An image processing device, including: an acquisition part configured to acquire a face image to be processed; an encoding part configured to encode the face image to obtain a first latent variable of the face image; an editing part configured to, in response to a setting operation for the attribute editing degree of a face attribute, edit the first latent variable according to the set attribute editing degree and the attribute editing direction corresponding to the face attribute to obtain an edited second latent variable; and a decoding part configured to decode the second latent variable to obtain a target face image, where the display effects of the face attribute in the target face image and in the face image are different.
  • an electronic device including: a processor; a memory configured to store instructions executable by the processor; wherein the processor is configured to call instructions stored in the memory to execute the above method.
  • a computer-readable storage medium on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the above method is implemented.
  • A computer program product including computer-readable code. When the computer-readable code is run in an electronic device, a processor in the electronic device executes instructions configured to implement the above method.
  • In the embodiments of the present disclosure, the first latent variable is obtained by encoding the face image, the second latent variable is obtained by editing the first latent variable according to the set attribute editing degree of the face attribute and the attribute editing direction corresponding to the face attribute, and the second latent variable is then decoded to obtain the target face image. Since different face attributes have different attribute editing directions, the face attribute specified by the user can be accurately edited without affecting the display effect of other face attributes.
  • FIG. 1 shows a flowchart of an image processing method according to an embodiment of the present disclosure.
  • FIG. 2 shows a schematic diagram of an operation control according to an embodiment of the present disclosure.
  • Figure 3a shows a schematic diagram of a human face image according to an embodiment of the present disclosure.
  • Figure 3b shows a schematic diagram of a target face image according to an embodiment of the present disclosure.
  • Figure 4a shows a schematic diagram of a human face image according to an embodiment of the present disclosure.
  • Figure 4b shows a schematic diagram of a target face image according to an embodiment of the present disclosure.
  • FIG. 5 shows a schematic diagram of an image processing flow according to an embodiment of the present disclosure.
  • Figure 6 shows a schematic diagram of a sample distribution space according to an embodiment of the present disclosure.
  • FIG. 7 shows a block diagram of an image processing device according to an embodiment of the present disclosure.
  • FIG. 8 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
  • "Exemplary" means "serving as an example, instance, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferable to or better than other embodiments.
  • "A and/or B" can mean three situations: A exists alone, both A and B exist, or B exists alone.
  • "At least one" herein means any one of a plurality, or any combination of at least two of a plurality. For example, "including at least one of A, B, and C" can mean including any one or more elements selected from the set composed of A, B, and C.
  • FIG. 1 shows a flow chart of an image processing method according to an embodiment of the present disclosure.
  • the image processing method can be executed by an electronic device such as a terminal device or a server.
  • The terminal device can be a user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc.
  • The method can be implemented by the processor calling computer-readable instructions stored in the memory, or the method can be executed by the server.
  • the image processing method includes steps S11 to S14:
  • In step S11, the face image to be processed is obtained.
  • the face image may be an image collected in real time by an image acquisition device, or may be an image extracted from local storage, or may be an image transmitted by other electronic devices, to which embodiments of the present disclosure are not limited. It should be understood that the face in the face image may be a real face or a virtual face, such as the face of an anime character, etc., and the embodiments of the present disclosure are not limited to this.
  • In step S12, the face image is encoded to obtain the first latent variable of the face image.
  • the face image can be encoded by a face image encoder to obtain the first latent variable of the face image.
  • The first latent variable can be expressed as M first N-dimensional vectors, where M and N are positive integers. For example, the face image encoder can encode the face image into 18 512-dimensional vectors.
  • the face image encoder can be implemented using deep learning technology known in the art.
  • The face image encoder can use a deep neural network to extract features of the face image, and use the extracted deep features as the first latent variable of the face image. It should be understood that the embodiments of the present disclosure do not limit the encoding method of face images.
  • In step S13, in response to the setting operation of the attribute editing degree of the face attribute, the first latent variable is edited according to the set attribute editing degree and the attribute editing direction corresponding to the face attribute, and the edited second latent variable is obtained.
  • The face attributes may include, for example, but are not limited to, at least one of: the face shape and posture of the face; the gender, age, and emotion represented by the face; the beard, glasses, and mask on the face; and the pupil color, hair color, makeup color, and filter color.
  • FIG. 2 shows a schematic diagram of an operation control according to an embodiment of the present disclosure.
  • The attribute editing degree of any face attribute can be set by adjusting the position of the "filled circle" of the operation control corresponding to that face attribute along the line segment representing the corresponding value range.
  • the attribute editing direction can represent the direction of enhancement or weakening of the face attributes.
  • the first latent variable can be expressed as M first N-dimensional vectors.
  • the attribute editing direction can be expressed as a second N-dimensional vector.
  • The direction of the second N-dimensional vector represents the attribute editing direction, and the attribute editing directions corresponding to different face attributes are different. In this way, when the user wishes to edit a certain face attribute, at least one first N-dimensional vector in the first latent variable can be edited along the attribute editing direction corresponding to that face attribute, without affecting other face attributes.
  • the attribute editing direction corresponding to the face attribute is obtained by using an attribute classifier to classify the sample face image into two categories.
  • The attribute classifier can adopt a support vector machine, an image classification network, etc., and the embodiments of the present disclosure are not limited to this. It should be understood that each face attribute can correspond to its own attribute classifier; that is, one attribute classifier is used to classify one face attribute.
  • For example, the attribute classifier corresponding to gender can be used to classify the sample face images into two categories, that is, into male faces and female faces. In this case, the attribute editing direction can represent the enhancement direction of a masculine face (that is, the weakening direction of a feminine face), or the weakening direction of a masculine face (that is, the enhancement direction of a feminine face). Likewise, the attribute classifier corresponding to the beard can be used to classify the sample face images into two categories, that is, into bearded faces and beardless faces. In this case, the attribute editing direction can represent the enhancement direction of the beard (that is, the weakening direction of no beard), or the weakening direction of the beard (that is, the enhancement direction of no beard).
  • The attribute editing degree can represent the degree of enhancement or weakening of a face attribute; in other words, the attribute editing degree can represent the extent to which the user expects to edit the face attribute. For example, if the user expects to enhance a certain face attribute, the degree of enhancement of that face attribute can be the attribute editing degree set by the user.
  • a certain value range can be set for the attribute editing degree.
  • For example, the value range of the attribute editing degree can be set to [-3, 3], [-10, 10], etc., where a positive value means the degree of enhancement of the face attribute and a negative value means the degree of weakening of the face attribute. For example, if the user wants to add a beard to the face, the attribute editing degree can be a positive value: the greater the attribute editing degree, the thicker the beard. Conversely, if the user wants to remove the beard from the face, the attribute editing degree can be a negative value: the smaller the attribute editing degree, the sparser the beard.
  • Editing the first latent variable according to the set attribute editing degree and the attribute editing direction corresponding to the face attribute to obtain the edited second latent variable may include: adding the product of the set attribute editing degree and the attribute editing direction to at least one first N-dimensional vector in the first latent variable to obtain the edited second latent variable. In this way, the specified face attribute can be edited.
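The addition described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from the patent: the function name `edit_latent`, the (M, N) array layout, and the `rows` parameter are assumptions for demonstration.

```python
import numpy as np

def edit_latent(w1, direction, degree, rows=None):
    """Add the product (degree * direction) to selected first N-dim vectors.

    w1:        (M, N) array, the first latent variable (M first N-dim vectors).
    direction: (N,) array, the attribute editing direction (second N-dim vector).
    degree:    float, the attribute editing degree set by the user.
    rows:      indices of the vectors to edit; None edits all M vectors.
    """
    w2 = w1.copy()
    offset = degree * direction            # product of degree and direction
    if rows is None:
        w2 += offset                       # broadcast over all M vectors
    else:
        w2[np.asarray(rows)] += offset     # edit only the selected vectors
    return w2

# Example with 18 first 512-dimensional vectors, as in the embodiment above.
w1 = np.zeros((18, 512))
direction = np.ones(512)
w2 = edit_latent(w1, direction, 2.0, rows=range(5))  # edit vectors 1-5 only
```

A positive `degree` moves the latent along the editing direction (enhancement); a negative `degree` moves it the opposite way (weakening), matching the value-range convention above.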
  • In step S14, the second latent variable is decoded to obtain a target face image.
  • the display effects of the face attributes in the target face image and the face image are different.
  • the generation network can be used to decode the second latent variable to obtain the target face image. It should be understood that the embodiments of the present disclosure do not limit the network structure, network type, and training method of the generating network.
  • The generating network can be obtained by training a generative adversarial network (GAN).
  • the generation network can be used to generate an image with a preset image style based on M N-dimensional vectors.
  • The image style can include, for example, at least a realistic style and a non-realistic style. The non-realistic style can include at least a comic style, a European and American style, a sketch style, an oil painting style, a printmaking style, etc. That is, the face in the target face image may be a realistic-style face or a non-realistic-style face.
  • Since the second latent variable is obtained by editing the first latent variable of the face image, the face attributes in the target face image generated based on the second latent variable differ in display effect from those in the original face image. When the product of the attribute editing direction and the attribute editing degree is a positive value, the display effect of the face attribute in the target face image obtained based on the second latent variable is enhanced relative to the face image; when the product is a negative value, the display effect of the face attribute is weakened relative to the face image. A positive product means that the face attribute is enhanced by the attribute editing degree along the same direction as the attribute editing direction; a negative product means that the face attribute is weakened by the attribute editing degree along the direction opposite to the attribute editing direction.
  • Figure 3a shows a schematic diagram of a human face image according to an embodiment of the present disclosure
  • Figure 3b shows a schematic diagram of a target human face image according to an embodiment of the present disclosure
  • The target face image shown in Figure 3b can be an image obtained by editing the attributes of the face image shown in Figure 3a: the face in the target face image in Figure 3b is younger than the face in the face image in Figure 3a. Figures 3a and 3b are both realistic-style images.
  • Figure 4a shows a schematic diagram of a human face image according to an embodiment of the present disclosure
  • Figure 4b shows a schematic diagram of a target human face image according to an embodiment of the present disclosure
  • The target face image shown in Figure 4b can be an image obtained by editing the attributes of the face image shown in Figure 4a: the face in the target face image in Figure 4b is younger than the face in the face image in Figure 4a. Figures 4a and 4b are both oil-painting-style images.
  • Figure 5 shows a schematic diagram of an image processing flow according to an embodiment of the present disclosure.
  • The image processing flow may include: inputting the face image 51 into the face image encoder 52 to obtain the first latent variable corresponding to the face image; editing the first latent variable according to the set attribute editing degree of the face attribute and the attribute editing direction of the face attribute to obtain the edited second latent variable 53; and inputting the second latent variable into the generation network 54 to obtain the target face image. Since the attribute editing degree of the face attribute "beard" is positive, there is more beard on the face in the target face image than in the face image; that is, the display effects of the face attribute in the target face image and in the face image are different.
  • In the embodiments of the present disclosure, the first latent variable is obtained by encoding the face image, the second latent variable is obtained by editing the first latent variable according to the set attribute editing degree of the face attribute and the attribute editing direction corresponding to the face attribute, and the second latent variable is then decoded to obtain the target face image. Since different face attributes have different attribute editing directions, the face attribute specified by the user can be accurately edited without affecting the display effect of other face attributes.
  • the attribute editing direction corresponding to the face attribute is obtained by using the attribute classifier to classify the sample face image into two categories.
  • Using an attribute classifier to classify the sample face images into two categories includes: using the attribute classifier corresponding to the face attribute to classify the sample face images into two categories to obtain the attribute classification boundary in the latent space of the sample face images, where the latent space represents the sample distribution space in which the sample latent variables corresponding to the sample face images are distributed; and determining the direction in which the attribute classification boundary faces the positive sample attribute of the face attribute as the attribute editing direction.
  • the sample face image may be a face image randomly generated by the above-mentioned generation network based on M randomly distributed N-dimensional vectors, or it may be a face image actually captured by the image acquisition device, which is not limited by the embodiments of the present disclosure.
  • the sample latent variable corresponding to the sample face image can be understood as the feature vector corresponding to the sample face image, so the latent space can also be understood as the vector distribution space in which the feature vectors corresponding to the sample face image are distributed.
  • The two-class classification of sample face images can be briefly described as finding an interface that divides the sample latent variables in the latent space into two parts: one part represents the positive sample attribute of the face attribute, and the other part represents the negative sample attribute of the face attribute. This interface can be used as the attribute classification boundary.
  • The normal vector or unit normal vector perpendicular to the interface can be obtained; for example, if the interface is the plane ax + by + cz = d, the normal vector is (a, b, c), and the unit normal vector is the normal vector divided by its length.
  • The normal vector (or unit normal vector) pointing to the positive sample attribute side can be determined as the attribute editing direction.
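The boundary-normal idea can be illustrated with a toy two-class fit. This is a hedged sketch, not the patent's method: the patent mentions support vector machines, but here a plain least-squares linear classifier (NumPy only) stands in for the SVM, and the 8-dimensional sample latents are synthetic.

```python
import numpy as np

# Toy sample latent variables: positive samples shifted along axis 0.
rng = np.random.default_rng(0)
neg = rng.normal(0.0, 1.0, size=(200, 8))       # negative sample attribute
pos = rng.normal(0.0, 1.0, size=(200, 8))
pos[:, 0] += 4.0                                # positive sample attribute
X = np.vstack([neg, pos])
y = np.array([-1.0] * 200 + [1.0] * 200)        # two-class labels

# Fit a linear boundary w.x + b = 0 by least squares (stand-in for a linear SVM).
A = np.hstack([X, np.ones((400, 1))])           # append a bias column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
normal = coef[:8]                               # normal vector of the boundary
direction = normal / np.linalg.norm(normal)     # unit normal toward positives
```

Because the labels put positive samples on the +1 side, `direction` points toward the positive sample attribute, which is exactly the attribute editing direction described above.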
  • Figure 6 shows a schematic diagram of a sample distribution space according to an embodiment of the present disclosure.
  • For ease of understanding, the sample distribution space is expressed as a two-dimensional space; the actual sample distribution space can be a high-dimensional distribution space.
  • The square represents the positive sample attribute, the triangle represents the negative sample attribute, and the attribute editing direction is the direction in which the attribute classification boundary faces the positive sample attribute.
  • The attribute editing direction can represent the enhancement direction of the positive sample attribute (or the weakening direction of the negative sample attribute), and the opposite of the attribute editing direction can represent the weakening direction of the positive sample attribute (or the enhancement direction of the negative sample attribute).
  • the direction in which the attribute classification boundary faces the negative sample attribute can also be determined as the attribute editing direction.
  • In this case, the attribute editing direction can represent the enhancement direction of the negative sample attribute (or the weakening direction of the positive sample attribute), and the opposite of the attribute editing direction can represent the weakening direction of the negative sample attribute (or the enhancement direction of the positive sample attribute).
  • positive sample attributes and negative sample attributes of each face attribute can be customized, and this is not limited by the embodiments of the present disclosure.
  • the attribute editing directions of different face attributes can be determined quickly and effectively.
  • the first latent variable can be expressed as M first N-dimensional vectors
  • the attribute editing direction can be expressed as a second N-dimensional vector.
  • N and M are positive integers.
  • In step S13, editing the first latent variable according to the set attribute editing degree and the attribute editing direction corresponding to the face attribute to obtain the edited second latent variable includes: determining, according to the attribute type corresponding to the face attribute, at least one first N-dimensional vector on which the attribute editing direction acts; calculating the product of the second N-dimensional vector and the attribute editing degree to obtain a third N-dimensional vector; and adding the at least one first N-dimensional vector among the M first N-dimensional vectors to the third N-dimensional vector respectively to obtain M fourth N-dimensional vectors, the second latent variable being represented as the M fourth N-dimensional vectors.
  • The generation network includes multiple network layers with different resolutions. Network layers with different resolutions have different sensitivities to, or different learning effects on, different face attributes. Therefore, the network layers with different resolutions in the generation network can be used to process the second latent variables corresponding to different face attributes.
  • The low-resolution network layer of the generation network is relatively sensitive to the first type of face attributes, such as face shape, posture, and the gender, age, and emotion represented by the face; the medium-resolution network layer is more sensitive to the second type of face attributes, such as beards, glasses, and masks; and the high-resolution network layer is more sensitive to the third type of face attributes, such as pupil color, hair color, makeup color, and filter color.
  • In other words, the low-resolution network layer of the generation network is more sensitive to the first type of face attributes than the medium-resolution and high-resolution network layers; the medium-resolution network layer is more sensitive to the second type of face attributes than the low-resolution and high-resolution network layers; and the high-resolution network layer is more sensitive to the third type of face attributes than the low-resolution and medium-resolution network layers.
  • The attribute editing direction of a certain face attribute can be applied to part of the first N-dimensional vectors of the first latent variable; that is, according to the attribute type corresponding to the face attribute, the at least one first N-dimensional vector on which the attribute editing direction acts can be determined. Because different face attributes correspond to different attribute editing directions, the impact on other face attributes when editing a certain face attribute can be reduced, or even eliminated.
  • For example, the attribute editing direction of the first type of face attributes can act on the 1st first N-dimensional vector to the i-th first N-dimensional vector; the attribute editing direction of the second type of face attributes can act on the (i+1)-th first N-dimensional vector to the j-th first N-dimensional vector; and the attribute editing direction of the third type of face attributes can act on the (j+1)-th first N-dimensional vector to the M-th first N-dimensional vector, where i ∈ [1, M] and j ∈ [2, M].
  • In this way, face attributes of different attribute types can be made to correspond to different first N-dimensional vectors, so that the generation network can subsequently decode the second latent variable obtained by editing the first N-dimensional vectors to obtain the adjusted target face image.
  • For example, suppose the first latent variable is expressed as three two-dimensional vectors {(a1,b1),(a2,b2),(a3,b3)}, the attribute editing direction of a face attribute of a certain attribute type acts on the two-dimensional vector (a1,b1), the attribute editing degree is 2, and the attribute editing direction is expressed as a two-dimensional vector (m,n). Then the second latent variable is expressed as three two-dimensional vectors {(2m+a1,2n+b1),(a2,b2),(a3,b3)}.
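The worked example above can be checked numerically. The concrete values chosen below for (a1,b1), (a2,b2), (a3,b3), (m,n), and the editing degree are arbitrary illustrations, not from the patent.

```python
import numpy as np

# First latent variable: three 2-dimensional vectors {(a1,b1),(a2,b2),(a3,b3)}.
w1 = np.array([[1.0, 2.0],    # (a1, b1)
               [3.0, 4.0],    # (a2, b2)
               [5.0, 6.0]])   # (a3, b3)
direction = np.array([0.5, -0.5])   # (m, n), the attribute editing direction
degree = 2.0                        # the set attribute editing degree

w2 = w1.copy()
w2[0] += degree * direction         # the direction acts only on (a1, b1)

# w2 is {(2m+a1, 2n+b1), (a2, b2), (a3, b3)} = {(2.0, 1.0), (3, 4), (5, 6)}
```

Only the first vector changes; the other two are carried over unchanged, which is why other face attributes are unaffected.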
  • any facial attribute can be accurately edited while reducing the impact on other facial attributes, or even having no impact on other facial attributes.
  • The attribute types corresponding to the face attributes include at least one of the first type of face attributes, the second type of face attributes, and the third type of face attributes. The first type of face attributes includes at least one of: face shape, posture, and the gender, age, and emotion represented by the face.
  • the second type of face attributes includes at least one of beard, glasses, and mask on the face.
  • the third type of face attributes includes at least one of pupil color, hair color, makeup color, and filter color.
  • Determining, based on the attribute type corresponding to the face attribute, the at least one first N-dimensional vector on which the attribute editing direction acts includes: in a case where the face attributes include the first type of face attributes, determining that the attribute editing direction acts on the 1st first N-dimensional vector to the i-th first N-dimensional vector; and/or, in a case where the face attributes include the second type of face attributes, determining that the attribute editing direction acts on the (i+1)-th first N-dimensional vector to the j-th first N-dimensional vector; and/or, in a case where the face attributes include the third type of face attributes, determining that the attribute editing direction acts on the (j+1)-th first N-dimensional vector to the M-th first N-dimensional vector.
  • As mentioned above, the low-resolution network layer of the generation network is relatively sensitive to the first type of face attributes, such as face shape, posture, and the gender, age, and emotion represented by the face; the medium-resolution network layer is more sensitive to the second type of face attributes, such as beards, glasses, and masks; and the high-resolution network layer is more sensitive to the third type of face attributes, such as pupil color, hair color, makeup color, and filter color.
  • Accordingly, the 1st first N-dimensional vector to the i-th first N-dimensional vector may correspond to the low-resolution network layer of the generation network, the (i+1)-th first N-dimensional vector to the j-th first N-dimensional vector may correspond to the medium-resolution network layer of the generation network, and the (j+1)-th first N-dimensional vector to the M-th first N-dimensional vector may correspond to the high-resolution network layer of the generation network.
  • Here, i and j can be empirical values determined through experimental testing based on the network structure of the generation network, and the embodiments of the present disclosure are not limited to this. For example, if M is 18, i can be set to 5 and j to 10.
  • Adding the at least one first N-dimensional vector among the M first N-dimensional vectors to the third N-dimensional vector respectively to obtain M fourth N-dimensional vectors includes, for example: in a case where the first latent variable is represented by 18 first 512-dimensional vectors, the product of the attribute editing direction and the attribute editing degree is a third 512-dimensional vector, i is 5, and j is 10, then when the face attributes include the first type of face attributes, the 1st to 5th first 512-dimensional vectors among the 18 first 512-dimensional vectors can each be added to the third 512-dimensional vector; when the face attributes include the second type of face attributes, the 6th to 10th first 512-dimensional vectors can each be added to the third 512-dimensional vector; and when the face attributes include the third type of face attributes, the 11th to 18th first 512-dimensional vectors can each be added to the third 512-dimensional vector.
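The i = 5, j = 10 layout can be sketched as a lookup from attribute type to the edited vector indices. The dictionary `ROWS_BY_TYPE`, the type labels, and the function name are hypothetical, not from the patent; indices are zero-based, so vectors 1-5 are rows 0-4.

```python
import numpy as np

M, N = 18, 512        # 18 first 512-dimensional vectors, as in the example
i, j = 5, 10          # empirical boundaries from the embodiment

# Hypothetical mapping from attribute type to the rows (first N-dim vectors)
# to which the edit is applied.
ROWS_BY_TYPE = {
    "first":  list(range(0, i)),    # vectors 1-5:   low-resolution layers
    "second": list(range(i, j)),    # vectors 6-10:  medium-resolution layers
    "third":  list(range(j, M)),    # vectors 11-18: high-resolution layers
}

def edit_by_type(w1, third_vector, attr_type):
    """Add the third N-dimensional vector to the rows chosen by attribute type."""
    w2 = w1.copy()
    w2[ROWS_BY_TYPE[attr_type]] += third_vector
    return w2

w1 = np.zeros((M, N))
w2 = edit_by_type(w1, np.ones(N), "second")   # only vectors 6-10 change
```

Restricting the edit to the rows feeding the matching resolution band is what keeps attributes of the other two types untouched.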
  • any facial attribute can be accurately edited while reducing the impact on other facial attributes, or even having no impact on other facial attributes.
  • the generation network includes multiple network layers with different resolutions.
  • The network layers with different resolutions can be used to process the second latent variables corresponding to different face attributes, and the generation network is used to generate an image with a preset image style based on M N-dimensional vectors; the generation network is then used to decode the second latent variable to obtain the target face image.
  • The second latent variable is represented as M fourth N-dimensional vectors, and the generation network includes M network layers. Using the generation network to decode the second latent variable to obtain the target face image includes: inputting the 1st fourth N-dimensional vector into the 1st network layer of the generation network to obtain the 1st intermediate image; inputting the m-th fourth N-dimensional vector and the (m-1)-th intermediate image into the m-th network layer of the generation network to obtain the m-th intermediate image output by the m-th network layer, m ∈ [2, M); and inputting the M-th fourth N-dimensional vector and the (M-1)-th intermediate image into the M-th network layer of the generation network to obtain the target face image output by the M-th network layer.
  • when m = 2, the (m-1)-th intermediate image is the 1st intermediate image, so the 2nd intermediate image is obtained based on the 2nd fourth N-dimensional vector and the 1st intermediate image.
  • the 2nd to the (M-1)-th intermediate images are all determined based on the m-th fourth N-dimensional vector and the (m-1)-th intermediate image output by the preceding network layer; it follows that the (M-1)-th intermediate image is obtained based on the (M-1)-th fourth N-dimensional vector and the (M-2)-th intermediate image, the (M-2)-th intermediate image is obtained based on the (M-2)-th fourth N-dimensional vector and the (M-3)-th intermediate image, and so on.
  • in this way, the generation network can generate images at progressively increasing resolutions: the input of the 1st network layer is a fourth N-dimensional vector, the input of each subsequent network layer includes a fourth N-dimensional vector and the intermediate image output by the preceding network layer, and the last network layer outputs the target face image.
  • the generative network can also be called a multi-layer transformation generative network.
  • the low-resolution network layers of the generation network (also called shallow network layers) first learn to generate low-resolution intermediate images (such as 4×4 resolution); as the network depth gradually increases, they continue to learn to generate higher-resolution intermediate images (such as 512×512 resolution), and finally the highest-resolution target face image (such as 1024×1024 resolution) is generated.
  • the generation network can be used to decode the second latent variable to effectively obtain the target face image.
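The progressive, layer-wise decoding described above can be sketched as a loop. The layer functions here are toy stand-ins (an "image" is modelled only by its resolution, doubling per layer up to a 1024×1024 cap, mirroring the 4×4 → 1024×1024 progression mentioned above); in the patent, the layers are trained layers of a generation network, and the function names here are assumptions.

```python
from typing import Callable, List

def decode(latent: List, layers: List[Callable]):
    """Layer-wise decoding: the 1st layer takes only the 1st fourth
    N-dimensional vector; each later layer takes the m-th vector plus the
    intermediate image output by the preceding layer; the last layer's
    output is the target face image."""
    image = layers[0](latent[0])                  # 1st intermediate image
    for m in range(1, len(layers)):
        image = layers[m](latent[m], image)       # m-th intermediate image
    return image                                  # target face image

def make_toy_layers(m_layers: int) -> List[Callable]:
    """Toy layers: 'images' are represented by their resolution only,
    starting at 4x4 and doubling per layer up to a 1024x1024 cap."""
    first = lambda v: 4
    later = lambda v, img: min(img * 2, 1024)
    return [first] + [later] * (m_layers - 1)

resolution = decode([None] * 18, make_toy_layers(18))
# With 18 toy layers the final 'image' reaches the 1024x1024 cap.
```

The loop structure, not the toy arithmetic, is the point: each network layer consumes one of the M fourth N-dimensional vectors plus the previous intermediate image, exactly as the decoding steps enumerate.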
  • the user can perform attribute editing at least once on the same face attribute according to the image processing method of steps S11 to S14 in the above embodiments of the present disclosure, or can perform attribute editing at least once on each of several different face attributes, to obtain the target face image.
  • in traditional image-warping methods, the editable face attributes are limited, and existing attribute edits cannot be decoupled, so edits to different attributes easily affect one another.
  • with the method of the embodiments of the present disclosure, a specified face attribute can be edited more accurately, and the impact of editing any face attribute on other face attributes can be reduced or even eliminated; the method can be applied to face attribute editing in different image styles.
  • the present disclosure also provides image processing devices, electronic equipment, computer-readable storage media, and programs, all of which can be used to implement any image processing method provided by the present disclosure.
  • Figure 7 shows a block diagram of an image processing device according to an embodiment of the present disclosure. As shown in Figure 7, the device includes:
  • the acquisition part 101 is configured to acquire the face image to be processed
  • the encoding part 102 is configured to perform encoding processing on the face image to obtain the first latent variable of the face image;
  • the editing part 103 is configured to, in response to a setting operation for the attribute editing degree of a face attribute, edit the first latent variable according to the set attribute editing degree and the attribute editing direction corresponding to the face attribute to obtain an edited second latent variable, wherein the attribute editing direction represents the enhancement direction or weakening direction of the face attribute, different face attributes correspond to different attribute editing directions, and the attribute editing degree represents the enhancement degree or weakening degree of the face attribute;
  • the decoding part 104 is configured to decode the second latent variable to obtain a target face image, where the display effects of the face attributes in the target face image and the face image are different.
  • in a possible implementation, the attribute editing direction corresponding to the face attribute is obtained by using an attribute classifier to classify sample face images, which includes: using the attribute classifier corresponding to the face attribute to perform binary classification on the sample face images to obtain an attribute classification boundary in the latent space, where the latent space represents the sample distribution space of the sample latent variables corresponding to the sample face images; and determining the direction in which the attribute classification boundary faces the positive samples of the face attribute as the attribute editing direction.
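The boundary-based direction can be illustrated with a small, self-contained sketch. The patent only specifies "an attribute classifier" performing binary classification; the plain logistic-regression fit below (and taking the boundary's unit normal as the editing direction) is one assumed concrete choice, and the toy latent data are fabricated for illustration.

```python
import numpy as np

def editing_direction(latents: np.ndarray, labels: np.ndarray,
                      steps: int = 500, lr: float = 0.1) -> np.ndarray:
    """Fit a linear binary classifier sigmoid(latents @ w + b) on sample
    latent vectors (labels: 1 = attribute present, 0 = absent) and return
    the unit normal of the resulting classification boundary, oriented
    toward the positive samples. Logistic regression is used only to keep
    the sketch self-contained; the patent does not name a classifier type."""
    n = latents.shape[1]
    w, b = np.zeros(n), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(latents @ w + b)))   # predicted P(positive)
        w -= lr * latents.T @ (p - labels) / len(labels)
        b -= lr * np.mean(p - labels)
    return w / np.linalg.norm(w)   # the attribute editing direction

# Fabricated toy latent space: positives shifted along the first axis.
rng = np.random.default_rng(0)
neg = rng.normal(0.0, 1.0, (200, 8))
pos = rng.normal(0.0, 1.0, (200, 8))
pos[:, 0] += 4.0
samples = np.vstack([neg, pos])
labels = np.concatenate([np.zeros(200), np.ones(200)])
direction = editing_direction(samples, labels)
# `direction` points predominantly along the first (attribute) axis.
```

Because the classifier is linear, its weight vector is exactly the normal of the classification boundary, and normalizing it gives a direction pointing toward the positive-attribute side, matching the construction described above.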
  • the first latent variable is represented as M first N-dimensional vectors
  • the attribute editing direction is represented as a second N-dimensional vector
  • N and M are positive integers
  • the editing part 103 includes: a determination sub-part configured to determine, based on the attribute type corresponding to the face attribute, at least one first N-dimensional vector on which the attribute editing direction acts; a calculation sub-part configured to calculate the product of the second N-dimensional vector and the attribute editing degree to obtain a third N-dimensional vector; and an addition sub-part configured to add the at least one first N-dimensional vector among the M first N-dimensional vectors to the third N-dimensional vector respectively to obtain M fourth N-dimensional vectors, the second latent variable being represented as the M fourth N-dimensional vectors.
  • the attribute type of the face attribute includes at least one of a first type of face attribute, a second type of face attribute, and a third type of face attribute; the attribute editing direction of the first type of face attribute acts on the 1st first N-dimensional vector to the i-th first N-dimensional vector; the attribute editing direction of the second type of face attribute acts on the (i+1)-th first N-dimensional vector to the j-th first N-dimensional vector; the attribute editing direction of the third type of face attribute acts on the (j+1)-th first N-dimensional vector to the M-th first N-dimensional vector; where i ∈ [1, M], j ∈ [2, M].
  • the first type of face attributes includes at least one of the face shape, the posture, and the gender, age, and emotion represented by the face; the second type of face attributes includes at least one of beards, glasses, and masks on the face; the third type of face attributes includes at least one of pupil color, hair color, makeup color, and filter color.
  • the attribute type includes the first type of face attribute, and determining, based on the attribute type corresponding to the face attribute, at least one first N-dimensional vector on which the attribute editing direction acts includes: when the face attribute includes the first type of face attribute, determining that the attribute editing direction acts on the 1st first N-dimensional vector to the i-th first N-dimensional vector; accordingly, adding the at least one first N-dimensional vector among the M first N-dimensional vectors to the third N-dimensional vector respectively to obtain M fourth N-dimensional vectors includes: adding the 1st first N-dimensional vector to the i-th first N-dimensional vector among the M first N-dimensional vectors to the third N-dimensional vector respectively to obtain the M fourth N-dimensional vectors.
  • the attribute type includes the second type of face attribute, and determining, based on the attribute type corresponding to the face attribute, at least one first N-dimensional vector on which the attribute editing direction acts includes: when the face attribute includes the second type of face attribute, determining that the attribute editing direction acts on the (i+1)-th first N-dimensional vector to the j-th first N-dimensional vector, i ∈ [1, M], j ∈ [2, M]; accordingly, adding the at least one first N-dimensional vector among the M first N-dimensional vectors to the third N-dimensional vector respectively to obtain M fourth N-dimensional vectors includes: adding the (i+1)-th first N-dimensional vector to the j-th first N-dimensional vector among the M first N-dimensional vectors to the third N-dimensional vector respectively to obtain the M fourth N-dimensional vectors.
  • the attribute type includes the third type of face attribute, and determining, based on the attribute type corresponding to the face attribute, at least one first N-dimensional vector on which the attribute editing direction acts includes: when the face attribute includes the third type of face attribute, determining that the attribute editing direction acts on the (j+1)-th first N-dimensional vector to the M-th first N-dimensional vector, j ∈ [2, M]; accordingly, adding the at least one first N-dimensional vector among the M first N-dimensional vectors to the third N-dimensional vector respectively to obtain M fourth N-dimensional vectors includes: adding the (j+1)-th first N-dimensional vector to the M-th first N-dimensional vector among the M first N-dimensional vectors to the third N-dimensional vector respectively to obtain the M fourth N-dimensional vectors.
  • the decoding part 104 includes: a network decoding sub-part configured to use a generation network to decode the second latent variable to obtain the target face image; the generation network is used to generate an image with a preset image style based on M N-dimensional vectors; the generation network includes a plurality of network layers with different resolutions, and the network layers with different resolutions are respectively used to process the second latent variables corresponding to different face attributes.
  • the generation network includes M network layers, the second latent variable is represented as M fourth N-dimensional vectors, and using the generation network to decode the second latent variable to obtain the target face image includes: inputting the 1st fourth N-dimensional vector to the 1st network layer of the generation network to obtain the 1st intermediate image output by the 1st network layer; inputting the m-th fourth N-dimensional vector and the (m-1)-th intermediate image to the m-th network layer of the generation network to obtain the m-th intermediate image output by the m-th network layer, m ∈ [2, M); and inputting the M-th fourth N-dimensional vector and the (M-1)-th intermediate image to the M-th network layer of the generation network to obtain the target face image output by the M-th network layer.
  • when the product of the attribute editing direction and the attribute editing degree is a positive value, the display effect of the face attribute in the target face image obtained based on the second latent variable is enhanced relative to the face image; when the product of the attribute editing direction and the attribute editing degree is a negative value, the display effect of the face attribute in the target face image obtained based on the second latent variable is weakened relative to the face image.
  • in the embodiments of the present disclosure, the first latent variable is obtained by encoding the face image; the second latent variable is obtained by editing the first latent variable according to the set attribute editing degree of the face attribute and the attribute editing direction corresponding to the face attribute; and the target face image is obtained by decoding the second latent variable. Since different face attributes have different attribute editing directions, the face attribute specified by the user can be edited accurately without affecting the display effect of other face attributes.
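Putting the steps together, the overall flow (encode → edit in latent space → decode) can be sketched with stubbed models. The encoder and generation network are trained models in the patent; here they are placeholder functions whose names are assumptions, and a negative degree illustrates the weakening case described above.

```python
import numpy as np

M, N = 18, 512

def encode(face_image) -> np.ndarray:
    """Placeholder for the encoder: in the method, a trained model maps the
    face image to the first latent variable (M first N-dimensional vectors)."""
    return np.zeros((M, N))

def decode(latent: np.ndarray) -> np.ndarray:
    """Placeholder for the generation network: in the method, a trained model
    maps the second latent variable to the target face image. The latent is
    returned unchanged here so the flow stays inspectable."""
    return latent

def edit_face(face_image, direction: np.ndarray, degree: float, rows: slice):
    """End-to-end flow: encode -> edit in latent space -> decode.
    `rows` selects which of the M vectors the attribute direction acts on."""
    w = encode(face_image)                 # first latent variable
    w2 = w.copy()
    w2[rows] += degree * direction         # second latent variable
    return decode(w2)                      # target face image

out = edit_face(None, np.ones(N), degree=-0.5, rows=slice(5, 10))
# A negative degree * direction weakens the attribute (rows 5..9 hold -0.5).
```

Only the latent-editing step differs between enhancing and weakening an attribute: the sign of the degree flips, while the encoder and generation network are unchanged.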
  • the functions of, or the parts included in, the device provided by the embodiments of the present disclosure can be configured to perform the methods described in the above method embodiments; for specific implementations, reference may be made to the descriptions of the above method embodiments.
  • Embodiments of the present disclosure also provide a computer-readable storage medium on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the above method is implemented.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium.
  • An embodiment of the present disclosure also provides an electronic device, including: a processor; a memory configured to store instructions executable by the processor; wherein the processor is configured to call instructions stored in the memory to perform the above method.
  • Embodiments of the present disclosure also provide a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying the computer-readable code.
  • when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the above method.
  • the electronic device may be provided as a terminal, a server, or other forms of equipment.
  • FIG. 8 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
  • the electronic device 1900 may be provided as a server or terminal device.
  • electronic device 1900 includes a processing component 1922 , which may include one or more processors, and memory resources represented by memory 1932 configured to store instructions, such as applications, executable by processing component 1922 .
  • An application stored in memory 1932 may include one or more portions, each corresponding to a set of instructions.
  • the processing component 1922 is configured to execute instructions to perform the above-described methods.
  • Electronic device 1900 may also include a power supply component 1926 configured to perform power management of electronic device 1900, a wired or wireless network interface 1950 configured to connect electronic device 1900 to a network, and an input-output (I/O) interface 1958 .
  • a non-volatile computer-readable storage medium is also provided, such as a memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to complete the above method.
  • Embodiments of the present disclosure may be systems, methods, and/or computer program products.
  • a computer program product may include a computer-readable storage medium having thereon computer-readable program instructions configured to cause a processor to implement aspects of the disclosed embodiments.
  • Computer-readable storage media may be tangible devices that can retain and store instructions for use by an instruction execution device.
  • the computer-readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above.
  • Computer-readable storage media may include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanical encoding devices such as punched cards or raised structures in grooves with instructions stored thereon, and any suitable combination of the above.
  • computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber-optic cables), or electrical signals transmitted through electrical wires.
  • Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to various computing/processing devices, or to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create an apparatus that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, programmable data processing apparatus, and/or other equipment to work in a specific manner, so that the computer-readable medium storing the instructions constitutes an article of manufacture that includes instructions implementing aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other equipment, causing a series of operating steps to be performed thereon to produce a computer-implemented process, so that the instructions executed on the computer, other programmable data processing apparatus, or other equipment implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • each block in the flowcharts or block diagrams may represent a part, a program segment, or a portion of instructions that contains one or more executable instructions configured to implement the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two consecutive blocks may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or actions, or can be implemented using a combination of special-purpose hardware and computer instructions.
  • the computer program product may be implemented in hardware, software or a combination thereof.
  • the computer program product is embodied as a computer storage medium.
  • the computer program product is embodied as a software product, such as a software development kit (SDK).
  • the products applying the disclosed technical solution clearly inform individuals of the personal information processing rules and obtain their separate consent before processing personal information.
  • the product applying the disclosed technical solution must obtain the individual's separate consent before processing sensitive personal information, and at the same time meet the requirement of "express consent"; for example, clear and conspicuous signs are set up on personal information collection devices such as cameras to inform individuals that they have entered the scope of personal information collection and that their personal information will be collected.
  • the personal information processing rules may include information such as the personal information processor, the purpose of processing the personal information, the processing methods, and the types of personal information processed.
  • in the embodiments of the present disclosure, a face image to be processed is acquired; the face image is encoded to obtain a first latent variable of the face image; in response to a setting operation for the attribute editing degree of a face attribute, the first latent variable is edited according to the set attribute editing degree and the attribute editing direction corresponding to the face attribute to obtain an edited second latent variable, where the attribute editing direction represents the enhancement direction or weakening direction of the face attribute, different face attributes correspond to different attribute editing directions, and the attribute editing degree represents the enhancement degree or weakening degree of the face attribute; and the second latent variable is decoded to obtain a target face image in which the display effect of the face attribute differs from that in the face image.
  • the embodiments of the present disclosure can accurately edit the face attributes specified by the user.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

The present disclosure relates to an image processing method and apparatus, and an electronic device, a storage medium and a program product. The method comprises: acquiring a facial image to be processed; performing encoding processing on the facial image, so as to obtain a first hidden variable of the facial image; in response to a setting operation for an attribute editing degree of a facial attribute, editing the first hidden variable according to the set attribute editing degree and an attribute editing direction corresponding to the facial attribute, so as to obtain an edited second hidden variable, wherein the attribute editing direction represents an enhancement direction or a weakening direction of the facial attribute, different facial attributes correspond to different attribute editing directions, and the attribute editing degree represents an enhancement degree or a weakening degree of the facial attribute; and performing decoding processing on the second hidden variable, so as to obtain a target facial image, wherein the display effect of the facial attribute in the target facial image is different from the display effect of the facial attribute in the facial image.

Description

Image processing method and apparatus, electronic device, storage medium and program product
Cross-reference to related applications
The present disclosure is based on, and claims priority to, Chinese patent application No. 202210279511.6, filed on March 22, 2022 and entitled "Image processing method and device, electronic equipment and storage medium"; the entire content of the Chinese patent application is incorporated into the present disclosure by reference in its entirety.
Technical field
The present disclosure relates to the field of computer technology, and in particular, to an image processing method and apparatus, an electronic device, a storage medium, and a program product.
Background
Face attribute editing refers to manipulating and changing the face attributes in a face image. In the field of deep learning, face attribute editing is no longer limited to deforming the face; any face attribute can be edited, for example, adding glasses, adding a beard, changing eye color, or changing facial expression.
However, current related technologies cannot edit a specified face attribute without affecting other face attributes. For example, if a user wants to add glasses, the related technology can add the glasses, but it may also cause facial deformation at the same time.
Summary
The present disclosure proposes an image processing technical solution.
According to one aspect of the present disclosure, an image processing method is provided, including: acquiring a face image to be processed; encoding the face image to obtain a first latent variable of the face image; in response to a setting operation for the attribute editing degree of a face attribute, editing the first latent variable according to the set attribute editing degree and the attribute editing direction corresponding to the face attribute to obtain an edited second latent variable, wherein the attribute editing direction represents the enhancement direction or weakening direction of the face attribute, different face attributes correspond to different attribute editing directions, and the attribute editing degree represents the enhancement degree or weakening degree of the face attribute; and decoding the second latent variable to obtain a target face image, the display effect of the face attribute in the target face image being different from that in the face image.
According to one aspect of the present disclosure, an image processing device is provided, including: an acquisition part configured to acquire a face image to be processed; an encoding part configured to encode the face image to obtain a first latent variable of the face image; an editing part configured to, in response to a setting operation for the attribute editing degree of a face attribute, edit the first latent variable according to the set attribute editing degree and the attribute editing direction corresponding to the face attribute to obtain an edited second latent variable, wherein the attribute editing direction represents the enhancement direction or weakening direction of the face attribute, different face attributes correspond to different attribute editing directions, and the attribute editing degree represents the enhancement degree or weakening degree of the face attribute; and a decoding part configured to decode the second latent variable to obtain a target face image, the display effect of the face attribute in the target face image being different from that in the face image.
According to one aspect of the present disclosure, an electronic device is provided, including: a processor; and a memory configured to store instructions executable by the processor; wherein the processor is configured to call the instructions stored in the memory to execute the above method.
According to one aspect of the present disclosure, a computer-readable storage medium is provided, on which computer program instructions are stored; when the computer program instructions are executed by a processor, the above method is implemented.
According to one aspect of the present disclosure, a computer program product is provided, including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes the above method.
In the embodiments of the present disclosure, the first latent variable is obtained by encoding the face image; the second latent variable is obtained by editing the first latent variable according to the set attribute editing degree of a face attribute and the attribute editing direction corresponding to the face attribute; and the target face image is obtained by decoding the second latent variable. Since different face attributes have different attribute editing directions, the face attribute specified by the user can be edited accurately without affecting the display effect of other face attributes.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit the present disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings here are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the technical solutions of the present disclosure.
FIG. 1 shows a flowchart of an image processing method according to an embodiment of the present disclosure.
FIG. 2 shows a schematic diagram of an operation control according to an embodiment of the present disclosure.
FIG. 3a shows a schematic diagram of a face image according to an embodiment of the present disclosure.
FIG. 3b shows a schematic diagram of a target face image according to an embodiment of the present disclosure.
FIG. 4a shows a schematic diagram of a face image according to an embodiment of the present disclosure.
FIG. 4b shows a schematic diagram of a target face image according to an embodiment of the present disclosure.
FIG. 5 shows a schematic diagram of an image processing flow according to an embodiment of the present disclosure.
FIG. 6 shows a schematic diagram of a sample distribution space according to an embodiment of the present disclosure.
FIG. 7 shows a block diagram of an image processing device according to an embodiment of the present disclosure.
FIG. 8 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features, and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings denote elements with identical or similar functions. Although various aspects of the embodiments are illustrated in the drawings, the drawings are not necessarily drawn to scale unless otherwise indicated.
The word "exemplary" as used herein means "serving as an example, instance, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, the term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set consisting of A, B, and C.
In addition, numerous specific details are given in the following detailed description in order to better explain the present disclosure. Those skilled in the art will understand that the present disclosure may be practiced without certain specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, so as to highlight the subject matter of the present disclosure.
FIG. 1 shows a flowchart of an image processing method according to an embodiment of the present disclosure. The image processing method may be executed by an electronic device such as a terminal device or a server. The terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. The method may be implemented by a processor invoking computer-readable instructions stored in a memory, or the method may be executed by a server. As shown in FIG. 1, the image processing method includes steps S11 to S14.
In step S11, a face image to be processed is obtained.
The face image may be an image captured in real time by an image acquisition device, an image retrieved from local storage, or an image transmitted by another electronic device; the embodiments of the present disclosure are not limited in this regard. It should be understood that the face in the face image may be a real face or a virtual face, such as the face of an anime character; the embodiments of the present disclosure are not limited in this regard either.
In step S12, the face image is encoded to obtain a first latent variable of the face image.
In a possible implementation, the face image may be encoded by a face image encoder to obtain the first latent variable of the face image. The first latent variable may be expressed as M first N-dimensional vectors, where M and N are positive integers; for example, the face image encoder may encode the face image into 18 vectors of 512 dimensions.
The face image encoder may be implemented using deep learning techniques known in the art. For example, the face image encoder may use a deep neural network to extract features from the face image and take the extracted deep features as the first latent variable of the face image. It should be understood that the embodiments of the present disclosure do not limit the encoding method of the face image.
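To make the shape of the first latent variable concrete, the following minimal sketch treats the encoder as a black box producing M = 18 first N-dimensional vectors of N = 512 dimensions. The function `encode_face` is a hypothetical stand-in (in practice a deep network extracts the features); only the latent's shape is illustrated here.

```python
import numpy as np

M, N = 18, 512  # e.g. 18 first N-dimensional vectors, each of 512 dimensions

def encode_face(image: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the face image encoder: a real implementation
    would run a deep network; here we only illustrate the latent's shape."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((M, N))  # first latent variable: M x N

latent = encode_face(np.zeros((1024, 1024, 3)))  # dummy 1024x1024 RGB input
print(latent.shape)  # (18, 512)
```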
In step S13, in response to a setting operation on an attribute editing degree of a face attribute, the first latent variable is edited according to the set attribute editing degree and an attribute editing direction corresponding to the face attribute, to obtain an edited second latent variable.
In a possible implementation, the face attributes may include, but are not limited to, at least one of: the face shape and pose of the face; the gender, age, and emotion represented by the face; the beard, glasses, and mask on the face; and pupil color, hair color, makeup color, and filter color.
It should be understood that those skilled in the art may use software development techniques known in the art to design and implement an application program for the image processing method of the embodiments of the present disclosure, together with a corresponding graphical interactive interface. The graphical interactive interface may provide operation controls for setting the attribute editing degree, so that the user can set the attribute editing degree of any face attribute; the embodiments of the present disclosure are not limited in this regard. FIG. 2 shows a schematic diagram of operation controls according to an embodiment of the present disclosure. As shown in FIG. 2, face attributes such as beard 21, age 22, and gender 23 each correspond to a respective value range of the attribute editing degree, and the attribute editing degree of any face attribute can be set by adjusting the position of the "filled circle" operation control corresponding to that face attribute along the corresponding value-range line segment.
The attribute editing direction may represent a direction of enhancement or weakening of a face attribute. As described above, the first latent variable may be expressed as M first N-dimensional vectors; to facilitate editing the first latent variable, the attribute editing direction may be expressed as a second N-dimensional vector. It will be appreciated that a vector has a direction: the direction of the second N-dimensional vector represents the attribute editing direction, and different face attributes correspond to different attribute editing directions. Thus, when the user wishes to edit a certain face attribute, at least one first N-dimensional vector in the first latent variable can be edited according to the attribute editing direction corresponding to that face attribute, without affecting other face attributes.
In a possible implementation, the attribute editing direction corresponding to a face attribute is obtained by performing binary classification on sample face images with an attribute classifier. The attribute classifier may adopt a support vector machine, an image classification network, or the like; the embodiments of the present disclosure are not limited in this regard. It should be understood that each face attribute may correspond to its own attribute classifier; in other words, one attribute classifier may be used to classify one face attribute.
For example, an attribute classifier corresponding to gender may be used to perform binary classification on sample face images, i.e., divide the sample face images into male faces and female faces; the attribute editing direction may then be the enhancement direction of a masculine face (i.e., the weakening direction of a feminine face), or the weakening direction of a masculine face (i.e., the enhancement direction of a feminine face). Likewise, an attribute classifier corresponding to beards may be used to perform binary classification on sample face images, i.e., divide the sample face images into bearded faces and beardless faces; the attribute editing direction may then be the enhancement direction of the beard (i.e., the weakening direction of beardlessness), or the weakening direction of the beard (i.e., the enhancement direction of beardlessness).
The attribute editing degree may represent the degree of enhancement or weakening of a face attribute; in other words, the attribute editing degree may represent the extent to which the user wishes to edit the face attribute. For example, if the user wishes to enhance a certain face attribute, the degree of enhancement of that face attribute may be the attribute editing degree set by the user.
In a possible implementation, the attribute editing degree may be given a certain value range; for example, the value range may be set to [-3, 3], [-10, 10], or the like, where a positive value means a degree of enhancement of the face attribute and a negative value means a degree of weakening of the face attribute. For example, if the user wishes to add a beard to the face, the attribute editing degree may be a positive value: the larger the attribute editing degree, the thicker the beard. Conversely, if the user wishes to remove the beard from the face, the attribute editing degree may be a negative value: the smaller the attribute editing degree, the sparser the beard.
In a possible implementation, editing the first latent variable according to the set attribute editing degree and the attribute editing direction corresponding to the face attribute to obtain the edited second latent variable may include: adding the product of the set attribute editing degree and the attribute editing direction to at least one first N-dimensional vector of the first latent variable, to obtain the edited second latent variable. In this way, the specified face attribute can be edited.
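The editing step described above can be sketched as follows. The helper name `edit_latent`, the zero-initialized latent, and the all-ones direction are illustrative assumptions for the sketch, not the disclosure's actual implementation; in practice the direction would be the unit normal obtained from the attribute classifier.

```python
import numpy as np

def edit_latent(w1: np.ndarray, direction: np.ndarray,
                degree: float, rows) -> np.ndarray:
    """Add degree * direction to the selected first N-dimensional vectors.
    `rows` selects which of the M vectors the editing direction acts on;
    the remaining vectors are left unchanged."""
    w2 = w1.copy()
    w2[rows] += degree * direction
    return w2

w1 = np.zeros((18, 512))            # first latent variable (illustrative)
d = np.ones(512)                    # assumed attribute editing direction
w2 = edit_latent(w1, d, degree=2.0, rows=slice(0, 4))
print(w2[0, 0], w2[4, 0])  # 2.0 0.0 -- only the first 4 vectors were edited
```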
In step S14, the second latent variable is decoded to obtain a target face image; the display effect of the face attribute in the target face image differs from that in the face image.
In a possible implementation, a generation network may be used to decode the second latent variable to obtain the target face image. It should be understood that the embodiments of the present disclosure do not limit the network structure, network type, or training method of the generation network; for example, the generation network may be obtained by training a generative adversarial network (GAN).
The generation network may be used to generate an image with a preset image style from M N-dimensional vectors. The image style may include at least a realistic style and a non-realistic style; the non-realistic style may include at least a comic style, a European-American style, a sketch style, an oil painting style, a print style, and the like. That is, the face in the target face image may be a realistic-style face or a non-realistic-style face.
As described above, the second latent variable is obtained by editing the first latent variable of the face image, and the display effect of a face attribute in the target face image generated based on the second latent variable differs from that in the original face image. When the product of the attribute editing direction and the attribute editing degree is a positive value, the display effect of the face attribute in the target face image obtained based on the second latent variable is enhanced relative to the face image; when the product is a negative value, the display effect of the face attribute in the target face image obtained based on the second latent variable is weakened relative to the face image.
A positive product of the attribute editing direction and the attribute editing degree means that the face attribute is enhanced, by the attribute editing degree, along the attribute editing direction; a negative product means that the face attribute is weakened, by the attribute editing degree, along the direction opposite to the attribute editing direction.
FIG. 3a shows a schematic diagram of a face image according to an embodiment of the present disclosure, and FIG. 3b shows a schematic diagram of a target face image according to an embodiment of the present disclosure. The target face image shown in FIG. 3b may be obtained by performing attribute editing on the face image shown in FIG. 3a. As shown in FIGS. 3a and 3b, the face in the target face image of FIG. 3b is younger than the face in the face image of FIG. 3a; both FIG. 3a and FIG. 3b are realistic-style images.
FIG. 4a shows a schematic diagram of a face image according to an embodiment of the present disclosure, and FIG. 4b shows a schematic diagram of a target face image according to an embodiment of the present disclosure. The target face image shown in FIG. 4b may be obtained by performing attribute editing on the face image shown in FIG. 4a. As shown in FIGS. 4a and 4b, the face in the target face image of FIG. 4b is younger than the face in the face image of FIG. 4a; both FIG. 4a and FIG. 4b are oil-painting-style images.
FIG. 5 shows a schematic diagram of an image processing flow according to an embodiment of the present disclosure. As shown in FIG. 5, the image processing flow may include: inputting a face image 51 into a face image encoder 52 to obtain a first latent variable corresponding to the face image; editing the first latent variable according to the set attribute editing degree of a face attribute and the attribute editing direction of that face attribute, to obtain an edited second latent variable 53; and inputting the second latent variable into a generation network 54 to obtain a target face image. As shown in FIG. 5, since the attribute editing degree of the beard attribute is a positive value, the face in the target face image has more beard than the face image; that is, the display effect of the face attribute in the target face image differs from that in the face image.
In the embodiments of the present disclosure, a first latent variable is obtained by encoding a face image; the first latent variable is edited according to the set attribute editing degree of a face attribute and the attribute editing direction corresponding to that face attribute, to obtain a second latent variable; and the second latent variable is then decoded to obtain a target face image. Because different face attributes have different attribute editing directions, the face attribute specified by the user can be edited precisely without affecting the display effect of other face attributes.
As described above, the attribute editing direction corresponding to a face attribute is obtained by performing binary classification on sample face images with an attribute classifier. In a possible implementation, performing binary classification on sample face images with an attribute classifier includes: performing binary classification on the sample face images with the attribute classifier corresponding to the face attribute, to obtain an attribute classification boundary of the sample face images in a latent space, where the latent space represents a sample distribution space in which the sample latent variables corresponding to the sample face images are distributed; and determining the direction in which the attribute classification boundary faces the positive-sample attribute of the face attribute as the attribute editing direction.
The sample face images may be face images randomly generated by the above generation network from M randomly distributed N-dimensional vectors, or face images actually captured by an image acquisition device; the embodiments of the present disclosure are not limited in this regard. The sample latent variable corresponding to a sample face image may be understood as the feature vector corresponding to that sample face image, so the latent space may also be understood as the vector distribution space in which the feature vectors corresponding to the sample face images are distributed.
It will be appreciated that performing binary classification on sample face images can be briefly described as finding a separating surface that divides the sample latent variables in the latent space into two parts, one part representing the positive-sample attribute of the face attribute and the other representing the negative-sample attribute; this separating surface can serve as the attribute classification boundary. It should be understood that once the separating surface is obtained, the normal vector or unit normal vector perpendicular to it can be obtained; for example, for a separating surface expressed as ax+by+cz=d, the normal vector is (a, b, c), and the unit normal vector is that normal vector divided by its length. The normal vector (or unit normal vector) pointing toward the positive-sample-attribute side can then be determined as the attribute editing direction; of course, the normal vector (or unit normal vector) pointing toward the negative-sample-attribute side may also be determined as the attribute editing direction.
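As an illustration of deriving the attribute editing direction from a separating surface, the sketch below uses assumed coefficients for ax+by+cz=d in a three-dimensional latent space (in practice the coefficients come from the attribute classifier, e.g. a trained support vector machine, and the space is N-dimensional):

```python
import numpy as np

# Assumed separating surface 2x + y - 2z = 1 found by a binary attribute
# classifier (illustrative coefficients only).
a, b, c, d = 2.0, 1.0, -2.0, 1.0
normal = np.array([a, b, c])
edit_direction = normal / np.linalg.norm(normal)  # unit normal vector

def side(latent_point: np.ndarray) -> float:
    """> 0: positive-sample-attribute side of the boundary; < 0: negative side."""
    return float(np.dot(normal, latent_point) - d)

print(edit_direction)                    # ~ [0.667, 0.333, -0.667]
print(side(np.array([2.0, 0.0, 0.0])))   # 3.0 -> positive-sample side
```

Editing then moves a latent along `edit_direction` (positive degree) or against it (negative degree).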
FIG. 6 shows a schematic diagram of a sample distribution space according to an embodiment of the present disclosure. As shown in FIG. 6, the sample distribution space is represented as a two-dimensional space for ease of understanding; it will be appreciated that the actual sample distribution space may be a high-dimensional distribution space. In FIG. 6, the squares may represent positive-sample attributes and the triangles may represent negative-sample attributes; the attribute editing direction is the direction in which the attribute classification boundary faces the positive-sample attributes.
It should be understood that once the direction in which the attribute classification boundary faces the positive-sample attribute is determined as the attribute editing direction, the attribute editing direction can represent the enhancement direction of the positive-sample attribute (or the weakening direction of the negative-sample attribute), and the opposite direction can represent the weakening direction of the positive-sample attribute (or the enhancement direction of the negative-sample attribute). Of course, the direction in which the attribute classification boundary faces the negative-sample attribute may also be determined as the attribute editing direction; in that case, the attribute editing direction can represent the enhancement direction of the negative-sample attribute (or the weakening direction of the positive-sample attribute), and its opposite direction can represent the weakening direction of the negative-sample attribute (or the enhancement direction of the positive-sample attribute).
It should be understood that the positive-sample attribute and negative-sample attribute of each face attribute may be set as desired; the embodiments of the present disclosure are not limited in this regard. For example, a bearded face may be set as the positive-sample attribute, in which case a beardless face is the negative-sample attribute; a younger face may be set as the positive-sample attribute, in which case an older face is the negative-sample attribute; a smiling face may be set as the positive-sample attribute, in which case non-smiling faces (such as crying or angry faces) are the negative-sample attribute; and blue pupils may be set as the positive-sample attribute, in which case non-blue pupils (such as green or black pupils) are the negative-sample attribute.
In the embodiments of the present disclosure, the attribute editing directions of different face attributes can be determined quickly and effectively.
As described above, the first latent variable may be expressed as M first N-dimensional vectors, and the attribute editing direction may be expressed as a second N-dimensional vector, where N and M are positive integers. In a possible implementation, in step S13, editing the first latent variable according to the set attribute editing degree and the attribute editing direction corresponding to the face attribute to obtain the edited second latent variable includes:
determining, according to the attribute type corresponding to the face attribute, at least one first N-dimensional vector on which the attribute editing direction acts; computing the product of the second N-dimensional vector and the attribute editing degree to obtain a third N-dimensional vector; and adding the third N-dimensional vector to each of the at least one first N-dimensional vector among the M first N-dimensional vectors, to obtain M fourth N-dimensional vectors, where the second latent variable is expressed as the M fourth N-dimensional vectors.
Experiments have found that the generation network includes network layers of multiple different resolutions, and that network layers of different resolutions have different sensitivities to (or learning effects on) different face attributes; therefore, network layers of different resolutions in the generation network may be used to process the second latent variables corresponding to different face attributes. The low-resolution network layers of the generation network are relatively sensitive to the first type of face attributes, such as the face shape and pose of the face and the gender, age, and emotion represented by the face; the medium-resolution network layers are relatively sensitive to the second type of face attributes, such as the beard, glasses, and mask on the face; and the high-resolution network layers are relatively sensitive to the third type of face attributes, such as pupil color, hair color, makeup color, and filter color. That is, the low-resolution network layers of the generation network are more sensitive to the first type of face attributes than the medium-resolution and high-resolution network layers; the medium-resolution network layers are more sensitive to the second type of face attributes than the low-resolution and high-resolution network layers; and the high-resolution network layers are more sensitive to the third type of face attributes than the low-resolution and medium-resolution network layers.
Therefore, the attribute editing direction of a certain face attribute can be applied to only some of the first N-dimensional vectors of the first latent variable; that is, the at least one first N-dimensional vector on which the attribute editing direction acts is determined according to the attribute type corresponding to the face attribute. Since different face attributes correspond to different attribute editing directions, the influence on other face attributes can be reduced, or even eliminated, when editing a certain face attribute.
Based on the above experimental findings, in a possible implementation, the attribute editing direction of the first type of face attributes may be applied to the 1st to the i-th first N-dimensional vectors; the attribute editing direction of the second type of face attributes may be applied to the (i+1)-th to the j-th first N-dimensional vectors; and the attribute editing direction of the third type of face attributes may be applied to the (j+1)-th to the M-th first N-dimensional vectors, where i∈[1,M] and j∈[2,M]. In this way, face attributes of different attribute types can be associated with respective first N-dimensional vectors, which facilitates subsequently using the generation network to decode the second latent variable obtained by editing the first N-dimensional vectors, so as to obtain target face images in which the display effects of different face attributes are adjusted.
For example, suppose the first latent variable is expressed as three two-dimensional vectors {(a1,b1), (a2,b2), (a3,b3)}, the attribute editing direction of a face attribute of a certain attribute type acts on the two-dimensional vector (a1,b1), the attribute editing degree is 2, and the attribute editing direction is expressed as a two-dimensional vector (m,n). Then, based on the above implementation, the second latent variable is expressed as three two-dimensional vectors {(2m+a1, 2n+b1), (a2,b2), (a3,b3)}.
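The worked example above can be checked numerically; the concrete values below are illustrative assumptions standing in for (a1,b1), (a2,b2), (a3,b3) and (m,n):

```python
import numpy as np

# (a1,b1)=(1,2), (a2,b2)=(3,4), (a3,b3)=(5,6); direction (m,n)=(10,20);
# attribute editing degree 2; only the first two-dimensional vector is edited.
w1 = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
direction = np.array([10.0, 20.0])
degree = 2.0
w2 = w1.copy()
w2[0] += degree * direction  # (2m+a1, 2n+b1) = (21, 42)
print(w2.tolist())  # [[21.0, 42.0], [3.0, 4.0], [5.0, 6.0]]
```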
在本公开实施例中,能够精准地对任一人脸属性进行属性编辑,同时减少对其它人脸属性产生的影响,甚至对其它人脸属性不产生影响。In the embodiments of the present disclosure, any facial attribute can be accurately edited while reducing the impact on other facial attributes, or even having no impact on other facial attributes.
如上所述,在一种可能的实现方式中,人脸属性对应的属性种类包括第一类人脸属性、第二类人脸属性以及第三类人脸属性中的至少一种,第一类人脸属性包括:人脸的脸型、位姿,以及人脸表征出的性别、年龄、情绪中的至少一种,第二类人脸属性包括人脸上的胡须、眼镜、口罩中的至少一种,第三类人脸属性包括瞳孔颜色、头发颜色、妆容颜色、滤镜颜色中的至少一种。As mentioned above, in a possible implementation, the attribute types corresponding to the face attributes include at least one of the first type of face attributes, the second type of face attributes and the third type of face attributes. The first type Face attributes include: face shape, posture, and at least one of gender, age, and emotion represented by the face. The second type of face attributes includes at least one of beard, glasses, and mask on the face. The third type of face attributes includes at least one of pupil color, hair color, makeup color, and filter color.
In one possible implementation, determining, according to the attribute type corresponding to the face attribute, at least one first N-dimensional vector on which the attribute editing direction acts includes:
in a case where the face attribute includes the first type of face attribute, determining that the attribute editing direction acts on the 1st through the i-th first N-dimensional vectors; and/or,
in a case where the face attribute includes the second type of face attribute, determining that the attribute editing direction acts on the (i+1)-th through the j-th first N-dimensional vectors; and/or,
in a case where the face attribute includes the third type of face attribute, determining that the attribute editing direction acts on the (j+1)-th through the M-th first N-dimensional vectors, where i∈[1,M] and j∈[2,M].
As described above, the low-resolution network layers of the generation network are relatively sensitive to the first type of face attribute, such as the face shape and pose of the face and the gender, age, and emotion expressed by the face; the medium-resolution network layers are relatively sensitive to the second type of face attribute, such as a beard, glasses, or a mask on the face; and the high-resolution network layers are relatively sensitive to the third type of face attribute, such as pupil color, hair color, makeup color, and filter color. Accordingly, the 1st through the i-th first N-dimensional vectors may correspond to the low-resolution network layers of the generation network, the (i+1)-th through the j-th first N-dimensional vectors may correspond to the medium-resolution network layers, and the (j+1)-th through the M-th first N-dimensional vectors may correspond to the high-resolution network layers.
It should be understood that the values of i and j may be empirical values determined by experimental testing based on the network structure of the generation network, which is not limited in the embodiments of the present disclosure. For example, if M is 18, i may be set to 5 and j to 10.
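Using the example values above (M=18, i=5, j=10), the mapping from attribute type to the edited vector indices can be sketched as follows. The function name, the type labels, and the 0-based indexing are illustrative assumptions, not part of the disclosure.

```python
def edited_indices(attr_type: str, i: int = 5, j: int = 10, m: int = 18) -> range:
    """Return the 0-based indices of the first N-dimensional vectors that the
    editing direction of the given attribute type acts on. The defaults i=5,
    j=10 are the empirical example values for an M=18 generation network."""
    if attr_type == "first":    # face shape, pose, gender, age, emotion
        return range(0, i)      # vectors 1..i
    if attr_type == "second":   # beard, glasses, mask
        return range(i, j)      # vectors (i+1)..j
    if attr_type == "third":    # pupil / hair / makeup / filter color
        return range(j, m)      # vectors (j+1)..M
    raise ValueError(f"unknown attribute type: {attr_type}")
```

The three ranges are disjoint and together cover all M vectors, so an edit of one attribute type never touches the vectors assigned to another type.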
In one possible implementation, adding at least one first N-dimensional vector among the M first N-dimensional vectors to the third N-dimensional vector respectively to obtain M fourth N-dimensional vectors includes:
adding the 1st through the i-th first N-dimensional vectors among the M first N-dimensional vectors to the third N-dimensional vector respectively, to obtain the M fourth N-dimensional vectors; and/or,
adding the (i+1)-th through the j-th first N-dimensional vectors among the M first N-dimensional vectors to the third N-dimensional vector respectively, to obtain the M fourth N-dimensional vectors; and/or,
adding the (j+1)-th through the M-th first N-dimensional vectors among the M first N-dimensional vectors to the third N-dimensional vector respectively, to obtain the M fourth N-dimensional vectors.
For example, assume the first latent variable is represented as 18 first 512-dimensional vectors, the product of the attribute editing direction and the attribute editing degree is the third 512-dimensional vector, i is 5, and j is 10. Then, in a case where the face attribute includes the first type of face attribute, the 1st through the 5th first 512-dimensional vectors among the 18 first 512-dimensional vectors may each be added to the third 512-dimensional vector; in a case where the face attribute includes the second type of face attribute, the 6th through the 10th first 512-dimensional vectors may each be added to the third 512-dimensional vector; and in a case where the face attribute includes the third type of face attribute, the 11th through the 18th first 512-dimensional vectors may each be added to the third 512-dimensional vector.
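A minimal numpy sketch of this 18×512 case follows. The random latent and direction are synthetic stand-ins for real encoder outputs; only the slicing pattern reflects the scheme described above.

```python
import numpy as np

M, N = 18, 512
rng = np.random.default_rng(0)
first_latent = rng.normal(size=(M, N))   # M first N-dimensional vectors
direction = rng.normal(size=N)           # second N-dimensional vector (editing direction)
degree = 1.5                             # attribute editing degree
third = degree * direction               # third N-dimensional vector

i, j = 5, 10
second_latent = first_latent.copy()
second_latent[:i] += third               # first-type attribute: vectors 1..5
# For a second-type attribute one would instead use second_latent[i:j] += third,
# and for a third-type attribute second_latent[j:] += third.

# Vectors outside the edited range are untouched.
assert np.allclose(second_latent[i:], first_latent[i:])
```

Broadcasting adds the same third 512-dimensional vector to each selected row, which is exactly the "respectively added" operation in the text.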
In the embodiments of the present disclosure, any face attribute can be accurately edited while reducing, or even eliminating, the impact on other face attributes.
As described above, the generation network includes a plurality of network layers of different resolutions, where network layers of different resolutions may respectively be used to process the second latent variables corresponding to different face attributes, and the generation network is used to generate an image having a preset image style from M N-dimensional vectors. The second latent variable, represented as M fourth N-dimensional vectors, is decoded using the generation network to obtain the target face image. In one possible implementation, the generation network includes M network layers, and decoding the second latent variable using the generation network to obtain the target face image includes:
inputting the 1st fourth N-dimensional vector into the 1st network layer of the generation network to obtain the 1st intermediate image output by the 1st network layer; inputting the m-th fourth N-dimensional vector and the (m-1)-th intermediate image into the m-th network layer of the generation network to obtain the m-th intermediate image output by the m-th network layer, where m∈[2,M); and inputting the M-th fourth N-dimensional vector and the (M-1)-th intermediate image into the M-th network layer of the generation network to obtain the target face image output by the M-th network layer.
It should be understood that when m is 2, the (m-1)-th intermediate image is the 1st intermediate image, so the 2nd intermediate image is obtained based on the 2nd fourth N-dimensional vector and the 1st intermediate image. For m∈[2,M), each of the 2nd through the (M-1)-th intermediate images is determined based on the m-th fourth N-dimensional vector and the (m-1)-th intermediate image output by the preceding network layer. It follows that the (M-1)-th intermediate image is obtained based on the (M-1)-th fourth N-dimensional vector and the (M-2)-th intermediate image, the (M-2)-th intermediate image is obtained based on the (M-2)-th fourth N-dimensional vector and the (M-3)-th intermediate image, and so on.
In one possible implementation, the generation network may be used to generate images of progressively increasing resolution: the input of the 1st network layer of the generation network is one fourth N-dimensional vector; the input of each subsequent network layer includes one fourth N-dimensional vector and the intermediate image output by the preceding network layer; and the last network layer outputs the target face image. The generation network may also be referred to as a multi-layer-transformation generation network.
It can be understood that the low-resolution network layers of the generation network (which may also be referred to as shallow network layers) first learn and generate low-resolution intermediate images (e.g., at a resolution of 4×4); then, as the network depth increases, higher-resolution intermediate images are learned and generated (e.g., at a resolution of 512×512); and finally the target face image is generated at the highest resolution (e.g., 1024×1024).
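The layer-by-layer data flow above can be sketched as follows. The real network layers are learned convolutions modulated by the style vectors; the `layer` function here is a hypothetical stand-in (nearest-neighbour upsampling plus a scalar offset) that only demonstrates how each layer consumes one fourth N-dimensional vector and the previous intermediate image while the resolution grows from 4×4 toward 1024×1024.

```python
import numpy as np

M, N = 18, 512

def layer(style_vec, prev_img, out_res):
    """Stand-in for one generation-network layer: upsample the previous
    intermediate image to out_res x out_res and mix in a scalar derived from
    the style vector. Illustrates data flow only, not a learned layer."""
    if prev_img is None:
        img = np.zeros((out_res, out_res))
    else:
        reps = out_res // prev_img.shape[0]
        img = np.repeat(np.repeat(prev_img, reps, axis=0), reps, axis=1)
    return img + style_vec[0]

def decode(second_latent):
    """Feed the M fourth N-dimensional vectors through M layers; the last
    layer's output is the (placeholder) target face image."""
    img, res = None, 4
    for m in range(M):                 # layer 1 .. layer M
        img = layer(second_latent[m], img, res)
        res = min(res * 2, 1024)       # resolution grows, e.g. 4x4 -> 1024x1024
    return img
```

With M=18 layers the resolution doubles until it reaches the 1024×1024 cap, mirroring the progressive-growth description above.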
In the embodiments of the present disclosure, the generation network can be used to decode the second latent variable, thereby effectively obtaining the target face image.
It should be understood that, following the image processing of steps S11 to S14 in the above embodiments of the present disclosure, the user may perform attribute editing at least once on the same face attribute, or perform attribute editing at least once on each of several different face attributes, to obtain the target face image.
In the related art, only a limited set of face attributes can be edited through traditional image warping methods, and existing attribute edits cannot be decoupled, so edits to different attributes easily affect one another. According to the embodiments of the present disclosure, a specific face attribute can be edited more accurately, and the impact on other face attributes when editing any face attribute can be reduced or even eliminated; the method can be applied to face attribute editing across different image styles.
It can be understood that the above method embodiments mentioned in the present disclosure may be combined with one another to form combined embodiments without departing from the principles and logic. Those skilled in the art will understand that, in the above methods of the specific implementations, the specific execution order of the steps should be determined by their functions and possible internal logic.
In addition, the present disclosure further provides an image processing apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any image processing method provided by the present disclosure. For the corresponding technical solutions and descriptions, refer to the corresponding records in the method section.
Figure 7 shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure. As shown in Figure 7, the apparatus includes:
an acquisition part 101, configured to acquire a face image to be processed;
an encoding part 102, configured to encode the face image to obtain a first latent variable of the face image;
an editing part 103, configured to, in response to a setting operation for an attribute editing degree of a face attribute, edit the first latent variable according to the set attribute editing degree and the attribute editing direction corresponding to the face attribute, to obtain an edited second latent variable, where the attribute editing direction represents an enhancement direction or a weakening direction of the face attribute, different face attributes correspond to different attribute editing directions, and the attribute editing degree represents a degree of enhancement or weakening of the face attribute; and
a decoding part 104, configured to decode the second latent variable to obtain a target face image, where the display effect of the face attribute differs between the target face image and the face image.
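The four parts above form an acquire–encode–edit–decode pipeline. The sketch below wires them together with placeholder encoder and generator functions (the disclosure does not fix their internals here), so only the data flow between the parts is meaningful; all names and shapes are illustrative assumptions.

```python
import numpy as np

M, N = 18, 512

def encode(face_image: str) -> np.ndarray:
    """Part 102 stand-in: a real encoder maps the face image to M first
    N-dimensional vectors; here a deterministic pseudo-latent is derived
    from the file name, for illustration only."""
    seed = sum(ord(c) for c in face_image)
    return np.random.default_rng(seed).normal(size=(M, N))

def edit(first_latent, direction, degree, indices):
    """Part 103: add degree * direction to the selected first N-dim vectors."""
    second = first_latent.copy()
    second[indices] += degree * direction
    return second

def decode(second_latent):
    """Part 104 stand-in: a real generation network renders the target face
    image; here a placeholder array is returned instead."""
    return second_latent.mean(axis=0)

first = encode("face.png")                   # parts 101 + 102
direction = np.ones(N) / np.sqrt(N)          # a unit editing direction (assumed)
second = edit(first, direction, degree=0.8, indices=slice(0, 5))
target = decode(second)                      # placeholder "target face image"
```

The `indices=slice(0, 5)` argument corresponds to editing a first-type face attribute with i=5 in the implementation described earlier.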
In one possible implementation, the attribute editing direction corresponding to the face attribute is obtained by performing binary classification on sample face images using an attribute classifier. Performing the binary classification on the sample face images using the attribute classifier includes: using the attribute classifier corresponding to the face attribute to perform binary classification on the sample face images to obtain an attribute classification boundary of the sample face images in a latent space, where the latent space represents the sample distribution space of the sample latent variables corresponding to the sample face images; and determining the direction in which the attribute classification boundary faces the positive-sample side of the face attribute as the attribute editing direction.
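A lightweight sketch of deriving such a direction: the disclosure fits an attribute classifier and takes the boundary normal toward the positive-attribute side. As a dependency-free stand-in for a learned linear boundary, the snippet uses the normalized difference of class means over synthetic latent codes, which points the same way as the normal of a linear separator; the data, the toy attribute, and the mean-difference shortcut are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8                                             # toy latent dimension
codes = rng.normal(size=(400, N))                 # synthetic sample latent codes
labels = codes[:, 0] > 0                          # toy "attribute": positive iff coord 0 > 0

# Stand-in for the classification boundary: difference of class means,
# normalized to a unit vector pointing toward the positive samples.
pos_mean = codes[labels].mean(axis=0)
neg_mean = codes[~labels].mean(axis=0)
edit_direction = pos_mean - neg_mean
edit_direction /= np.linalg.norm(edit_direction)  # unit attribute editing direction
```

Because the toy attribute is governed by coordinate 0, the recovered direction is dominated by that coordinate, analogous to how a fitted boundary normal isolates one face attribute in the latent space.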
In one possible implementation, the first latent variable is represented as M first N-dimensional vectors, the attribute editing direction is represented as a second N-dimensional vector, and N and M are positive integers. The editing part 103 includes: a determining sub-part, configured to determine, according to the attribute type corresponding to the face attribute, at least one first N-dimensional vector on which the attribute editing direction acts; a calculating sub-part, configured to calculate the product of the second N-dimensional vector and the attribute editing degree to obtain a third N-dimensional vector; and an adding sub-part, configured to add the at least one first N-dimensional vector among the M first N-dimensional vectors to the third N-dimensional vector respectively, to obtain M fourth N-dimensional vectors, where the second latent variable is represented as the M fourth N-dimensional vectors.
In one possible implementation, the attribute type of the face attribute includes at least one of the first type of face attribute, the second type of face attribute, and the third type of face attribute; the attribute editing direction of the first type of face attribute acts on the 1st through the i-th first N-dimensional vectors; the attribute editing direction of the second type of face attribute acts on the (i+1)-th through the j-th first N-dimensional vectors; and the attribute editing direction of the third type of face attribute acts on the (j+1)-th through the M-th first N-dimensional vectors, where i∈[1,M] and j∈[2,M].
In one possible implementation, the first type of face attribute includes the face shape and pose of the face, and at least one of the gender, age, and emotion expressed by the face; the second type of face attribute includes at least one of a beard, glasses, and a mask on the face; and the third type of face attribute includes at least one of pupil color, hair color, makeup color, and filter color.
In one possible implementation, the attribute type includes the first type of face attribute, and determining, according to the attribute type corresponding to the face attribute, the at least one first N-dimensional vector on which the attribute editing direction acts includes: in a case where the face attribute includes the first type of face attribute, determining that the attribute editing direction acts on the 1st through the i-th first N-dimensional vectors. Correspondingly, adding the at least one first N-dimensional vector among the M first N-dimensional vectors to the third N-dimensional vector respectively to obtain the M fourth N-dimensional vectors includes: adding the 1st through the i-th first N-dimensional vectors among the M first N-dimensional vectors to the third N-dimensional vector respectively, to obtain the M fourth N-dimensional vectors.
In one possible implementation, the attribute type includes the second type of face attribute, and determining, according to the attribute type corresponding to the face attribute, the at least one first N-dimensional vector on which the attribute editing direction acts includes: in a case where the face attribute includes the second type of face attribute, determining that the attribute editing direction acts on the (i+1)-th through the j-th first N-dimensional vectors, where i∈[1,M] and j∈[2,M]. Correspondingly, adding the at least one first N-dimensional vector among the M first N-dimensional vectors to the third N-dimensional vector respectively to obtain the M fourth N-dimensional vectors includes: adding the (i+1)-th through the j-th first N-dimensional vectors among the M first N-dimensional vectors to the third N-dimensional vector respectively, to obtain the M fourth N-dimensional vectors.
In one possible implementation, the attribute type includes the third type of face attribute, and determining, according to the attribute type corresponding to the face attribute, the at least one first N-dimensional vector on which the attribute editing direction acts includes: in a case where the face attribute includes the third type of face attribute, determining that the attribute editing direction acts on the (j+1)-th through the M-th first N-dimensional vectors, where j∈[2,M]. Correspondingly, adding the at least one first N-dimensional vector among the M first N-dimensional vectors to the third N-dimensional vector respectively to obtain the M fourth N-dimensional vectors includes: adding the (j+1)-th through the M-th first N-dimensional vectors among the M first N-dimensional vectors to the third N-dimensional vector respectively, to obtain the M fourth N-dimensional vectors.
In one possible implementation, the decoding part 104 includes a network decoding sub-part, configured to decode the second latent variable using a generation network to obtain the target face image, where the generation network is used to generate an image having a preset image style from M N-dimensional vectors, the generation network includes a plurality of network layers of different resolutions, and the network layers of different resolutions are respectively used to process the second latent variables corresponding to different face attributes.
In one possible implementation, the generation network includes M network layers, the second latent variable is represented as M fourth N-dimensional vectors, and decoding the second latent variable using the generation network to obtain the target face image includes: inputting the 1st fourth N-dimensional vector into the 1st network layer of the generation network to obtain the 1st intermediate image output by the 1st network layer; inputting the m-th fourth N-dimensional vector and the (m-1)-th intermediate image into the m-th network layer of the generation network to obtain the m-th intermediate image output by the m-th network layer, where m∈[2,M); and inputting the M-th fourth N-dimensional vector and the (M-1)-th intermediate image into the M-th network layer of the generation network to obtain the target face image output by the M-th network layer.
In one possible implementation, in a case where the product of the attribute editing direction and the attribute editing degree is positive, the display effect of the face attribute in the target face image obtained based on the second latent variable is enhanced relative to the face image; in a case where the product of the attribute editing direction and the attribute editing degree is negative, the display effect of the face attribute in the target face image obtained based on the second latent variable is weakened relative to the face image.
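This sign convention can be checked on a toy example: project the edited latent onto the editing direction, treat the projection as a stand-in attribute score, and observe that a positive editing degree raises it (attribute enhanced) while a negative degree lowers it (attribute weakened). The 2-D vectors and the scoring function are illustrative assumptions.

```python
import numpy as np

direction = np.array([1.0, 0.0])   # attribute editing direction
latent = np.array([0.2, 0.5])      # toy latent code

def attribute_score(v):
    """Stand-in for how strongly the attribute shows: projection onto the
    editing direction."""
    return float(v @ direction)

enhanced = latent + 1.5 * direction     # positive editing degree
weakened = latent + (-1.5) * direction  # negative editing degree

assert attribute_score(enhanced) > attribute_score(latent)
assert attribute_score(weakened) < attribute_score(latent)
```

Components of the latent orthogonal to the direction (here the second coordinate) are unchanged by either edit, matching the claim that other face attributes are unaffected.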
In the embodiments of the present disclosure, a first latent variable is obtained by encoding the face image; the first latent variable is edited according to the set attribute editing degree of the face attribute and the attribute editing direction corresponding to the face attribute, to obtain a second latent variable; and the second latent variable is then decoded to obtain the target face image. Since different face attributes have different attribute editing directions, the face attribute specified by the user can be accurately edited without affecting the display effect of other face attributes.
In some embodiments, the functions of, or the parts included in, the apparatus provided by the embodiments of the present disclosure may be configured to perform the methods described in the above method embodiments; for specific implementations, refer to the descriptions of the above method embodiments.
An embodiment of the present disclosure further provides a computer-readable storage medium having computer program instructions stored thereon, where the computer program instructions, when executed by a processor, implement the above method. The computer-readable storage medium may be a non-volatile computer-readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory configured to store processor-executable instructions, where the processor is configured to invoke the instructions stored in the memory to perform the above method.
An embodiment of the present disclosure further provides a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying the computer-readable code, where when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device performs the above method.
The electronic device may be provided as a terminal, a server, or a device in another form.
Figure 8 shows a block diagram of an electronic device according to an embodiment of the present disclosure. For example, the electronic device 1900 may be provided as a server or a terminal device. Referring to Figure 8, the electronic device 1900 includes a processing component 1922, which may include one or more processors, and memory resources represented by a memory 1932, configured to store instructions executable by the processing component 1922, such as an application program. The application program stored in the memory 1932 may include one or more parts, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute the instructions to perform the above method.
The electronic device 1900 may further include a power supply component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958.
In an exemplary embodiment, a non-volatile computer-readable storage medium is further provided, for example, a memory 1932 including computer program instructions, where the computer program instructions are executable by the processing component 1922 of the electronic device 1900 to complete the above method.
The embodiments of the present disclosure may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions configured to cause a processor to implement various aspects of the embodiments of the present disclosure.
The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above. The computer-readable storage medium may include: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory, an erasable programmable read-only memory (EPROM or flash memory), a static random access memory, a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanical encoding device such as a punched card or a raised structure in a groove with instructions stored thereon, or any suitable combination of the above. As used herein, the computer-readable storage medium is not to be construed as a transient signal itself, such as a radio wave or another freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or another transmission medium (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described herein may be downloaded from the computer-readable storage medium to respective computing/processing devices, or downloaded to an external computer or external storage device via a network such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in the respective computing/processing device.
Various aspects of the present disclosure are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to the embodiments of the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or the other programmable data processing apparatus, produce an apparatus that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium, where the instructions cause a computer, a programmable data processing apparatus, and/or other devices to operate in a specific manner, such that the computer-readable medium storing the instructions includes an article of manufacture that includes instructions implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or another device, causing a series of operational steps to be performed on the computer, the other programmable data processing apparatus, or the other device to produce a computer-implemented process, such that the instructions executed on the computer, the other programmable data processing apparatus, or the other device implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the accompanying drawings illustrate possible architectures, functions, and operations of systems, methods, and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a part, a program segment, or a portion of instructions, which contains one or more executable instructions configured to implement the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
该计算机程序产品可以通过硬件、软件或其结合的方式实现。在一个可选实施例中,所述计算机程序产品体现为计算机存储介质,在另一个可选实施例中,计算机程序产品体现为软件产品,例如软件开发包(Software Development Kit,SDK)等等。The computer program product may be implemented in hardware, software or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium. In another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK) and so on.
若本公开技术方案涉及个人信息,应用本公开技术方案的产品在处理个人信息前,已明确告知个人信息处理规则,并取得个人自主同意。若本公开技术方案涉及敏感个人信息,应用本公开技术方案的产品在处理敏感个人信息前,已取得个人单独同意,并且同时满足“明示同意”的要求。例如,在摄像头等个人信息采集装置处,设置明确显著的标识告知已进入个人信息采集范围,将会对个人信息进行采集,若个人自愿进入采集范围即视为同意对其个人信息进行采集;或者在个人信息处理的装置上,利用明显的标识/信息告知个人信息处理规则的情况下,通过弹窗信息或请个人自行上传其个人信息等方式获得个人授权;其中,个人信息处理规则可包括个人信息处理者、个人信息处理目的、处理方式以及处理的个人信息种类等信息。If the disclosed technical solution involves personal information, the products applying the disclosed technical solution will clearly inform the personal information processing rules and obtain the individual's independent consent before processing personal information. If the disclosed technical solution involves sensitive personal information, the product applying the disclosed technical solution must obtain the individual's separate consent before processing the sensitive personal information, and at the same time meet the requirement of "express consent". For example, setting up clear and conspicuous signs on personal information collection devices such as cameras to inform them that they have entered the scope of personal information collection, and that personal information will be collected. If an individual voluntarily enters the collection scope, it is deemed to have agreed to the collection of his or her personal information; or On personal information processing devices, when using obvious logos/information to inform personal information processing rules, obtain personal authorization through pop-up messages or asking individuals to upload their personal information; among them, personal information processing rules may include personal information processing rules. Information such as information processors, purposes of processing personal information, methods of processing, and types of personal information processed.
以上已经描述了本公开的各实施例,上述说明是示例性的,并非穷尽性的,并且也不限于所披露的各实施例。在不偏离所说明的各实施例的范围和精神的情况下,对于本技术领域的普通技术人员来说许多修改和变更都是显而易见的。本文中所用术语的选择,旨在最好地解释各实施例的原理、实际应用或对市场中的技术的改进,或者使本技术领域的其它普通技术人员能理解本文披露的各实施例。The embodiments of the present disclosure have been described above. The above description is illustrative, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen to best explain the principles, practical applications, or improvements to the technology in the market, or to enable other persons of ordinary skill in the art to understand the embodiments disclosed herein.
Industrial applicability
In the embodiments of the present disclosure, a face image to be processed is obtained; the face image is encoded to obtain a first latent variable of the face image; in response to a setting operation for an attribute editing degree of a face attribute, the first latent variable is edited according to the set attribute editing degree and an attribute editing direction corresponding to the face attribute, to obtain an edited second latent variable, where the attribute editing direction represents a direction in which the face attribute is enhanced or weakened, different face attributes correspond to different attribute editing directions, and the attribute editing degree represents a degree to which the face attribute is enhanced or weakened; the second latent variable is decoded to obtain a target face image, in which the display effect of the face attribute differs from that in the original face image. The embodiments of the present disclosure enable accurate editing of the face attribute specified by the user.
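The encode-edit-decode pipeline summarized above can be sketched in a few lines. The shapes used here (M=18 vectors of dimension N=512) and the editing direction are illustrative placeholders, not values from the disclosure; a real system would obtain the direction from a trained attribute classifier and apply it only to a subset of the M vectors.

```python
import numpy as np

def edit_face_latent(first_latent, direction, degree):
    """Edit a latent code along an attribute editing direction.

    first_latent: (M, N) array -- the first latent variable, M N-dimensional vectors.
    direction:    (N,) array   -- the attribute editing direction.
    degree:       float        -- positive enhances the attribute, negative weakens it.
    """
    # Scale the direction by the editing degree and add it to every vector;
    # in the disclosure the offset acts only on the vectors matching the
    # attribute type (this simplification edits all of them).
    offset = degree * direction          # the "third N-dimensional vector"
    return first_latent + offset         # the edited second latent variable

# Toy demonstration with hypothetical sizes M=18, N=512.
rng = np.random.default_rng(0)
w = rng.standard_normal((18, 512))
d = np.zeros(512)
d[0] = 1.0                               # hypothetical editing direction
w2 = edit_face_latent(w, d, degree=3.0)
assert np.allclose(w2[:, 0], w[:, 0] + 3.0)   # edited coordinate shifted
assert np.allclose(w2[:, 1:], w[:, 1:])       # all other coordinates unchanged
```

Decoding `w2` with the generation network (not shown) would then yield the target face image with the attribute enhanced or weakened according to the sign of `degree`.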

Claims (25)

1. An image processing method, comprising:
  obtaining a face image to be processed;
  encoding the face image to obtain a first latent variable of the face image;
  in response to a setting operation for an attribute editing degree of a face attribute, editing the first latent variable according to the set attribute editing degree and an attribute editing direction corresponding to the face attribute, to obtain an edited second latent variable, wherein the attribute editing direction represents a direction in which the face attribute is enhanced or weakened, different face attributes correspond to different attribute editing directions, and the attribute editing degree represents a degree to which the face attribute is enhanced or weakened;
  decoding the second latent variable to obtain a target face image, wherein a display effect of the face attribute in the target face image is different from that in the face image.
2. The method according to claim 1, wherein the attribute editing direction corresponding to the face attribute is obtained by performing binary classification on sample face images with an attribute classifier;
  wherein performing binary classification on the sample face images with the attribute classifier comprises:
  performing binary classification on the sample face images with the attribute classifier corresponding to the face attribute, to obtain an attribute classification boundary of the sample face images in a latent space, the latent space representing a sample distribution space of sample latent variables corresponding to the sample face images;
  determining, as the attribute editing direction, the direction in which the attribute classification boundary faces the positive-sample side of the face attribute.
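As a toy illustration of this claim, the editing direction can be read off as the unit normal of a linear separating boundary, oriented toward the positive-attribute samples. The least-squares separator below is a simple stand-in for the trained attribute classifier, and the synthetic latents are hypothetical:

```python
import numpy as np

def attribute_direction(latents, labels):
    """Estimate an attribute editing direction from binary-labelled sample latents.

    Fits a least-squares linear separator; its unit normal, oriented toward the
    positive-attribute samples, plays the role of the attribute classification
    boundary's facing direction described in the claim.
    """
    X = np.hstack([latents, np.ones((len(latents), 1))])  # add a bias column
    y = np.where(labels > 0, 1.0, -1.0)                   # +1 positive, -1 negative
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    normal = w[:-1]                                       # drop the bias weight
    return normal / np.linalg.norm(normal)                # unit editing direction

# Synthetic sample latents: positive samples shifted along a hidden axis.
rng = np.random.default_rng(1)
true_dir = np.zeros(8)
true_dir[2] = 1.0
neg = rng.standard_normal((200, 8))
pos = rng.standard_normal((200, 8)) + 4.0 * true_dir
d = attribute_direction(np.vstack([neg, pos]), np.array([0] * 200 + [1] * 200))
assert d @ true_dir > 0.9   # recovered direction points toward the positives
```

The recovered unit vector points from the boundary toward the positive-attribute side, which is exactly the role the attribute editing direction plays in the claims.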
3. The method according to claim 1 or 2, wherein the first latent variable is represented as M first N-dimensional vectors, the attribute editing direction is represented as a second N-dimensional vector, N and M are positive integers, and editing the first latent variable according to the set attribute editing degree and the attribute editing direction corresponding to the face attribute to obtain the edited second latent variable comprises:
  determining, according to an attribute type corresponding to the face attribute, at least one first N-dimensional vector on which the attribute editing direction acts;
  calculating a product of the second N-dimensional vector and the attribute editing degree to obtain a third N-dimensional vector;
  adding the at least one first N-dimensional vector among the M first N-dimensional vectors to the third N-dimensional vector respectively, to obtain M fourth N-dimensional vectors, the second latent variable being represented as the M fourth N-dimensional vectors.
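The vector arithmetic of this claim (scale the second N-dimensional vector by the editing degree, then add the result only to the selected first vectors) can be sketched as follows; the sizes and the index range are illustrative, not values from the disclosure:

```python
import numpy as np

def edit_selected(first_vectors, direction, degree, indices):
    """Apply degree * direction to a chosen subset of the M first vectors.

    first_vectors: (M, N) array -- the M first N-dimensional vectors.
    direction:     (N,) array   -- the second N-dimensional vector.
    degree:        float        -- the attribute editing degree.
    indices:       which of the M vectors the editing direction acts on.
    """
    third = degree * direction          # the third N-dimensional vector
    fourth = first_vectors.copy()       # start from the first vectors
    fourth[indices] += third            # edit only the selected vectors
    return fourth                       # the M fourth N-dimensional vectors

# Hypothetical M=6, N=4 example: the direction acts on the first three vectors.
w = np.zeros((6, 4))
d = np.array([1.0, 0.0, 0.0, 0.0])
w2 = edit_selected(w, d, degree=2.0, indices=range(0, 3))
assert np.allclose(w2[:3, 0], 2.0)      # selected vectors shifted by degree
assert np.allclose(w2[3:], 0.0)         # unselected vectors untouched
```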
4. The method according to any one of claims 1 to 3, wherein the attribute type corresponding to the face attribute comprises at least one of a first type of face attribute, a second type of face attribute, and a third type of face attribute; the first latent variable is represented as M first N-dimensional vectors;
  the attribute editing direction of the first type of face attribute acts on the 1st to the i-th first N-dimensional vectors; the attribute editing direction of the second type of face attribute acts on the (i+1)-th to the j-th first N-dimensional vectors; the attribute editing direction of the third type of face attribute acts on the (j+1)-th to the M-th first N-dimensional vectors; where i∈[1,M] and j∈[2,M].
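The partition of the M vectors by attribute type described in this claim can be written as a small lookup; the type labels and the split points i=4, j=8, M=18 are hypothetical, chosen only to show that the three ranges tile the M vectors:

```python
def vectors_for_attribute_type(attr_type, i, j, m):
    """Map an attribute type to the 0-based indices of the first N-dimensional
    vectors its editing direction acts on, per the three-way partition above.
    The string labels are illustrative, not identifiers from the disclosure."""
    if attr_type == "first":        # e.g. face shape, pose, gender, age, emotion
        return range(0, i)          # vectors 1..i
    if attr_type == "second":       # e.g. beard, glasses, mask
        return range(i, j)          # vectors i+1..j
    if attr_type == "third":        # e.g. pupil/hair/makeup/filter color
        return range(j, m)          # vectors j+1..M
    raise ValueError(attr_type)

# With hypothetical i=4, j=8, M=18 the three ranges exactly cover vectors 1..M.
parts = [vectors_for_attribute_type(t, 4, 8, 18) for t in ("first", "second", "third")]
assert [list(p) for p in parts] == [list(range(0, 4)), list(range(4, 8)), list(range(8, 18))]
```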
5. The method according to claim 4, wherein the first type of face attribute comprises at least one of a face shape, a pose, and a gender, an age, or an emotion represented by the face; the second type of face attribute comprises at least one of a beard, glasses, or a mask on the face; the third type of face attribute comprises at least one of a pupil color, a hair color, a makeup color, or a filter color.
6. The method according to any one of claims 3 to 5, wherein the attribute type comprises a first type of face attribute, and determining, according to the attribute type corresponding to the face attribute, the at least one first N-dimensional vector on which the attribute editing direction acts comprises:
  in a case where the face attribute comprises the first type of face attribute, determining that the attribute editing direction acts on the 1st to the i-th first N-dimensional vectors;
  wherein adding the at least one first N-dimensional vector among the M first N-dimensional vectors to the third N-dimensional vector respectively to obtain the M fourth N-dimensional vectors comprises:
  adding the 1st to the i-th first N-dimensional vectors among the M first N-dimensional vectors to the third N-dimensional vector respectively, to obtain the M fourth N-dimensional vectors.
7. The method according to any one of claims 3 to 5, wherein the attribute type comprises a second type of face attribute, and determining, according to the attribute type corresponding to the face attribute, the at least one first N-dimensional vector on which the attribute editing direction acts comprises:
  in a case where the face attribute comprises the second type of face attribute, determining that the attribute editing direction acts on the (i+1)-th to the j-th first N-dimensional vectors;
  wherein adding the at least one first N-dimensional vector among the M first N-dimensional vectors to the third N-dimensional vector respectively to obtain the M fourth N-dimensional vectors comprises:
  adding the (i+1)-th to the j-th first N-dimensional vectors among the M first N-dimensional vectors to the third N-dimensional vector respectively, to obtain the M fourth N-dimensional vectors.
8. The method according to any one of claims 3 to 5, wherein the attribute type comprises a third type of face attribute, and determining, according to the attribute type corresponding to the face attribute, the at least one first N-dimensional vector on which the attribute editing direction acts comprises:
  in a case where the face attribute comprises the third type of face attribute, determining that the attribute editing direction acts on the (j+1)-th to the M-th first N-dimensional vectors;
  wherein adding the at least one first N-dimensional vector among the M first N-dimensional vectors to the third N-dimensional vector respectively to obtain the M fourth N-dimensional vectors comprises:
  adding the (j+1)-th to the M-th first N-dimensional vectors among the M first N-dimensional vectors to the third N-dimensional vector respectively, to obtain the M fourth N-dimensional vectors.
9. The method according to any one of claims 1 to 8, wherein decoding the second latent variable to obtain the target face image comprises:
  decoding the second latent variable with a generation network to obtain the target face image, the generation network being configured to generate an image having a preset image style from M N-dimensional vectors;
  wherein the generation network comprises a plurality of network layers of different resolutions, and the network layers of different resolutions are respectively configured to process second latent variables corresponding to different face attributes.
10. The method according to claim 9, wherein the generation network comprises M network layers, the second latent variable is represented as M fourth N-dimensional vectors, and decoding the second latent variable with the generation network to obtain the target face image comprises:
  inputting the 1st fourth N-dimensional vector into the 1st network layer of the generation network to obtain a 1st intermediate image output by the 1st network layer;
  inputting the m-th fourth N-dimensional vector and the (m-1)-th intermediate image into the m-th network layer of the generation network to obtain an m-th intermediate image output by the m-th network layer, m∈[2,M);
  inputting the M-th fourth N-dimensional vector and the (M-1)-th intermediate image into the M-th network layer of the generation network to obtain the target face image output by the M-th network layer.
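The layer-by-layer decoding described in this claim can be sketched with stand-in callables in place of the learned network layers; the toy "images" here are scalars, and each layer simply adds its vector to the previous intermediate image, which is of course not the real learned mapping:

```python
def layerwise_generate(layers, fourth_vectors):
    """Claim-10-style decoding: layer m consumes the m-th fourth vector and the
    (m-1)-th intermediate image; layer 1 consumes only its vector; the last
    layer's output is the target face image. `layers` is a list of M callables
    standing in for the generation network's learned layers."""
    intermediate = layers[0](fourth_vectors[0], None)   # layer 1: vector only
    for m in range(1, len(layers)):                     # layers 2..M
        intermediate = layers[m](fourth_vectors[m], intermediate)
    return intermediate                                 # target face image

# Toy check with M=4 scalar "images": each layer adds its vector to the image.
layers = [lambda v, img: v if img is None else img + v] * 4
assert layerwise_generate(layers, [1, 2, 3, 4]) == 10
```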
11. The method according to any one of claims 1 to 10, wherein in a case where the product of the attribute editing direction and the attribute editing degree is a positive value, the display effect of the face attribute in the target face image obtained based on the second latent variable is enhanced relative to the face image;
  in a case where the product of the attribute editing direction and the attribute editing degree is a negative value, the display effect of the face attribute in the target face image obtained based on the second latent variable is weakened relative to the face image.
12. An image processing apparatus, comprising:
  an obtaining part configured to obtain a face image to be processed;
  an encoding part configured to encode the face image to obtain a first latent variable of the face image;
  an editing part configured to, in response to a setting operation for an attribute editing degree of a face attribute, edit the first latent variable according to the set attribute editing degree and an attribute editing direction corresponding to the face attribute, to obtain an edited second latent variable, wherein the attribute editing direction represents a direction in which the face attribute is enhanced or weakened, different face attributes correspond to different attribute editing directions, and the attribute editing degree represents a degree to which the face attribute is enhanced or weakened;
  a decoding part configured to decode the second latent variable to obtain a target face image, wherein a display effect of the face attribute in the target face image is different from that in the face image.
13. The apparatus according to claim 12, wherein the attribute editing direction corresponding to the face attribute is obtained by performing binary classification on sample face images with an attribute classifier; wherein performing binary classification on the sample face images with the attribute classifier comprises:
  performing binary classification on the sample face images with the attribute classifier corresponding to the face attribute, to obtain an attribute classification boundary of the sample face images in a latent space, the latent space representing a sample distribution space of sample latent variables corresponding to the sample face images; and determining, as the attribute editing direction, the direction in which the attribute classification boundary faces the positive-sample side of the face attribute.
14. The apparatus according to claim 12 or 13, wherein the first latent variable is represented as M first N-dimensional vectors, the attribute editing direction is represented as a second N-dimensional vector, N and M are positive integers, and the editing part comprises:
  a determining sub-part configured to determine, according to an attribute type corresponding to the face attribute, at least one first N-dimensional vector on which the attribute editing direction acts;
  a calculating sub-part configured to calculate a product of the second N-dimensional vector and the attribute editing degree to obtain a third N-dimensional vector;
  an adding sub-part configured to add the at least one first N-dimensional vector among the M first N-dimensional vectors to the third N-dimensional vector respectively, to obtain M fourth N-dimensional vectors, the second latent variable being represented as the M fourth N-dimensional vectors.
15. The apparatus according to any one of claims 12 to 14, wherein the attribute type corresponding to the face attribute comprises at least one of a first type of face attribute, a second type of face attribute, and a third type of face attribute; the first latent variable is represented as M first N-dimensional vectors;
  the attribute editing direction of the first type of face attribute acts on the 1st to the i-th first N-dimensional vectors; the attribute editing direction of the second type of face attribute acts on the (i+1)-th to the j-th first N-dimensional vectors; the attribute editing direction of the third type of face attribute acts on the (j+1)-th to the M-th first N-dimensional vectors; where i∈[1,M] and j∈[2,M].
16. The apparatus according to claim 15, wherein the first type of face attribute comprises at least one of a face shape, a pose, and a gender, an age, or an emotion represented by the face; the second type of face attribute comprises at least one of a beard, glasses, or a mask on the face; the third type of face attribute comprises at least one of a pupil color, a hair color, a makeup color, or a filter color.
17. The apparatus according to any one of claims 14 to 16, wherein the attribute type comprises a first type of face attribute, and determining, according to the attribute type corresponding to the face attribute, the at least one first N-dimensional vector on which the attribute editing direction acts comprises:
  in a case where the face attribute comprises the first type of face attribute, determining that the attribute editing direction acts on the 1st to the i-th first N-dimensional vectors;
  wherein adding the at least one first N-dimensional vector among the M first N-dimensional vectors to the third N-dimensional vector respectively to obtain the M fourth N-dimensional vectors comprises:
  adding the 1st to the i-th first N-dimensional vectors among the M first N-dimensional vectors to the third N-dimensional vector respectively, to obtain the M fourth N-dimensional vectors.
18. The apparatus according to any one of claims 14 to 16, wherein the attribute type comprises a second type of face attribute, and determining, according to the attribute type corresponding to the face attribute, the at least one first N-dimensional vector on which the attribute editing direction acts comprises:
  in a case where the face attribute comprises the second type of face attribute, determining that the attribute editing direction acts on the (i+1)-th to the j-th first N-dimensional vectors;
  wherein adding the at least one first N-dimensional vector among the M first N-dimensional vectors to the third N-dimensional vector respectively to obtain the M fourth N-dimensional vectors comprises:
  adding the (i+1)-th to the j-th first N-dimensional vectors among the M first N-dimensional vectors to the third N-dimensional vector respectively, to obtain the M fourth N-dimensional vectors.
19. The apparatus according to any one of claims 14 to 16, wherein the attribute type comprises a third type of face attribute, and determining, according to the attribute type corresponding to the face attribute, the at least one first N-dimensional vector on which the attribute editing direction acts comprises:
  in a case where the face attribute comprises the third type of face attribute, determining that the attribute editing direction acts on the (j+1)-th to the M-th first N-dimensional vectors;
  wherein adding the at least one first N-dimensional vector among the M first N-dimensional vectors to the third N-dimensional vector respectively to obtain the M fourth N-dimensional vectors comprises:
  adding the (j+1)-th to the M-th first N-dimensional vectors among the M first N-dimensional vectors to the third N-dimensional vector respectively, to obtain the M fourth N-dimensional vectors.
20. The apparatus according to any one of claims 12 to 19, wherein the decoding part comprises: a network decoding sub-part configured to decode the second latent variable with a generation network to obtain the target face image, the generation network being configured to generate an image having a preset image style from M N-dimensional vectors;
  wherein the generation network comprises a plurality of network layers of different resolutions, and the network layers of different resolutions are respectively configured to process second latent variables corresponding to different face attributes.
21. The apparatus according to claim 20, wherein the generation network comprises M network layers, the second latent variable is represented as M fourth N-dimensional vectors, and decoding the second latent variable with the generation network to obtain the target face image comprises:
  inputting the 1st fourth N-dimensional vector into the 1st network layer of the generation network to obtain a 1st intermediate image output by the 1st network layer;
  inputting the m-th fourth N-dimensional vector and the (m-1)-th intermediate image into the m-th network layer of the generation network to obtain an m-th intermediate image output by the m-th network layer, m∈[2,M);
  inputting the M-th fourth N-dimensional vector and the (M-1)-th intermediate image into the M-th network layer of the generation network to obtain the target face image output by the M-th network layer.
22. The apparatus according to any one of claims 12 to 21, wherein in a case where the product of the attribute editing direction and the attribute editing degree is a positive value, the display effect of the face attribute in the target face image obtained based on the second latent variable is enhanced relative to the face image;
  in a case where the product of the attribute editing direction and the attribute editing degree is a negative value, the display effect of the face attribute in the target face image obtained based on the second latent variable is weakened relative to the face image.
23. An electronic device, comprising:
  a processor; and
  a memory configured to store processor-executable instructions;
  wherein the processor is configured to invoke the instructions stored in the memory to perform the method according to any one of claims 1 to 11.
24. A computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 11.
25. A computer program product comprising computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device performs the method according to any one of claims 1 to 11.
PCT/CN2022/134943 2022-03-22 2022-11-29 Image processing method and apparatus, and electronic device, storage medium and program product WO2023179075A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210279511.6 2022-03-22
CN202210279511.6A CN114373215A (en) 2022-03-22 2022-03-22 Image processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023179075A1 true WO2023179075A1 (en) 2023-09-28

Family

ID=81146705

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/134943 WO2023179075A1 (en) 2022-03-22 2022-11-29 Image processing method and apparatus, and electronic device, storage medium and program product

Country Status (2)

Country Link
CN (1) CN114373215A (en)
WO (1) WO2023179075A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114373215A (en) * 2022-03-22 2022-04-19 北京大甜绵白糖科技有限公司 Image processing method and device, electronic equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN111563427A (en) * 2020-04-23 2020-08-21 中国科学院半导体研究所 Method, device and equipment for editing attribute of face image
CN113255551A (en) * 2021-06-04 2021-08-13 广州虎牙科技有限公司 Training, face editing and live broadcasting method of face editor and related device
CN113822953A (en) * 2021-06-24 2021-12-21 华南理工大学 Processing method of image generator, image generation method and device
CN114373215A (en) * 2022-03-22 2022-04-19 北京大甜绵白糖科技有限公司 Image processing method and device, electronic equipment and storage medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN111951153B (en) * 2020-08-12 2024-02-13 杭州电子科技大学 Face attribute refined editing method based on generation of countering network hidden space deconstructment


Also Published As

Publication number Publication date
CN114373215A (en) 2022-04-19

Similar Documents

Publication Publication Date Title
CN110070483B (en) Portrait cartoon method based on generation type countermeasure network
JP7490004B2 (en) Image Colorization Using Machine Learning
CN109815924B (en) Expression recognition method, device and system
Lee et al. Detecting handcrafted facial image manipulations and GAN-generated facial images using Shallow-FakeFaceNet
US11410364B2 (en) Systems and methods for realistic head turns and face animation synthesis on mobile device
TW202105238A (en) Image processing method and device, processor, electronic equipment and storage medium
Chadha et al. Deepfake: an overview
WO2020150689A1 (en) Systems and methods for realistic head turns and face animation synthesis on mobile device
Zhang et al. Bionic face sketch generator
CN107025678A (en) A kind of driving method and device of 3D dummy models
WO2023179074A1 (en) Image fusion method and apparatus, and electronic device, storage medium, computer program and computer program product
CN116363261B (en) Training method of image editing model, image editing method and device
WO2023179075A1 (en) Image processing method and apparatus, and electronic device, storage medium and program product
Wang et al. Learning how to smile: Expression video generation with conditional adversarial recurrent nets
WO2023024653A1 (en) Image processing method, image processing apparatus, electronic device and storage medium
Parihar et al. Everything is there in latent space: Attribute editing and attribute style manipulation by stylegan latent space exploration
CN111639537A (en) Face action unit identification method and device, electronic equipment and storage medium
Hang et al. Language-guided face animation by recurrent StyleGAN-based generator
WO2024066549A1 (en) Data processing method and related device
Liu et al. Mask face inpainting based on improved generative adversarial network
Joshi A brief review of facial expressions recognition system
CN113838159B (en) Method, computing device and storage medium for generating cartoon images
US20230377214A1 (en) Identity-preserving image generation using diffusion models
Song et al. Virtual Human Talking-Head Generation
Xu et al. Character photo selection for mobile platform

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22933126

Country of ref document: EP

Kind code of ref document: A1