WO2020216033A1 - Data processing method, device, and medium for facial image generation
- Publication number: WO2020216033A1 (PCT/CN2020/082918)
- Authority: WO (WIPO/PCT)
- Prior art keywords: image, facial, feature, network, facial image
Classifications
- G06V 40/168 — Human faces: feature extraction; face representation
- G06V 10/82 — Image or video recognition or understanding using neural networks
- G06F 18/214 — Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F 18/22 — Pattern recognition: matching criteria, e.g. proximity measures
- G06F 18/2411 — Classification based on the proximity to a decision surface, e.g. support vector machines
- G06V 10/764 — Image or video recognition or understanding using classification, e.g. of video objects
- G06V 40/172 — Human faces: classification, e.g. identification
Definitions
- This application relates to the field of image processing, and more specifically, to data processing methods, devices, media, and computer equipment for facial image generation.
- Facial image generation technology is an emerging research field, which has broad application prospects in progeny face prediction, criminal image restoration in criminal investigation, and construction of virtual characters. For example, by inputting a facial image, another brand-new facial image that is similar to but different from the facial image can be generated as the target image.
- Existing facial image generation schemes use a general processing network to generate the target image: for example, a facial image is input to a trained encoding network and decoding network, which then output the target image.
- The problem with such a scheme is that the harmony and naturalness of the synthesized facial image output by the general processing network are very poor, and it is difficult for users to believe that it is a real facial image.
- embodiments of the present application provide a data processing method, device, medium, and computer device for facial image generation, which can generate a synthetic facial image closer to a real facial image.
- According to one aspect, a data processing method for facial image generation executed by a server includes: acquiring a first facial image (I_MA) and a second facial image (I_FA); acquiring M first image blocks corresponding to facial features in the first facial image, and N second image blocks corresponding to facial features in the second facial image; transforming the M first image blocks and the N second image blocks into a feature space to generate M first feature blocks and N second feature blocks; selecting a part of the first feature blocks and a part of the second feature blocks according to a specific control vector; generating a first synthetic feature map based at least on the selected part of the first feature blocks and part of the second feature blocks; and inversely transforming the first synthetic feature map back into image space to generate a third facial image, where M and N are natural numbers.
- According to another aspect, a data processing device for facial image generation includes: a segmentation device for acquiring M first image blocks corresponding to facial features in an input first facial image, and N second image blocks corresponding to facial features in an input second facial image; a first transformation device for transforming the M first image blocks and the N second image blocks into a feature space to generate M first feature blocks and N second feature blocks; a selection device for selecting a part of the first feature blocks and a part of the second feature blocks according to a specific control vector; a first synthesis device for generating a first synthetic feature map based at least on the selected part of the first feature blocks and part of the second feature blocks; and a first inverse transformation device for inversely transforming the first synthetic feature map back into the image space to generate a third facial image.
- According to another aspect, a computer-readable recording medium has a computer program stored thereon that, when executed by a processor, performs the data processing method for facial image generation described in the above embodiments.
- According to another aspect, a computer device includes a memory and a processor, the memory being configured to store a computer program and the processor being configured to execute the computer program to implement the data processing method for facial image generation described in the foregoing embodiments.
- FIG. 1 is a flowchart illustrating a data processing method for facial image generation according to an embodiment of the present application.
- FIG. 2 shows a schematic data-flow diagram of an inheritance network according to an embodiment of the present application.
- FIG. 3 shows facial image generation results under different control vectors according to embodiments of the present application.
- FIG. 4 shows facial image generation results when random factors are added to the input facial images according to an embodiment of the present application.
- FIG. 5 shows a schematic data-flow diagram of an attribute enhancement network according to an embodiment of the present application.
- FIG. 6 shows facial images of different ages generated under a specified control vector.
- FIG. 7 shows facial images of different ages and genders generated under a specified control vector.
- FIG. 8 is a flowchart illustrating the training process of the inheritance network according to an embodiment of the present application.
- FIG. 9 is a schematic diagram illustrating the two facial feature exchanges in the training process of the inheritance network.
- FIG. 10 shows a data-flow diagram of the training process of the inheritance network according to an embodiment of the present application.
- FIG. 11 is a flowchart illustrating the training process of the attribute enhancement network according to an embodiment of the present application.
- FIG. 12 shows a data-flow diagram of the training process of the attribute enhancement network according to an embodiment of the present application.
- FIG. 13 shows a schematic diagram of an application environment of an embodiment of the present application.
- FIG. 14 shows a functional block diagram of a data processing device for facial image generation according to an embodiment of the present application.
- FIG. 15 shows an example of a data processing device for facial image generation according to an embodiment of the present application as a hardware entity.
- FIG. 16 shows a schematic diagram of a computer-readable recording medium according to an embodiment of the present application.
- In the existing scheme, the output facial image is far from a real facial image.
- In addition, training the encoding network and decoding network in the general processing network requires collecting and building a real face database in order to provide supervision information for the output synthetic facial images.
- For example, a real child facial image is used as supervision information for the composite child facial image that the processing network outputs from the father's or mother's facial image, so as to adjust the parameters of the processing network such that the trained network can output a composite facial image that is both similar to the inputs and close to a real image.
- Collecting and establishing such a database is very costly.
- In view of this, an inheritance network dedicated to facial image synthesis is proposed. Compared with general processing networks, it can output synthesized facial images that are closer to real images, and it can accurately control which facial features in the two input facial images the synthesized facial image inherits.
- The embodiments of the present application further propose an attribute enhancement network, which can adjust the attributes (such as age and gender) of the synthesized facial image over a larger range on the basis of the synthesized facial image output by the inheritance network.
- In addition, a training method for the inheritance network and the attribute enhancement network that does not require a face database with father-mother-child relationships is proposed: any existing face database can be used directly to complete the training of the processing networks.
- First facial image: in the application mode, an image input to the inheritance network, denoted by I_MA;
- Second facial image: in the application mode, another image input to the inheritance network, denoted by I_FA;
- Third facial image: in the application mode, the image output by the inheritance network, denoted by I_o1;
- Fourth facial image: in the application mode, the image output by the attribute enhancement network, denoted by I_o2;
- Fifth facial image (I_M) and sixth facial image (I_F): in the training mode, the two images used as input sources;
- Seventh facial image: in the training mode, an image output by the inheritance network, denoted by I'_M, with the fifth facial image I_M as its supervision image;
- Eighth facial image: in the training mode, another image output by the inheritance network, denoted by I'_F, with the sixth facial image I_F as its supervision image;
- Ninth facial image: in the training mode, an image output by the attribute enhancement network, denoted by Î_M, with the seventh facial image I'_M as its supervision image;
- Tenth facial image: in the training mode, another image output by the attribute enhancement network, denoted by Î_F, with the eighth facial image I'_F as its supervision image.
- Hereinafter, a data processing method for facial image generation according to an embodiment of the present application will be described with reference to FIG. 1.
- the method is executed by a server.
- the data processing method includes the following steps.
- In step S101, a first facial image (I_MA) and a second facial image (I_FA) are acquired.
- In step S102, M first image blocks corresponding to facial features in the first facial image (I_MA) are obtained, and N second image blocks corresponding to facial features in the second facial image (I_FA) are obtained.
- Here, the facial features can be organs (such as the eyebrows, eyes, nose, mouth, and face profile), tissues, or local features (such as features of the forehead, cheeks, or skin), and the like.
- the M first image blocks respectively correspond to different facial features
- the N second image blocks also correspond to different facial features respectively.
- M and N are natural numbers.
- the first facial image and the second facial image may be facial images of people of different genders, such as a male facial image and a female facial image.
- the first facial image and the second facial image may be facial images of people of the same gender.
- the first facial image and the second facial image may be real facial images taken by a camera.
- the first facial image and the second facial image may also be composite images generated based on facial feature images selected from an existing facial feature library.
- the first facial image may be a synthetic image generated by randomly selecting and replacing a facial feature from a facial feature library on the basis of a person’s original facial features, and the second facial image may also be generated in a similar manner Composite image.
- the first facial image may also be a composite image generated by randomly selecting and combining all facial features from a facial feature library, and the second facial image may also be a composite image generated in a similar manner.
- Further, the first facial image and the second facial image may also be cartoon facial images. It can be seen that in the embodiments of the present application the types of the first and second facial images are not particularly limited; any two facial images that can be used as input can be similarly applied to the embodiments of the present application and fall within the scope of this application.
- For an input facial image, the position of each facial feature can first be located through facial landmark calibration, and the facial image can then be decomposed into image blocks corresponding to each facial feature.
- The total number of different facial features required to generate a new facial image is set in advance and denoted as L, where L is a natural number.
- facial features can be divided into left eye and left eyebrow, right eye and right eyebrow, nose, mouth, and face profile.
- In this case, the total number of different facial features required to generate a new facial image is five. If an input facial image is a complete frontal image, the number of image blocks obtained by decomposition will be consistent with this total; in other words, all required facial features can be detected in the facial image.
- In this case, the input facial image can be decomposed into five image blocks: the image block corresponding to the left eye and left eyebrow, the image block corresponding to the right eye and right eyebrow, the image block corresponding to the nose, the image block corresponding to the mouth, and the image block corresponding to the face profile.
- this decomposition method is only an example, and any other decomposition methods are also feasible.
- the input facial image can also be decomposed into image blocks corresponding to eyes, image blocks corresponding to eyebrows, image blocks corresponding to the nose, image blocks corresponding to the mouth, and image blocks corresponding to the face profile.
- Conversely, if an input facial image is not a complete frontal image (for example, a profile view or a partially occluded face), the number of image blocks decomposed from it will be less than the total number of different facial features required; in other words, some facial features may not be detected in that facial image. Since, in the subsequent steps, a new facial image is synthesized by selecting some facial features from the first facial image and some from the second facial image, a single input facial image need not supply all the facial features required to generate a new facial image; it suffices that all required facial features can be pieced together from the two input facial images.
- the number M of first image blocks and the number N of second image blocks may both be equal to the total number L of different facial features required to generate a new facial image.
- one of the number M of the first image block and the number N of the second image block may be equal to the total number L of different facial features required to generate a new facial image, and the other may be less than L.
- the number M of the first image block and the number N of the second image block may both be less than L, and M and N may be equal or different.
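As a rough illustration of the decomposition step, the following sketch crops one image block per facial feature from pre-computed landmarks; the feature grouping, margin, and names are illustrative assumptions, not the patent's concrete method:

```python
# Illustrative sketch of the decomposition step, assuming facial landmarks
# are already available (e.g. from any face-alignment library).
import numpy as np

FEATURES = ["left_eye_brow", "right_eye_brow", "nose", "mouth", "profile"]

def crop_blocks(image: np.ndarray, landmarks: dict, margin: int = 8) -> dict:
    """Decompose an (H, W, 3) face image into one block per facial feature.

    `landmarks` maps a feature name to a (K, 2) array of (x, y) points;
    features missing from the dict (e.g. occluded in a profile view) are
    skipped, so fewer than L blocks may be returned.
    """
    h, w = image.shape[:2]
    blocks = {}
    for name in FEATURES:
        pts = landmarks.get(name)
        if pts is None:  # feature not detected in this image
            continue
        x0 = max(int(pts[:, 0].min()) - margin, 0)
        x1 = min(int(pts[:, 0].max()) + margin, w)
        y0 = max(int(pts[:, 1].min()) - margin, 0)
        y1 = min(int(pts[:, 1].max()) + margin, h)
        blocks[name] = image[y0:y1, x0:x1]
    return blocks
```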
- In step S103, the M first image blocks and the N second image blocks are transformed into the feature space to generate M first feature blocks and N second feature blocks.
- the transformation from image space to feature space can be realized through transformation networks, such as coding networks.
- As one possible implementation, the same coding network can be used for the image blocks of all facial features. Alternatively, as another possible implementation, because each facial feature looks quite different, a dedicated coding network can be provided for each facial feature so that exclusive features are extracted for each of them.
- For example, a set of coding networks can be provided in which the coding network E1 is used for the image block corresponding to the left eye and left eyebrow, the coding network E2 for the image block corresponding to the right eye and right eyebrow, the coding network E3 for the image block corresponding to the nose, the coding network E4 for the image block corresponding to the mouth, and the coding network E5 for the image block corresponding to the face profile.
- The parameters of the coding networks E1 to E5 are different.
- the M first image blocks are transformed into the feature space through the corresponding coding networks E1 to E5, and similarly, the N second image blocks are transformed into the feature space through the corresponding coding networks E1 to E5, respectively.
- a two-dimensional image block can be transformed into a three-dimensional feature block with length, width and height.
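The per-feature coding networks can be sketched as follows; the layer widths, the 64x64 block size, and the resulting 8x8 spatial resolution are illustrative assumptions, not the patent's actual architecture:

```python
# A minimal sketch of the per-feature coding networks E1..E5.
import torch
import torch.nn as nn

FEATURES = ["left_eye_brow", "right_eye_brow", "nose", "mouth", "profile"]

def make_encoder(out_channels: int = 256) -> nn.Sequential:
    """One coding network: a 2D image block -> a 3D feature block."""
    return nn.Sequential(
        nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(128, out_channels, 4, stride=2, padding=1),
    )

# one dedicated coding network per facial feature, each with its own parameters
encoders = nn.ModuleDict({name: make_encoder() for name in FEATURES})

blocks = {name: torch.randn(1, 3, 64, 64) for name in FEATURES}
feature_blocks = {name: encoders[name](x) for name, x in blocks.items()}
print(feature_blocks["nose"].shape)  # torch.Size([1, 256, 8, 8])
```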
- In step S104, a part of the first feature blocks and a part of the second feature blocks are selected according to a specific control vector.
- The specific control vector includes L information bits corresponding to the facial features. Note that the number of information bits equals the total number L of different facial features required to generate a new facial image described above; L is a natural number, and M ≤ L, N ≤ L.
- Continuing the example above, the control vector includes five information bits, corresponding respectively to the left eye and left eyebrow, the right eye and right eyebrow, the nose, the mouth, and the face profile.
- the specific control vector can be manually set by the user, or it can be set automatically at random.
- Specifically, the step of selecting a part of the first feature blocks and a part of the second feature blocks according to the specific control vector further includes: when an information bit in the specific control vector is a first value, selecting the feature block of the facial feature corresponding to that information bit from the M first feature blocks; and when the information bit is a second value, selecting the feature block of the facial feature corresponding to that information bit from the N second feature blocks.
- The selection is made according to each information bit of the control vector in turn, yielding L feature blocks in total.
- These feature blocks are mixed feature blocks composed of a part of the first feature blocks and a part of the second feature blocks.
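The selection step itself is simple to express in code; a minimal sketch, assuming both images supply all L feature blocks in the same fixed order and each bit of the control vector picks the source image for one facial feature:

```python
# Bit 1 -> take the block from the first image; bit 0 -> from the second.
def select_features(first_blocks, second_blocks, v):
    assert len(first_blocks) == len(second_blocks) == len(v)
    return [fb if bit else sb
            for fb, sb, bit in zip(first_blocks, second_blocks, v)]

# v = [1, 0, 0, 1, 0]: left eye/eyebrow and mouth are inherited from the
# first image; right eye/eyebrow, nose and face profile from the second.
```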
- In step S105, a first composite feature map is generated based at least on the selected part of the first feature blocks and part of the second feature blocks.
- For example, when the control vector v is 10010, the first composite feature map can be generated based on the first feature blocks corresponding to the left eye and left eyebrow and to the mouth, together with the second feature blocks corresponding to the right eye and right eyebrow, to the nose, and to the face profile. That is, feature blocks from different sources are recombined in the feature space to form a new composite feature map containing all facial features.
- In addition, the attributes (e.g., age and gender) of the output third facial image can be controlled. For example, the gender of the third facial image to be output can be specified.
- The attribute information of the input first facial image and second facial image may differ considerably. For example, the age of the first facial image may be very different from that of the second facial image, say 20 years old versus 60 years old.
- In this case, attribute features can be further superimposed in the feature space. For example, the attribute features of the average age (40 in the above example) can be superimposed; likewise, if a female output is desired, the attribute features of the female gender can be superimposed to remove male features such as beards.
- In this case, the step of generating the first synthetic feature map may further include the following steps.
- First, the designated attribute information is expanded into attribute feature blocks in the feature space. For example, the attribute information can be expanded into feature blocks having the same length and width as the selected feature blocks but a different height.
- Then, a first composite feature map is generated based on the selected part of the first feature blocks, the selected part of the second feature blocks, and the attribute feature blocks.
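A minimal sketch of the expansion step, assuming the attribute information is two scalars (age and gender) tiled over the spatial dimensions; the two-channel layout is an illustrative choice:

```python
# Expand designated attribute information into an attribute feature block
# with the same length and width as the feature blocks.
import torch

def expand_attributes(y_a: float, y_g: float, h: int, w: int) -> torch.Tensor:
    """Tile scalar attributes into a (1, 2, h, w) attribute feature block."""
    attr = torch.tensor([y_a, y_g]).view(1, 2, 1, 1)
    return attr.expand(1, 2, h, w)

# The attribute block is then spliced with the selected feature blocks along
# the channel ("height") dimension, e.g.:
# fused = torch.cat([selected_map, expand_attributes(0.4, 1.0, 8, 8)], dim=1)
```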
- In step S106, the first synthetic feature map is inversely transformed back into the image space to generate a third facial image (I_o1).
- the inverse transformation from feature space to image space can be realized through an inverse transformation network, such as decoding network D.
- Here, the inheritance network may include the coding networks E1 to E5 and the decoding network D described above, and may be implemented by various neural networks. If the inheritance network is expressed as a function f_inh, then its inputs include the first facial image I_MA, the second facial image I_FA, and the control vector v, and its output is the third facial image I_o1. The specific formula is as follows:
- $I_{o1} = f_{inh}(I_{MA}, I_{FA}, v)$ (1)
- If attribute features are superimposed in the feature space, the inputs of the function further include the desired age (y_a) and gender (y_g) of the third facial image to be output. The specific formula is as follows:
- $I_{o1} = f_{inh}(I_{MA}, I_{FA}, v, y_a, y_g)$ (2)
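Putting the pieces together, formula (2) can be sketched end to end as follows; the layer sizes, the fixed 64x64 block size, and the single shared decoder are illustrative assumptions, not the patent's actual E1..E5 / D architecture:

```python
# End-to-end sketch of formula (2).
import torch
import torch.nn as nn

L = 5  # left eye/brow, right eye/brow, nose, mouth, face profile

class InheritanceNet(nn.Module):
    def __init__(self, feat_ch: int = 64):
        super().__init__()
        # one dedicated coding network per facial feature (E1..E5)
        self.encoders = nn.ModuleList([
            nn.Sequential(nn.Conv2d(3, feat_ch, 4, 2, 1), nn.ReLU())
            for _ in range(L)])
        # decoding network D: spliced features (+2 attribute channels) -> image
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(L * feat_ch + 2, 64, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, 1, 1), nn.Tanh())

    def forward(self, blocks_MA, blocks_FA, v, y_a, y_g):
        f_MA = [e(b) for e, b in zip(self.encoders, blocks_MA)]
        f_FA = [e(b) for e, b in zip(self.encoders, blocks_FA)]
        # select per information bit, then splice along the channel axis
        sel = [m if bit else f for m, f, bit in zip(f_MA, f_FA, v)]
        fused = torch.cat(sel, dim=1)
        attr = torch.tensor([y_a, y_g]).view(1, 2, 1, 1)
        fused = torch.cat(
            [fused, attr.expand(-1, -1, *fused.shape[2:])], dim=1)
        return self.decoder(fused)  # the third facial image I_o1

blocks = [torch.randn(1, 3, 64, 64) for _ in range(L)]
I_o1 = InheritanceNet()(blocks, blocks, [1, 0, 0, 1, 0], 0.4, 1.0)
```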
- Fig. 2 shows a data flow diagram of an inherited network according to an embodiment of the present application.
- the first facial image I MA and the second facial image I FA as input sources are decomposed into image blocks corresponding to facial features and then transformed into feature blocks in the feature space via a set of coding networks E1 to E5.
- the feature blocks are selected and exchanged according to the control vector v, then spliced with the attribute feature blocks, and finally transformed back to the image space via the decoding network D to generate the third facial image I o1 .
- the third facial image is a composite facial image that inherits a part of facial features in the first facial image and a part of facial features in the second facial image.
- the generated third facial image may be an offspring facial image assuming that the two persons are parents.
- the generated third facial image may be a virtual facial image synthesized by combining the advantages of the facial features of the two persons.
- the generated third facial image can be used to infer the facial image of a specific person. This is especially important in the identification of eyewitnesses in criminal investigations.
- For example, based on an eyewitness's description, facial features can be combined to produce a low-quality synthetic facial image that does not resemble a real photo.
- By using this synthesized facial image as the first facial image, arbitrarily selecting a second facial image, and setting the specific control vector to 11111 (that is, selecting all facial features from the first facial image), a third facial image that is very similar to a real image can be output to facilitate identification of the suspect.
- Referring to the processing steps described in FIG. 1, it can be seen that, through segmentation of the facial feature images and reorganization in the feature space, the data processing method for facial image generation produces a third facial image that inherits a part of the facial features of the first facial image and a part of the facial features of the second facial image.
- This makes the output third facial image close to a real image while ensuring its similarity to the facial images used as input sources.
- In other words, when the third facial image is viewed by a user, it is difficult to distinguish whether it is a real image or a composite image.
- Figure 3 shows the results of facial image generation under different control vectors. It can be seen from FIG. 3 that by setting different control vectors, the inheritance relationship between the facial features in the generated third facial image and the two facial images as the input source can be accurately controlled.
- FIG. 4 shows the facial image generation results when random factors are added to the input facial images; that is, as described above, each input facial image is a composite image generated by randomly selecting and replacing one facial feature from the facial feature library on the basis of a person's original facial features.
- In FIG. 4, the rows from top to bottom show the results of adding random factors to the eyes and eyebrows, the nose, the mouth, and the face profile, respectively.
- Moreover, by superimposing attribute features, the attributes of the third facial image can be specified, and the harmony and naturalness of the third facial image can be further improved.
- However, the main purpose of the inheritance network described above is to output a third facial image whose facial features are similar to those of the first and second facial images, so the superposition of attribute feature blocks within it is fine-tuning performed on the premise of ensuring similarity.
- That is, the third facial image output by the inheritance network is similar in attributes such as age to the first and second facial images used as input sources.
- In order to adjust the attributes of the output facial image over a larger range, as another possible implementation, referring back to FIG. 1, the following steps may further be included after step S106.
- In step S107, the third facial image is transformed into a feature space to generate a third feature map.
- the transformation from image space to feature space can be realized through the encoding network E0.
- the parameters of the coding network E0 here are not the same as the parameters of the coding networks E1 to E5 described above.
- In step S108, designated attribute information is expanded into an attribute feature map in the feature space.
- the attribute information can be expanded to a feature map with the same length and width as the three-dimensional feature map but different heights.
- In step S109, a second composite feature map is generated based on the attribute feature map and the third feature map.
- In step S110, the second composite feature map is inversely transformed back into the image space to generate a fourth facial image (I_o2).
- the inverse transformation from feature space to image space can be realized through the decoding network D0.
- the parameters of the decoding network D0 are not the same as the parameters of the decoding network D mentioned in step S105 above.
- steps S107 to S110 are shown in dashed boxes in FIG. 1.
- Here, the attribute enhancement network may include the encoding network E0 and the decoding network D0 described above, and may be implemented by various neural networks. If the attribute enhancement network is expressed as a function f_att, then its inputs include the third facial image I_o1 and the desired age (y_a) and gender (y_g) of the fourth facial image to be output, and its output is the fourth facial image I_o2. The specific formula is as follows:
- $I_{o2} = f_{att}(I_{o1}, y_a, y_g)$ (3)
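Formula (3) admits a similarly compact sketch; the layer sizes are again illustrative assumptions:

```python
# Compact sketch of formula (3): E0 encodes the third facial image into Z1,
# attribute information is spliced in the feature space, and D0 decodes the
# result back to the image space.
import torch
import torch.nn as nn

class AttributeEnhanceNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.E0 = nn.Sequential(            # image -> third feature map Z1
            nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU())
        self.D0 = nn.Sequential(            # [Z1; y_a; y_g] -> image
            nn.ConvTranspose2d(128 + 2, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh())

    def forward(self, I_o1, y_a, y_g):
        Z1 = self.E0(I_o1)
        attr = torch.tensor([y_a, y_g]).view(1, 2, 1, 1)
        attr = attr.expand(Z1.size(0), 2, Z1.size(2), Z1.size(3))
        return self.D0(torch.cat([Z1, attr], dim=1))  # fourth image I_o2

I_o2 = AttributeEnhanceNet()(torch.randn(1, 3, 128, 128), 0.05, 1.0)
```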
- FIG. 5 shows a data flow diagram of an attribute enhancement network according to an embodiment of the present application.
- As shown in FIG. 5, the third facial image I_o1 is transformed by the encoding network E0 into a third feature map Z_1 in the feature space; Z_1 is spliced with the attribute information y_a and y_g in the feature space, and the result is inversely transformed back into the image space by the decoding network D0 to obtain the fourth facial image I_o2.
- Compared with the third facial image, the fourth facial image can change greatly in attributes. For example, based on an input third facial image of a 20-year-old, a fourth facial image of a 5-year-old may be output.
- FIG. 6 shows facial images of different ages generated under a specified control vector. It can be seen from FIG. 6 that, through the inheritance network and the attribute enhancement network, facial images of different age groups can be generated, and the facial differences between age groups are obvious.
- FIG. 7 shows facial images of different ages and genders generated under a specified control vector. It can be seen from FIG. 7 that, even under the same control vector, the inheritance network and the attribute enhancement network can reflect differences in the facial features of the generated facial images due to gender and age, such as the cheek (apple) muscles, eyebrows, nasolabial folds, lip color, and so on.
- the specific process of the data processing method for generating a facial image according to an embodiment of the present application is described in detail above with reference to FIGS. 1 to 7.
- the data processing method is implemented by inheriting the network or inheriting the network and the attribute enhancement network.
- the data processing method described above is the processing performed in the application process of the inherited network and the attribute enhanced network.
- Specifically, the inheritance network and the attribute enhancement network may each include an encoding network and a decoding network, both of which contain multiple parameters to be determined. These parameters are determined through the training process, which completes the construction of the inheritance network and the attribute enhancement network so that they can realize the function of facial image generation.
- In one example, the inheritance network can be obtained through the training steps shown in FIG. 8.
- To distinguish them from the images in the application mode, the facial images involved in the training process are referred to as the fifth to tenth facial images.
- In step S801, L fifth image blocks corresponding to the facial features in a fifth facial image (I_M) are acquired, and L sixth image blocks corresponding to the facial features in a sixth facial image (I_F) are acquired.
- Similar to the application mode, the number of facial feature image blocks obtained from each of the two input facial images may be less than or equal to L, the total number of different facial features required to generate a new facial image, as long as all required facial features can be pieced together from the two input facial images. For simplicity of description, it is assumed here that the number of image blocks acquired from each image is L.
- In step S802, a part of the fifth image blocks and a part of the sixth image blocks are selected according to a first control vector v_1 to generate a first composite image, and another part of the fifth image blocks and another part of the sixth image blocks are selected according to a second control vector v_2 to generate a second composite image.
- Further, the composite images obtained after the exchange of facial features can be fused by a color correction method to avoid inconsistent color blocks in the composite images.
- As mentioned above, in the application mode the first facial image and the second facial image may likewise be composite images generated based on facial feature images selected from an existing facial feature library; however, since the inheritance network has already been trained at that point, those composite images need not undergo color correction processing.
- In step S803, L seventh image blocks corresponding to the facial features in the first composite image are acquired, and L eighth image blocks corresponding to the facial features in the second composite image are acquired.
- In step S804, the L seventh image blocks and the L eighth image blocks are input to the inheritance network.
- In step S805, through the inheritance network, a seventh facial image (I'_M) generated from a part of the seventh image blocks and a part of the eighth image blocks selected according to the first control vector is output, and an eighth facial image (I'_F) generated from another part of the seventh image blocks and another part of the eighth image blocks selected according to the second control vector is output.
- The fifth facial image is a supervision image used to provide supervision information for the seventh facial image, and the sixth facial image is a supervision image used to provide supervision information for the eighth facial image.
- The fifth to eighth facial images are taken as one group of inheritance training data.
- Here, the attributes of the desired output facial images are set to be the same as the attributes of the facial images used as input sources, so as to facilitate the calculation of the subsequent loss function.
- Compared with the application process, the training process of the inheritance network differs in that a facial feature exchange is performed in advance, before the facial images used as input sources are input to the inheritance network.
- The purpose of this is to provide supervision information for the facial images output by the inheritance network.
- That is, if the facial features are exchanged once through a control vector and the resulting composite images are provided to the inheritance network, then, provided that the parameters of the inheritance network are accurate, exchanging the facial features again with the same control vector should recover the original fifth facial image or sixth facial image.
- FIG. 9 shows a schematic process of two facial feature exchanges in the training process of the inherited network.
- the letter A represents the image block of each facial feature in the fifth facial image (I M ) as the input source
- the letter B represents the image block of each facial feature in the sixth facial image (I F ) as the input source.
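The restoration property behind this double exchange can be checked with a few lines of code; `swap` is a hypothetical helper standing in for the feature-exchange step:

```python
# Tiny check of the restoration property: exchanging with v1 before the
# network and again with v1 inside it reproduces the original blocks, so
# the original image can serve as its own supervision image.
def swap(blocks_a, blocks_b, v):
    """Per information bit: take from A when the bit is 1, else from B."""
    out_a = [a if bit else b for a, b, bit in zip(blocks_a, blocks_b, v)]
    out_b = [b if bit else a for a, b, bit in zip(blocks_a, blocks_b, v)]
    return out_a, out_b

v1 = [1, 0, 1, 0, 1]
A = list("AAAAA")  # stands for the fifth image's five feature blocks
B = list("BBBBB")  # stands for the sixth image's five feature blocks

mixed_a, mixed_b = swap(A, B, v1)            # composite images fed to the net
restored_a, _ = swap(mixed_a, mixed_b, v1)   # second exchange, same vector
assert restored_a == A                       # recovers the fifth image's blocks
```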
- In this way, since the fifth facial image (I_M) and the sixth facial image (I_F) naturally serve as the supervision images of the output images (I'_M and I'_F), there is no need to establish a face database with father-mother-child relationships; any existing face database can be used directly to complete the training process of the inheritance network.
- In the embodiment of the present application, a Generative Adversarial Network (GAN) method is adopted for learning.
- A generative adversarial network includes a generative network and a discriminant network, and it learns the data distribution through a game played between the generative network and the discriminant network.
- The purpose of the generative network is to learn the real data distribution as closely as possible, while the purpose of the discriminant network is to correctly determine whether its input comes from the real data or from the generative network; during training, the generative network and the discriminant network are optimized continuously, each improving its own generation or discrimination ability.
- The inheritance network can be regarded as the generative network here, and a first discriminant network is used to determine whether an output facial image is true or false.
- Here, "true" means that the facial image is a real image, and "false" means that the facial image is an image output by the inheritance network.
- In step S806, at least one group of inheritance training data is input to the first discriminant network, where the first discriminant network is configured to output, when an image is input to it, the probability value that the image is a real image.
- In step S807, based on a first loss function, the inheritance network and the first discriminant network are trained alternately until the first loss function converges.
- Fig. 10 shows a data flow diagram in the training process of the inherited network according to an embodiment of the present application.
- The two facial images used as input sources serve respectively as the supervision images of the two facial images output by the inheritance network; therefore, to facilitate comparison, FIG. 10 shows two output paths of the inheritance network.
- In fact, as described above with reference to FIG. 2, whenever two facial images are provided as input, the inheritance network outputs only one facial image.
- Specifically, the fifth facial image I_M is exchanged twice with the same control vector v_1 to obtain the seventh facial image I'_M, and I_M is used as the supervision image of I'_M.
- Similarly, the sixth facial image I_F is exchanged twice with the same control vector v_2 to obtain the eighth facial image I'_F, and I_F is used as the supervision image of I'_F.
- In one example, the first loss function is determined based on the probability values output by the first discriminant network for at least one group of inheritance training data, and on the pixel differences between the facial images in that group of inheritance training data and the corresponding supervision images.
- Specifically, the first loss function includes the sum of two parts: an adversarial loss and a pixel loss.
- The adversarial loss $\mathcal{L}^{I}_{adv}$ makes the distribution of the facial images generated by the inheritance network closer to that of real images, and can be calculated, for example, in the WGAN form with gradient penalty:
- $\mathcal{L}^{I}_{adv} = \mathbb{E}\big[D_I(I'_s)\big] - \mathbb{E}\big[D_I(I_s)\big] + \lambda_{gp}\,\mathbb{E}\big[\big(\lVert \nabla_{\tilde{x}} D_I(\tilde{x}) \rVert_2 - 1\big)^2\big]$
- where $D_I$ denotes the first discriminant network, $D_I(I'_s)$ is its output (probability value) when an image $I'_s$ output by the inheritance network is input, $D_I(I_s)$ is its output (probability value) when a real image $I_s$ is input, $\tilde{x}$ denotes samples interpolated between real and generated images, and $\lambda_{gp}$ is the gradient-penalty hyperparameter of WGAN.
- The pixel loss $\mathcal{L}^{I}_{pix}$ is used to ensure the similarity between the facial image generated by the inheritance network and the facial image used as the input source; it is the pixel-level loss between the generated facial image and the real facial image, i.e., the sum of the absolute differences between the pixel values of the two images. The specific formula is as follows:
- $\mathcal{L}^{I}_{pix} = \lVert I'_s - I_s \rVert_1$
- Accordingly, the first loss function can be expressed as follows:
- $\mathcal{L}_1 = \lambda_{11}\mathcal{L}^{I}_{adv} + \lambda_{12}\mathcal{L}^{I}_{pix}$
- where $\lambda_{11}$ and $\lambda_{12}$ are weight coefficients.
- Specifically, the inheritance network can first be fixed while the first discriminant network is trained; at this time, the value of the first loss function should be as small as possible. Then, the first discriminant network can be fixed while the inheritance network is trained; at this time, the value of the first loss function should be as large as possible. After multiple rounds of training, when the first loss function no longer fluctuates much across different inheritance training data, that is, when the first loss function converges, the training of the inheritance network is complete.
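One alternating round of this scheme can be sketched as follows, written in the minimization form that is equivalent to "train D so the loss is small, then train G so the adversarial term is large"; the generator stub `G`, the hyperparameters, and the data handling are illustrative assumptions:

```python
# One alternating WGAN-GP training round for the inheritance network.
import torch

def gradient_penalty(D, real, fake, lambda_gp=10.0):
    """WGAN-GP penalty on samples interpolated between real and fake."""
    eps = torch.rand(real.size(0), 1, 1, 1)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grad = torch.autograd.grad(D(x_hat).sum(), x_hat, create_graph=True)[0]
    return lambda_gp * ((grad.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

def train_step(G, D_I, opt_G, opt_D, real, g_inputs, supervision,
               l11=1.0, l12=10.0):
    # 1) fix G, train the first discriminant network D_I
    fake = G(*g_inputs).detach()
    loss_D = D_I(fake).mean() - D_I(real).mean() \
             + gradient_penalty(D_I, real, fake)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # 2) fix D_I, train the inheritance network G: raise D_I(fake)
    #    (the adversarial term) while lowering the L1 pixel loss
    fake = G(*g_inputs)
    loss_G = -l11 * D_I(fake).mean() \
             + l12 * (fake - supervision).abs().mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```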
- In another example, to further improve the accuracy of the trained inheritance network, the first loss function may be further determined based on at least one of the following: the differences between the attributes of the facial images in at least one group of inheritance training data and the attributes of the corresponding supervision images, and the differences between the features of those facial images and the features of the corresponding supervision images.
- For example, the first loss function may further include an attribute loss.
- The attribute loss is determined by the difference between the attributes of the facial image output by the inheritance network and the attributes of the real facial image used as the input source.
- Taking age and gender as example attributes, the loss functions of age and gender can be calculated, for example, as the distances between the predicted and supervised attributes:
- $\mathcal{L}^{I}_{age} = \lVert D_a(I'_s) - y_a \rVert_2^2, \qquad \mathcal{L}^{I}_{gen} = \lVert D_g(I'_s) - y_g \rVert_2^2$
- where $D_a$ and $D_g$ are networks that judge the age and gender of an image, respectively; for example, ResNet can be used to pre-train a regression model of age and gender, so that when an image $I'_s$ is input to the model, the age and gender information of the image is output.
- $D_a(I'_s)$ represents the age of the facial image $I'_s$ as judged by $D_a$, and $D_g(I'_s)$ represents its gender as judged by $D_g$; $y_a$ and $y_g$ are the age and gender of the corresponding supervision image.
- In addition, the first loss function may further include a perceptual loss.
- For example, the features of a 19-layer VGG network can be used to calculate the perceptual loss $\mathcal{L}^{I}_{per}$, i.e., the distance between the VGG features of the facial image output by the inheritance network and the VGG features of the real facial image used as the input source:
- $\mathcal{L}^{I}_{per} = \lVert \phi(I'_s) - \phi(I_s) \rVert_2^2$, where $\phi(\cdot)$ denotes the VGG-19 features.
- In this case, the first loss function can also be expressed as follows:
- $\mathcal{L}_1 = \lambda_{11}\mathcal{L}^{I}_{adv} + \lambda_{12}\mathcal{L}^{I}_{pix} + \lambda_{13}\mathcal{L}^{I}_{age} + \lambda_{14}\mathcal{L}^{I}_{gen} + \lambda_{15}\mathcal{L}^{I}_{per}$
- where $\lambda_{11}$, $\lambda_{12}$, $\lambda_{13}$, $\lambda_{14}$ and $\lambda_{15}$ are different weight coefficients, which can be assigned according to the importance of each loss term.
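A sketch of the extended first loss; `age_net` and `gender_net` stand in for the pre-trained ResNet regressors D_a and D_g, torchvision's VGG-19 supplies the perceptual features, and all weights are illustrative:

```python
# Extended first loss function; the adversarial term is computed elsewhere
# (e.g. by the training step sketched above). weights=None keeps the sketch
# offline; pretrained VGG weights would be used in practice.
import torch
import torchvision.models as models

vgg = models.vgg19(weights=None).features[:36].eval()  # deep VGG-19 features
for p in vgg.parameters():
    p.requires_grad_(False)

def first_loss(fake, real, y_a, y_g, age_net, gender_net, adv_term,
               l11=1.0, l12=10.0, l13=1.0, l14=1.0, l15=1.0):
    pix = (fake - real).abs().mean()              # pixel (L1) loss
    age = (age_net(fake) - y_a).pow(2).mean()     # attribute loss: age
    gen = (gender_net(fake) - y_g).pow(2).mean()  # attribute loss: gender
    per = (vgg(fake) - vgg(real)).pow(2).mean()   # perceptual (VGG) loss
    return l11 * adv_term + l12 * pix + l13 * age + l14 * gen + l15 * per
```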
- the attribute enhancement network can be obtained through the following training steps shown in FIG. 11.
- Similarly, a generative adversarial network (GAN) method is adopted for learning the attribute enhancement network.
- The attribute enhancement network can be regarded as the generative network here, and a second discriminant network is used to determine whether an output facial image is true or false.
- Here, "true" means that the facial image is a real image, and "false" means that the facial image is an image output by the attribute enhancement network.
- As described above, the fourth facial image is generated by inputting the third facial image to the attribute enhancement network, and the attribute enhancement network is obtained through the training steps shown in FIG. 11.
- In step S1101, the seventh facial image (I'_M) and the eighth facial image (I'_F) are input to the attribute enhancement network.
- In step S1102, through the attribute enhancement network, a ninth facial image (Î_M) corresponding to the seventh facial image and a tenth facial image (Î_F) corresponding to the eighth facial image are output.
- The seventh facial image is a supervision image used to provide supervision information for the ninth facial image, and the eighth facial image is a supervision image used to provide supervision information for the tenth facial image.
- The seventh to tenth facial images are taken as one group of attribute training data.
- The generation process of the attribute enhancement network can be expressed by the following formulas: $\hat{I}_M = f_{att}(I'_M, y_a, y_g)$ and $\hat{I}_F = f_{att}(I'_F, y_a, y_g)$.
- the attributes of the desired output facial image are set to be the same as the attributes of the facial image as the input source, so as to facilitate the calculation of the subsequent loss function.
- In step S1103, at least one group of attribute training data is input to the second discriminant network, where the second discriminant network is configured to output, when an image is input to it, the probability value that the image is a real image.
- In step S1104, based on a second loss function, the attribute enhancement network and the second discriminant network are trained alternately until the second loss function converges.
- Fig. 12 shows a data flow diagram in the training process of the attribute enhancement network according to an embodiment of the present application. Similar to Figure 10, Figure 12 also shows two outputs of the attribute enhancement network.
- As shown in FIG. 12, the seventh facial image I'_M and the eighth facial image I'_F are input to the attribute enhancement network and transformed into the feature space to obtain the feature maps Z_M and Z_F, respectively, which are spliced with the attribute features in the feature space and inversely transformed back into the image space to obtain the ninth facial image Î_M and the tenth facial image Î_F; the seventh facial image I'_M and the eighth facial image I'_F serve respectively as the supervision images of the ninth facial image Î_M and the tenth facial image Î_F.
- In one example, the second loss function is determined based on the probability values output by the second discriminant network for at least one group of attribute training data, and on the pixel differences between the facial images in that group of attribute training data and the corresponding supervision images.
- Specifically, the second loss function includes the sum of two parts: an adversarial loss and a pixel loss.
- The adversarial loss $\mathcal{L}^{A}_{adv}$ makes the distribution of the facial images generated by the attribute enhancement network closer to that of real images, and can be calculated analogously to the adversarial loss above:
- $\mathcal{L}^{A}_{adv} = \mathbb{E}\big[D_A(\hat{I}_s)\big] - \mathbb{E}\big[D_A(I_s)\big] + \lambda_{gp}\,\mathbb{E}\big[\big(\lVert \nabla_{\tilde{x}} D_A(\tilde{x}) \rVert_2 - 1\big)^2\big]$
- where $D_A$ denotes the second discriminant network, $D_A(\hat{I}_s)$ is its output (probability value) when an image $\hat{I}_s$ output by the attribute enhancement network is input, $D_A(I_s)$ is its output (probability value) when a real image $I_s$ from the real face database is input, $\tilde{x}$ denotes samples interpolated between real and generated images, and $\lambda_{gp}$ is the gradient-penalty hyperparameter of WGAN.
- The pixel loss $\mathcal{L}^{A}_{pix}$ is used to ensure the similarity between the facial image generated by the attribute enhancement network and the facial image output by the inheritance network.
- It is the pixel-level loss between the two, i.e., the sum of the absolute differences between their pixel values. The specific formula is as follows:
- $\mathcal{L}^{A}_{pix} = \lVert \hat{I}_s - I'_s \rVert_1$
- Accordingly, the second loss function can be expressed as follows:
- $\mathcal{L}_2 = \lambda_{21}\mathcal{L}^{A}_{adv} + \lambda_{22}\mathcal{L}^{A}_{pix}$
- where $\lambda_{21}$ and $\lambda_{22}$ are weight coefficients.
- In another example, to further improve the accuracy of the trained attribute enhancement network, the second loss function may be further determined based on at least one of the following: the differences between the attributes of the facial images in at least one group of attribute training data and the attributes of the corresponding supervision images, and the differences between the features of those facial images and the features of the corresponding supervision images.
- For example, the second loss function may further include an attribute loss.
- The attribute loss is determined by the difference between the attributes of the facial image output by the attribute enhancement network and the attributes of the facial image output by the inheritance network.
- Taking age and gender as example attributes, the loss functions of age and gender can be calculated in the same manner:
- $\mathcal{L}^{A}_{age} = \lVert D_a(\hat{I}_s) - y_a \rVert_2^2, \qquad \mathcal{L}^{A}_{gen} = \lVert D_g(\hat{I}_s) - y_g \rVert_2^2$
- where $D_a$ and $D_g$ are the networks that judge the age and gender of an image; for example, ResNet can be used to pre-train a regression model of age and gender so that, when an image is input to the model, its age and gender information is output.
- $D_a(\hat{I}_s)$ represents the age of the facial image $\hat{I}_s$ as judged by $D_a$, and $D_g(\hat{I}_s)$ represents its gender as judged by $D_g$; $y_a$ and $y_g$ are the age and gender of the facial image output by the inheritance network. Since these are the same as the age and gender of the real facial image used as the input source, the age and gender of the real facial image can be used directly as $y_a$ and $y_g$.
- In addition, the second loss function may further include a perceptual loss.
- For example, the features of a 19-layer VGG network can be used to calculate the perceptual loss $\mathcal{L}^{A}_{per}$, i.e., the distance between the VGG features of the facial image output by the attribute enhancement network and the VGG features of the facial image output by the inheritance network:
- $\mathcal{L}^{A}_{per} = \lVert \phi(\hat{I}_s) - \phi(I'_s) \rVert_2^2$, where $\phi(\cdot)$ denotes the VGG-19 features.
- In this case, the second loss function can also be expressed as follows:
- $\mathcal{L}_2 = \lambda_{21}\mathcal{L}^{A}_{adv} + \lambda_{22}\mathcal{L}^{A}_{pix} + \lambda_{23}\mathcal{L}^{A}_{age} + \lambda_{24}\mathcal{L}^{A}_{gen} + \lambda_{25}\mathcal{L}^{A}_{per}$
- where $\lambda_{21}$, $\lambda_{22}$, $\lambda_{23}$, $\lambda_{24}$ and $\lambda_{25}$ are different weight coefficients, which can be assigned according to the importance of each loss term.
- Specifically, the attribute enhancement network can first be fixed while the second discriminant network is trained; at this time, the value of the second loss function should be as small as possible. Then, the second discriminant network can be fixed while the attribute enhancement network is trained; at this time, the value of the second loss function should be as large as possible. After multiple rounds of training, when the second loss function no longer fluctuates much across different attribute training data, that is, when the second loss function converges, the training of the attribute enhancement network is complete.
- As described above, in the application process the attribute enhancement network can greatly change the attributes (such as age) of the originally input facial image; in the training process, however, in order to provide supervision information, the desired attributes are set to be the same as those of the originally input facial image.
- The separate training processes of the inheritance network and the attribute enhancement network have been described above.
- As another possible implementation, the two networks can also be trained jointly to find a globally optimal solution.
- In one example, the inheritance network and the attribute enhancement network are further optimized through the following joint training steps: determining a total loss function based on the first loss function and the second loss function; and, based on the total loss function, alternately training the inheritance network and the attribute enhancement network against the first discriminant network and the second discriminant network until the total loss function converges.
- For example, the weighted sum of the first loss function and the second loss function can be used as the total loss function. The specific formula is as follows:
- $\mathcal{L} = \lambda_{01}\mathcal{L}_1 + \lambda_{02}\mathcal{L}_2$
- where $\lambda_{01}$ and $\lambda_{02}$ are different weight coefficients, which can be assigned according to the importance of each loss function.
- Specifically, the inheritance network and the attribute enhancement network can first be fixed while the first and second discriminant networks are trained; at this time, the value of the total loss function should be as small as possible, and the parameters of the two discriminant networks are adjusted jointly. Then, the first and second discriminant networks can be fixed while the inheritance network and the attribute enhancement network are trained; at this time, the value of the total loss function should be as large as possible, and the parameters of the two generative networks are adjusted jointly. After multiple rounds of training, when the total loss function converges, the joint training of the two networks is complete.
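The parameter grouping for joint training can be sketched as follows; the optimizer choice and learning rate are illustrative assumptions:

```python
# Joint training: one optimiser over both generators, one over both
# discriminators, alternated as in the separate training processes.
import itertools
import torch

def total_loss(L1, L2, lambda_01=1.0, lambda_02=1.0):
    """Weighted sum of the first and second loss functions."""
    return lambda_01 * L1 + lambda_02 * L2

def make_optimizers(inh_net, att_net, D_I, D_A, lr=1e-4):
    opt_G = torch.optim.Adam(
        itertools.chain(inh_net.parameters(), att_net.parameters()), lr=lr)
    opt_D = torch.optim.Adam(
        itertools.chain(D_I.parameters(), D_A.parameters()), lr=lr)
    return opt_G, opt_D
```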
- the server 10 is connected to a plurality of terminal devices 20 through a network 30.
- the plurality of terminal devices 20 are devices that provide first facial images and second facial images as input sources.
- the terminal may be a smart terminal, such as a smart phone, a PDA (personal digital assistant), a desktop computer, a notebook computer, a tablet computer, etc., or other types of terminals.
- the server 10 is a device for training the inherited network and the attribute enhancement network described above based on the existing face database.
- the server is also a device that applies the trained inheritance network and the attribute enhancement network to facial image generation.
- The server 10 is connected to the terminal devices 20, receives a first facial image and a second facial image from a terminal device 20, generates the third facial image or the fourth facial image based on the trained inheritance network and attribute enhancement network on the server 10, and transmits the generated facial image to the terminal device 20.
- the server 10 may be a data processing device described below.
- the network 30 may be any type of wired or wireless network, such as the Internet. It should be appreciated that the number of terminal devices 20 shown in FIG. 13 is illustrative and not restrictive. Of course, the data processing device for facial image generation according to the embodiment of the present application may also be a stand-alone device that is not networked.
- Fig. 14 is a diagram illustrating a data processing device for facial image generation according to an embodiment of the present application.
- As shown in FIG. 14, the data processing device 1400 includes: a segmentation device 1401, a first transformation device 1402, a selection device 1403, a first synthesis device 1404, and a first inverse transformation device 1405.
- the segmentation device 1401 is configured to obtain M first image blocks corresponding to each facial feature in the input first facial image, and obtain N second image blocks corresponding to each facial feature in the input second facial image.
- the first transformation device 1402 is configured to transform the M first image blocks and N second image blocks into a feature space to generate M first feature blocks and N second feature blocks.
- the first transformation device 1402 may perform the transformation through a first transformation network (for example, a coding network).
- the selection device 1403 is configured to select a part of the first feature blocks and a part of the second feature blocks according to a specific control vector.
- the specific control vector includes L information bits, one for each facial feature, where L is a natural number and M ≤ L, N ≤ L.
- the selection device 1403 is further configured to: when an information bit in the specific control vector takes the first value, select from the M first feature blocks the feature block of the facial feature corresponding to that information bit; and when the information bit takes the second value, select from the N second feature blocks the feature block of the facial feature corresponding to that information bit.
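- the selection rule can be sketched as follows, under the assumed convention that the first value is 1 (take the feature block from the first facial image) and the second value is 0 (take it from the second facial image):

```python
# Select one feature block per facial feature according to a control vector
# (assumed convention: bit 1 -> first image, bit 0 -> second image).
def select_feature_blocks(first_blocks, second_blocks, control_bits):
    """first_blocks / second_blocks: dicts keyed by facial feature name;
    control_bits: dict mapping the same keys to 0 or 1."""
    return {
        feature: first_blocks[feature] if bit == 1 else second_blocks[feature]
        for feature, bit in control_bits.items()
    }

# Example: inherit both eyes from the first face, the rest from the second.
control_vector = {"left_eye": 1, "right_eye": 1,
                  "nose": 0, "mouth": 0, "contour": 0}
```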
- the first synthesis device 1404 is configured to generate a first composite feature map based at least on the selected part of the first feature blocks and part of the second feature blocks.
- in this way, the attributes (e.g., age and gender) of the output third facial image can be controlled; for example, the gender of the third facial image to be output can be specified.
- the attribute information of the input first facial image and second facial image may differ considerably. Therefore, as another possible implementation, the first synthesis device 1404 is further configured to: expand the specified attribute information into attribute feature blocks in the feature space; and generate the first composite feature map based on the selected part of the first feature blocks, the part of the second feature blocks, and the attribute feature blocks.
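- the expansion of attribute information can be pictured as broadcasting an attribute vector over the spatial grid of a feature block so that it can be concatenated channel-wise; the encoding of the attributes in this sketch is an assumption.

```python
# Broadcast an attribute vector (e.g., a gender/age encoding) to the spatial
# size of a feature block for channel-wise concatenation.
import torch

def expand_attribute_block(attr_vec, height, width):
    """attr_vec: (B, A) -> (B, A, height, width), constant over positions."""
    return attr_vec[:, :, None, None].expand(-1, -1, height, width)

attr = torch.tensor([[1.0, 0.0]])                 # hypothetical [male, female]
attr_block = expand_attribute_block(attr, 4, 4)   # matches a 4x4 feature block
```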
- the first inverse transformation device 1405 is configured to inversely transform the first composite feature map back into the image space to generate a third facial image.
- the first inverse transformation device 1405 may perform the inverse transformation through a first inverse transformation network (for example, a decoding network). The first transformation network and the first inverse transformation network together constitute the inheritance network.
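- putting the pieces together, a forward pass through such an inheritance network might look like the sketch below, where decoder stands for an assumed decoding module that maps the concatenated feature blocks back to image space; all feature blocks are assumed to share the same spatial size.

```python
# Assembled forward pass of an inheritance-style network (illustrative only):
# encode -> select by control vector -> append attribute block -> decode.
import torch

def inheritance_forward(first_blocks, second_blocks, control_bits,
                        attr_block, decoder):
    selected = [first_blocks[f] if bit == 1 else second_blocks[f]
                for f, bit in control_bits.items()]
    fused = torch.cat(selected + [attr_block], dim=1)  # first composite feature map
    return decoder(fused)                              # third facial image
```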
- the output third facial image can be made close to a real image while ensuring its similarity to the facial images serving as the input sources. In other words, when the third facial image is viewed by a user, it is difficult to distinguish whether it is a real image or a composite image.
- by setting the control vector, it is possible to precisely control which facial features of the two input facial images are inherited by the third facial image.
- in addition, the attributes of the third facial image can be specified, and the harmony and naturalness of the third facial image can be further improved.
- the main purpose of the inheritance network described above is to output a third facial image whose facial features are similar to those of the first facial image and the second facial image, so the superposition of attribute feature blocks within it is only a fine-tuning performed on the premise of ensuring this similarity.
- consequently, the third facial image output by the inheritance network is similar, in attributes such as age, to the first facial image and the second facial image serving as input sources.
- the data processing device 1400 may further include: a second transformation device 1406, an expansion device 1407, a second synthesis device 1408, and a second inverse transformation device 1409.
- the second transformation device 1406 is used to transform the third facial image into a feature space to generate a third feature map.
- the second transformation device may perform the transformation through a second transformation network (for example, a coding network), and the second transformation network here is different from the first transformation network above.
- the expansion device 1407 is used to expand specific attribute information into an attribute feature map in the feature space.
- the second synthesis device 1408 is configured to generate a second composite feature map based on the attribute feature map and the third feature map.
- the second inverse transformation device 1409 is configured to inversely transform the second composite feature map back into the image space to generate a fourth facial image.
- the second inverse transformation device may perform the inverse transformation through a second inverse transformation network (for example, a decoding network); the second inverse transformation network here is different from the first inverse transformation network above.
- the second transformation network and the second inverse transformation network constitute an attribute enhancement network.
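- a compact sketch of this attribute enhancement pass follows; the layer shapes and module names are assumptions chosen so the example runs, not the application's concrete networks.

```python
# Illustrative attribute enhancement network: encode the third facial image,
# broadcast the target attributes to a feature map, concatenate, and decode.
import torch
import torch.nn as nn

class AttributeEnhancer(nn.Module):
    def __init__(self, n_attrs, dim=64):
        super().__init__()
        self.encode = nn.Sequential(          # second transformation network
            nn.Conv2d(3, dim, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(dim, 2 * dim, 4, 2, 1), nn.ReLU())
        self.decode = nn.Sequential(          # second inverse transformation network
            nn.ConvTranspose2d(2 * dim + n_attrs, dim, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(dim, 3, 4, 2, 1), nn.Tanh())

    def forward(self, third_img, attr_vec):
        feat = self.encode(third_img)                      # third feature map
        b, _, h, w = feat.shape
        attr_map = attr_vec[:, :, None, None].expand(b, -1, h, w)
        fused = torch.cat([feat, attr_map], dim=1)         # second composite feature map
        return self.decode(fused)                          # fourth facial image
```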
- because the second transformation device 1406, the expansion device 1407, the second synthesis device 1408, and the second inverse transformation device 1409 are optional, they are shown in a dashed frame in FIG. 14.
- through the attribute enhancement network, the attributes of the fourth facial image can be changed substantially; for example, based on an input third facial image at age 20, a fourth facial image at age 5 may be output.
- the inheritance network and the attribute enhancement network may each include an encoding network and a decoding network, and both the encoding network and the decoding network contain multiple parameters to be determined. These parameters are determined through the training process, which completes the construction of the inheritance network and the attribute enhancement network; only then can the two networks realize the function of facial image generation. In other words, before the inheritance network and the attribute enhancement network are applied, they must first be trained.
- the data processing device 1400 further includes a training device 1410.
- the training device 1410 is configured to train the inheritance network in the training mode. Specifically, the training device 1410 includes: a pre-exchange module, a first discrimination module, and a first training module.
- the pre-exchange module is configured to obtain L fifth image blocks corresponding to the facial features in a fifth facial image (I_M) and L sixth image blocks corresponding to the facial features in a sixth facial image (I_F), to select a part of the fifth image blocks and a part of the sixth image blocks according to a first control vector so as to generate a first composite image, and to select another part of the fifth image blocks and another part of the sixth image blocks according to a second control vector so as to generate a second composite image.
- the segmentation device is further configured to obtain L seventh image blocks corresponding to the facial features in the first composite image, to obtain L eighth image blocks corresponding to the facial features in the second composite image, and to input the L seventh image blocks and the L eighth image blocks into the inheritance network.
- here, L is a natural number, and M ≤ L and N ≤ L.
- the first discrimination module is configured to receive at least one set of inheritance training data and, through the first discriminant network, to output a probability value that the input inheritance training data is a real image. The at least one set of inheritance training data includes the fifth to eighth facial images: the seventh facial image (I'_M) is generated by the inheritance network by selecting a part of the seventh image blocks and a part of the eighth image blocks based on the first control vector, and the eighth facial image (I'_F) is generated by the inheritance network by selecting another part of the seventh image blocks and another part of the eighth image blocks based on the second control vector. The fifth facial image is a supervision image that provides supervision information for the seventh facial image, and the sixth facial image is a supervision image that provides supervision information for the eighth facial image.
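- the pre-exchange step can be sketched as a block-level swap in image space; purely as an illustration, it is assumed here that the second control vector is the bitwise complement of the first, so that the two composite images together contain every original block.

```python
# Illustrative pre-exchange: swap per-feature image blocks of the fifth and
# sixth facial images under a control vector (assumes the second control
# vector is the complement of the first).
def pre_exchange(fifth_blocks, sixth_blocks, first_control):
    first_composite, second_composite = {}, {}
    for feature, bit in first_control.items():
        if bit == 1:   # first composite takes this block from the fifth image
            first_composite[feature] = fifth_blocks[feature]
            second_composite[feature] = sixth_blocks[feature]
        else:          # otherwise it takes the block from the sixth image
            first_composite[feature] = sixth_blocks[feature]
            second_composite[feature] = fifth_blocks[feature]
    return first_composite, second_composite
```

- under this construction, running the inheritance network on the two composite images with the same control vectors should swap the blocks back, which is why the fifth and sixth facial images can serve as supervision images for the seventh and eighth facial images.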
- the first training module is configured to alternately train the inheritance network and the first discriminant network based on a first loss function until the first loss function converges.
- the first loss function is determined based on the probability values output by the first discriminant network for the at least one set of inheritance training data and the pixel differences between the facial images in the at least one set of inheritance training data and the corresponding supervision images.
- the first loss function is further determined based on at least one of the following: differences between the attributes of the facial images in the at least one set of inheritance training data and the attributes of the corresponding supervision images, and differences between the features of the facial images in the at least one set of inheritance training data and the features of the corresponding supervision images.
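- one common way to combine such terms, written here from the inheritance network's side (the first discriminant network would be trained with the complementary adversarial objective), is sketched below; the weights and the auxiliary attr_of / feat_of extractors (for example, a pretrained attribute classifier and a face feature network) are assumptions, not values given in the present application. The second loss function described next has the same structure, computed over the attribute training data.

```python
# Illustrative generator-side first loss: adversarial term from the
# discriminant network's probability plus pixel, attribute, and feature
# terms against the supervision image (weights/extractors are assumptions).
import torch
import torch.nn.functional as F

def first_loss_generator_side(disc_prob_fake, fake_img, supervision_img,
                              attr_of, feat_of,
                              w_adv=1.0, w_pix=10.0, w_attr=1.0, w_feat=1.0):
    adv = -torch.log(disc_prob_fake + 1e-8).mean()   # fool the discriminant net
    pix = F.l1_loss(fake_img, supervision_img)       # pixel difference
    attr = F.l1_loss(attr_of(fake_img), attr_of(supervision_img))
    feat = F.l1_loss(feat_of(fake_img), feat_of(supervision_img))
    return w_adv * adv + w_pix * pix + w_attr * attr + w_feat * feat
```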
- the training device 1410 is also used to train the attribute enhancement network in the training mode.
- the training device 1410 further includes: a second discrimination module and a second training module.
- the second discrimination module is configured to receive at least one set of attribute training data and, through the second discriminant network, to output a probability value that the input attribute training data is a real image. The at least one set of attribute training data includes the seventh to tenth facial images: the ninth facial image is output by the attribute enhancement network based on the seventh facial image, and the tenth facial image is output by the attribute enhancement network based on the eighth facial image. The seventh facial image is a supervision image that provides supervision information for the ninth facial image, and the eighth facial image is a supervision image that provides supervision information for the tenth facial image.
- the second training module is configured to alternately train the attribute enhancement network and the second discriminant network based on the second loss function until the second loss function converges.
- the second loss function is determined based on the probability values output by the second discriminant network for the at least one set of attribute training data and the pixel differences between the facial images in the at least one set of attribute training data and the corresponding supervision images.
- the second loss function is further determined based on at least one of the following: differences between the attributes of the facial images in the at least one set of attribute training data and the attributes of the corresponding supervision images, and differences between the features of the facial images in the at least one set of attribute training data and the features of the corresponding supervision images.
- the training device may further include a joint training module configured to determine a total loss function based on the first loss function and the second loss function, and to alternately train the inheritance network and the attribute enhancement network against the first discriminant network and the second discriminant network based on the total loss function until the total loss function converges.
- an example of the data processing device for facial image generation according to an embodiment of the present application as a hardware entity is shown in FIG. 15.
- the device includes a processor 1501, a memory 1502, and at least one external communication interface 1503.
- the processor 1501, the memory 1502, and the external communication interface 1503 are all connected via a bus 1504.
- for the processor 1501 used for data processing, a microprocessor, a central processing unit (CPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA) can be used when performing processing.
- Fig. 16 shows a schematic diagram of a computer-readable recording medium according to an embodiment of the present application.
- a computer-readable recording medium 1600 according to an embodiment of the present application stores computer program instructions 1601.
- when the computer program instructions 1601 are executed by a processor, the data processing method for facial image generation according to the embodiments of the present application described with reference to the above drawings is performed.
- An embodiment of the present application also provides a computer device, including a memory and a processor.
- the memory stores a computer program that can be run on the processor.
- when the processor executes the computer program, the data processing method described in the foregoing embodiments can be implemented.
- the computer device can be the server described above or any device capable of data processing.
- the data processing method, device and medium for facial image generation according to embodiments of the present application have been described in detail with reference to FIGS. 1 to 16.
- with the data processing method, device, and medium for facial image generation according to the embodiments of the present application, by segmenting facial feature images and recombining them in the feature space, it is possible to generate a third facial image that inherits facial features from a part of the first facial image and a part of the second facial image.
- when the third facial image is viewed by a user, it is difficult to distinguish whether the image is a real image or a composite image.
- in the inheritance network, by setting a control vector, it is possible to precisely control which facial features of the two input facial images are inherited by the third facial image.
- moreover, the attributes of the third facial image can be specified, and the harmony and naturalness of the third facial image can be further improved.
- finally, through the attribute enhancement network, the attributes of the generated facial images can be changed over a wider range.
Claims (14)
- 1. A data processing method for facial image generation, executed by a server, comprising: acquiring a first facial image (I_MA) and a second facial image (I_FA); obtaining M first image blocks corresponding to facial features in the first facial image (I_MA), and obtaining N second image blocks corresponding to facial features in the second facial image (I_FA); transforming the M first image blocks and the N second image blocks into a feature space to generate M first feature blocks and N second feature blocks; selecting a part of the first feature blocks and a part of the second feature blocks according to a specific control vector; generating a first composite feature map based at least on the selected part of the first feature blocks and part of the second feature blocks; and inversely transforming the first composite feature map back into the image space to generate a third facial image (I_O1), where M and N are natural numbers.
- 2. The method according to claim 1, wherein the specific control vector includes L information bits corresponding to the facial features, where L is a natural number, M ≤ L and N ≤ L, and the step of selecting a part of the first feature blocks and a part of the second feature blocks according to the specific control vector comprises: when an information bit in the specific control vector takes a first value, selecting, from the M first feature blocks, the feature block of the facial feature corresponding to that information bit; and when the information bit in the specific control vector takes a second value, selecting, from the N second feature blocks, the feature block of the facial feature corresponding to that information bit.
- 3. The method according to claim 1, wherein the step of generating a first composite feature map based at least on the selected part of the first feature blocks and part of the second feature blocks comprises: expanding specified attribute information into an attribute feature block in the feature space; and generating the first composite feature map based on the selected part of the first feature blocks, the part of the second feature blocks, and the attribute feature block.
- 4. The method according to claim 1, wherein the third facial image is generated by inputting the M first image blocks and the N second image blocks into an inheritance network, and the inheritance network is obtained through the following training steps: obtaining L fifth image blocks corresponding to the facial features in a fifth facial image (I_M), and obtaining L sixth image blocks corresponding to the facial features in a sixth facial image (I_F), where L is a natural number, M ≤ L and N ≤ L; inputting L seventh image blocks and L eighth image blocks into the inheritance network; outputting, through the inheritance network, a seventh facial image (I'_M) generated from a part of the seventh image blocks and a part of the eighth image blocks selected based on a first control vector, and an eighth facial image (I'_F) generated from another part of the seventh image blocks and another part of the eighth image blocks selected based on a second control vector, where the fifth facial image is a supervision image that provides supervision information for the seventh facial image and the sixth facial image is a supervision image that provides supervision information for the eighth facial image, and taking the fifth to eighth facial images as one set of inheritance training data; inputting at least one set of inheritance training data into a first discriminant network, the first discriminant network being configured to output, when an image is input into it, a probability value that the image is a real image; and alternately training the inheritance network and the first discriminant network based on a first loss function until the first loss function converges.
- 5. The method according to claim 4, wherein the first loss function is determined based on the probability values output by the first discriminant network for the at least one set of inheritance training data and the pixel differences between the facial images in the at least one set of inheritance training data and the corresponding supervision images.
- 6. The method according to claim 5, wherein the first loss function is further determined based on at least one of the following: differences between attributes of the facial images in the at least one set of inheritance training data and attributes of the corresponding supervision images, and differences between features of the facial images in the at least one set of inheritance training data and features of the corresponding supervision images.
- 7. The method according to claim 4, further comprising: transforming the third facial image (I_O1) into the feature space to generate a third feature map; expanding specific attribute information into an attribute feature map in the feature space; generating a second composite feature map based on the attribute feature map and the third feature map; and inversely transforming the second composite feature map back into the image space to generate a fourth facial image (I_O2).
- 8. The method according to claim 7, wherein the fourth facial image is generated by inputting the third facial image into an attribute enhancement network, and the attribute enhancement network is obtained through the following training steps: inputting the seventh facial image (I'_M) and the eighth facial image (I'_F) into the attribute enhancement network; outputting, through the attribute enhancement network, a ninth facial image corresponding to the seventh facial image and a tenth facial image corresponding to the eighth facial image, where the seventh facial image is a supervision image that provides supervision information for the ninth facial image and the eighth facial image is a supervision image that provides supervision information for the tenth facial image, and taking the seventh to tenth facial images as one set of attribute training data; inputting at least one set of attribute training data into a second discriminant network, the second discriminant network being configured to output, when an image is input into it, a probability value that the image is a real image; and alternately training the attribute enhancement network and the second discriminant network based on a second loss function until the second loss function converges.
- 9. The method according to claim 8, wherein the second loss function is determined based on the probability values output by the second discriminant network for the at least one set of attribute training data and the pixel differences between the facial images in the at least one set of attribute training data and the corresponding supervision images.
- 10. The method according to claim 9, wherein the second loss function is further determined based on at least one of the following: differences between attributes of the facial images in the at least one set of attribute training data and attributes of the corresponding supervision images, and differences between features of the facial images in the at least one set of attribute training data and features of the corresponding supervision images.
- 11. The method according to claim 8, wherein the inheritance network and the attribute enhancement network are further optimized through the following joint training steps: determining a total loss function based on the first loss function and the second loss function; and alternately training the inheritance network and the attribute enhancement network against the first discriminant network and the second discriminant network based on the total loss function, until the total loss function converges.
- 12. A data processing device for facial image generation, comprising: a segmentation device configured to obtain M first image blocks corresponding to facial features in an input first facial image and N second image blocks corresponding to facial features in an input second facial image; a first transformation device configured to transform the M first image blocks and the N second image blocks into a feature space to generate M first feature blocks and N second feature blocks; a selection device configured to select a part of the first feature blocks and a part of the second feature blocks according to a specific control vector; a first synthesis device configured to generate a first composite feature map based at least on the selected part of the first feature blocks and part of the second feature blocks; and a first inverse transformation device configured to inversely transform the first composite feature map back into the image space to generate a third facial image.
- 13. A computer-readable recording medium storing a computer program which, when executed by a processor, performs the method according to any one of claims 1 to 11.
- 14. A computer device, comprising a memory and a processor, the memory being configured to store a computer program and the processor being configured to execute the computer program to implement the data processing method for facial image generation according to any one of claims 1 to 11.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP20796212.7A EP3961486A4 (en) | 2019-04-26 | 2020-04-02 | DATA PROCESSING METHOD AND DEVICE FOR FACIAL IMAGING AND MEDIUM |
KR1020217020518A KR102602112B1 (ko) | 2019-04-26 | 2020-04-02 | Data processing method and device for face image generation, and medium |
JP2021534133A JP7246811B2 (ja) | 2019-04-26 | 2020-04-02 | Data processing method for face image generation, data processing apparatus, computer program, and computer apparatus |
US17/328,932 US11854247B2 (en) | 2019-04-26 | 2021-05-24 | Data processing method and device for generating face image and medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910345276.6 | 2019-04-26 | ||
CN201910345276.6A CN110084193B (zh) | 2019-04-26 | 2019-04-26 | Data processing method, device, and medium for facial image generation |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/328,932 Continuation US11854247B2 (en) | 2019-04-26 | 2021-05-24 | Data processing method and device for generating face image and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020216033A1 true WO2020216033A1 (zh) | 2020-10-29 |
Family
ID=67417067
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/082918 WO2020216033A1 (zh) | 2019-04-26 | 2020-04-02 | 用于面部图像生成的数据处理方法、设备和介质 |
Country Status (6)
Country | Link |
---|---|
US (1) | US11854247B2 (zh) |
EP (1) | EP3961486A4 (zh) |
JP (1) | JP7246811B2 (zh) |
KR (1) | KR102602112B1 (zh) |
CN (1) | CN110084193B (zh) |
WO (1) | WO2020216033A1 (zh) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN110084193B (zh) * | 2019-04-26 | 2023-04-18 | 深圳市腾讯计算机系统有限公司 | Data processing method, device, and medium for facial image generation |
US11373352B1 (en) * | 2021-03-04 | 2022-06-28 | Meta Platforms, Inc. | Motion transfer using machine-learning models |
US11341701B1 (en) * | 2021-05-06 | 2022-05-24 | Motorola Solutions, Inc | Method and apparatus for producing a composite image of a suspect |
- CN114708644B (zh) * | 2022-06-02 | 2022-09-13 | 杭州魔点科技有限公司 | Face recognition method and system based on a family gene template |
- CN116012258B (zh) * | 2023-02-14 | 2023-10-13 | 山东大学 | Image harmonization method based on a cycle generative adversarial network |
- CN117078974B (zh) * | 2023-09-22 | 2024-01-05 | 腾讯科技(深圳)有限公司 | Image processing method and apparatus, electronic device, and storage medium |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- JP2005107848A (ja) * | 2003-09-30 | 2005-04-21 | Lic Corporation:Kk | Child image generation device |
- KR20080106596A (ko) * | 2007-03-22 | 2008-12-09 | 연세대학교 산학협력단 | Virtual face generation method |
- CN106682632B (zh) * | 2016-12-30 | 2020-07-28 | 百度在线网络技术(北京)有限公司 | Method and apparatus for processing face images |
US10430978B2 (en) * | 2017-03-02 | 2019-10-01 | Adobe Inc. | Editing digital images utilizing a neural network with an in-network rendering layer |
US10474881B2 (en) * | 2017-03-15 | 2019-11-12 | Nec Corporation | Video retrieval system based on larger pose face frontalization |
- CN107273818B (zh) * | 2017-05-25 | 2020-10-16 | 北京工业大学 | Selective ensemble face recognition method fusing a genetic algorithm with differential evolution |
- CN107578017B (zh) * | 2017-09-08 | 2020-11-17 | 百度在线网络技术(北京)有限公司 | Method and apparatus for generating images |
- CN107609506B (zh) * | 2017-09-08 | 2020-04-21 | 百度在线网络技术(北京)有限公司 | Method and apparatus for generating images |
- CN108288072A (zh) * | 2018-01-26 | 2018-07-17 | 深圳市唯特视科技有限公司 | Facial expression synthesis method based on generative adversarial networks |
- CN108510473A (zh) | 2018-03-09 | 2018-09-07 | 天津工业大学 | FCN retinal image vessel segmentation combining depthwise separable convolution and channel weighting |
- CN108510437B (zh) * | 2018-04-04 | 2022-05-17 | 科大讯飞股份有限公司 | Virtual image generation method, apparatus, device, and readable storage medium |
- CN109508669B (zh) * | 2018-11-09 | 2021-07-23 | 厦门大学 | Facial expression recognition method based on generative adversarial networks |
- CN109615582B (zh) * | 2018-11-30 | 2023-09-01 | 北京工业大学 | Face image super-resolution reconstruction method based on an attribute-description generative adversarial network |
- 2019-04-26: CN CN201910345276.6A patent/CN110084193B/zh active Active
- 2020-04-02: WO PCT/CN2020/082918 patent/WO2020216033A1/zh unknown
- 2020-04-02: KR KR1020217020518A patent/KR102602112B1/ko active IP Right Grant
- 2020-04-02: JP JP2021534133A patent/JP7246811B2/ja active Active
- 2020-04-02: EP EP20796212.7A patent/EP3961486A4/en active Pending
- 2021-05-24: US US17/328,932 patent/US11854247B2/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN1490764A (zh) * | 2002-05-31 | 2004-04-21 | 欧姆龙株式会社 | Method, apparatus, system, program, and computer-readable medium for image synthesis |
US20080199055A1 (en) * | 2007-02-15 | 2008-08-21 | Samsung Electronics Co., Ltd. | Method and apparatus for extracting facial features from image containing face |
- CN103295210A (zh) * | 2012-03-01 | 2013-09-11 | 汉王科技股份有限公司 | Infant image synthesis method and apparatus |
US20170301121A1 (en) * | 2013-05-02 | 2017-10-19 | Emotient, Inc. | Anonymization of facial images |
- CN108171124A (zh) * | 2017-12-12 | 2018-06-15 | 南京邮电大学 | Face image sharpening method based on similar-sample feature fitting |
- CN110084193A (zh) * | 2019-04-26 | 2019-08-02 | 深圳市腾讯计算机系统有限公司 | Data processing method, device, and medium for facial image generation |
Non-Patent Citations (2)
Title |
---|
See also references of EP3961486A4 * |
- XIAO YANG ET AL., "Recognizing Minimal Facial Sketch by Generating Photorealistic Faces With the Guidance of Descriptive Attributes," 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), XP033403904, ISSN: 2379-190X *
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN112613460A (zh) * | 2020-12-30 | 2021-04-06 | 深圳威富优房客科技有限公司 | Method for establishing a face generation model and face generation method |
Also Published As
Publication number | Publication date |
---|---|
US11854247B2 (en) | 2023-12-26 |
EP3961486A4 (en) | 2022-07-13 |
CN110084193A (zh) | 2019-08-02 |
KR20210095696A (ko) | 2021-08-02 |
US20210279515A1 (en) | 2021-09-09 |
JP2022513858A (ja) | 2022-02-09 |
KR102602112B1 (ko) | 2023-11-13 |
EP3961486A1 (en) | 2022-03-02 |
CN110084193B (zh) | 2023-04-18 |
JP7246811B2 (ja) | 2023-03-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
- WO2020216033A1 (zh) | Data processing method, device, and medium for facial image generation | |
- CN111754596B (zh) | Editing model generation, face image editing method, apparatus, device, and medium | |
US20220028139A1 (en) | Attribute conditioned image generation | |
US20200402284A1 (en) | Animating avatars from headset cameras | |
- JP7144699B2 (ja) | Signal modification apparatus, method, and program | |
WO2021027759A1 (en) | Facial image processing | |
- WO2023050992A1 (zh) | Network training method, apparatus, device, and storage medium for face reconstruction | |
- JP2023548921A (ja) | Image gaze correction method, apparatus, electronic device, computer-readable storage medium, and computer program | |
- CN111353546B (zh) | Training method and apparatus for an image processing model, computer device, and storage medium | |
- CN115565238B (zh) | Face swapping model training method, apparatus, device, storage medium, and program product | |
- CN110288513A (zh) | Method, apparatus, device, and storage medium for changing face attributes | |
US20220101121A1 (en) | Latent-variable generative model with a noise contrastive prior | |
- CN112101087A (zh) | Facial image identity de-identification method, apparatus, and electronic device | |
- WO2022166840A1 (zh) | Training method for a face attribute editing model, face attribute editing method, and device | |
US20220101122A1 (en) | Energy-based variational autoencoders | |
Liu et al. | Learning shape and texture progression for young child face aging | |
- CN116825127A (zh) | Speech-driven digital human generation method based on neural fields | |
- KR20210019182A (ko) | Apparatus and method for generating occupation images with age-converted faces | |
- CN117237521A (zh) | Speech-driven face generation model construction method and target-person speaking video generation method | |
- CN116152631A (zh) | Model training and image processing method, apparatus, device, and storage medium | |
- CN113822790B (zh) | Image processing method, apparatus, device, and computer-readable storage medium | |
- CN115914505A (zh) | Video generation method and system based on a speech-driven digital human model | |
US20220101145A1 (en) | Training energy-based variational autoencoders | |
- CN114943912A (zh) | Video face-swapping method, apparatus, and storage medium | |
- KR102147061B1 (ko) | Virtual face shaping apparatus and method reflecting a user's subjective preferences | |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20796212; Country of ref document: EP; Kind code of ref document: A1
 | ENP | Entry into the national phase | Ref document number: 2021534133; Country of ref document: JP; Kind code of ref document: A
 | ENP | Entry into the national phase | Ref document number: 20217020518; Country of ref document: KR; Kind code of ref document: A
 | NENP | Non-entry into the national phase | Ref country code: DE
 | ENP | Entry into the national phase | Ref document number: 2020796212; Country of ref document: EP; Effective date: 20211126