WO2020216033A1 - Data processing method, device, and medium for facial image generation - Google Patents

Data processing method, device, and medium for facial image generation

Info

Publication number
WO2020216033A1
Authority
WO
WIPO (PCT)
Prior art keywords: image, facial, feature, network, facial image
Application number: PCT/CN2020/082918
Other languages: English (en), French (fr)
Inventors: 张勇, 李乐, 刘志磊, 吴保元, 樊艳波, 李志锋, 刘威
Original Assignee: 腾讯科技(深圳)有限公司
Application filed by 腾讯科技(深圳)有限公司
Priority to EP20796212.7A (patent EP3961486A4)
Priority to KR1020217020518A (patent KR102602112B1)
Priority to JP2021534133A (patent JP7246811B2)
Publication of WO2020216033A1
Priority to US17/328,932 (patent US11854247B2)

Classifications

    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06F: ELECTRIC DIGITAL DATA PROCESSING
          • G06F 18/00: Pattern recognition
            • G06F 18/20: Analysing
              • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
              • G06F 18/22: Matching criteria, e.g. proximity measures
              • G06F 18/24: Classification techniques
                • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
                  • G06F 18/2411: Classification based on the proximity to a decision surface, e.g. support vector machines
        • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 10/00: Arrangements for image or video recognition or understanding
            • G06V 10/70: Arrangements using pattern recognition or machine learning
              • G06V 10/764: Arrangements using classification, e.g. of video objects
              • G06V 10/82: Arrangements using neural networks
          • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
            • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
              • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
                • G06V 40/168: Feature extraction; Face representation
                • G06V 40/172: Classification, e.g. identification

Definitions

  • This application relates to the field of image processing, and more specifically, to data processing methods, equipment, media, and computer equipment for facial image generation.
  • Facial image generation technology is an emerging research field, which has broad application prospects in progeny face prediction, criminal image restoration in criminal investigation, and construction of virtual characters. For example, by inputting a facial image, another brand-new facial image that is similar to but different from the facial image can be generated as the target image.
  • Existing facial image generation schemes use a general processing network to generate the target image. For example, a facial image is input to a trained encoding network and decoding network, which then output the target image.
  • The problem with such an image generation scheme is that the harmony and naturalness of the synthesized facial image output by the general processing network are poor, and it is difficult for users to believe that it is a real facial image.
  • embodiments of the present application provide a data processing method, device, medium, and computer device for facial image generation, which can generate a synthetic facial image closer to a real facial image.
  • According to one aspect of the present application, a data processing method for facial image generation is provided, executed by a server, including: acquiring a first facial image (I_MA) and a second facial image (I_FA); acquiring M first image blocks corresponding to facial features in the first facial image (I_MA), and acquiring N second image blocks corresponding to facial features in the second facial image (I_FA); transforming the M first image blocks and the N second image blocks into a feature space to generate M first feature blocks and N second feature blocks; selecting a part of the first feature blocks and a part of the second feature blocks according to a specific control vector; generating a first synthetic feature map based on at least the selected part of the first feature blocks and part of the second feature blocks; and inversely transforming the first synthetic feature map back into the image space to generate a third facial image, where M and N are natural numbers.
  • According to another aspect of the present application, a data processing device for facial image generation is provided, including: a segmentation device for acquiring M first image blocks corresponding to facial features in an input first facial image, and acquiring N second image blocks corresponding to facial features in an input second facial image; a first transformation device for transforming the M first image blocks and the N second image blocks into a feature space to generate M first feature blocks and N second feature blocks; a selection device for selecting a part of the first feature blocks and a part of the second feature blocks according to a specific control vector; a first synthesis device for generating a first synthetic feature map based on at least the selected part of the first feature blocks and part of the second feature blocks; and a first inverse transformation device for inversely transforming the first synthetic feature map back into the image space to generate a third facial image.
  • According to another aspect of the present application, a computer-readable recording medium is provided, having a computer program stored thereon, the computer program being used for executing the data processing method for facial image generation described in the above embodiments when executed by a processor.
  • a computer device including a memory and a processor, the memory is configured to store a computer program, and the processor is configured to execute the computer program to implement the method described in the foregoing embodiment Data processing method for facial image generation.
  • FIG. 1 is a flowchart illustrating the process of a data processing method for facial image generation according to an embodiment of the present application.
  • FIG. 2 shows a schematic diagram of the data flow of an inheritance network according to an embodiment of the present application.
  • FIG. 3 shows facial image generation results under different control vectors according to embodiments of the present application.
  • FIG. 4 shows facial image generation results when random factors are added to the input facial image according to an embodiment of the present application.
  • FIG. 5 shows a schematic diagram of the data flow of an attribute enhancement network according to an embodiment of the present application.
  • FIG. 6 shows facial images of different ages generated under a specified control vector.
  • FIG. 7 shows facial images of different ages and genders generated under a specified control vector.
  • FIG. 8 is a flowchart illustrating the training process of the inheritance network according to an embodiment of the present application.
  • FIG. 9 is a schematic diagram illustrating the process of two facial feature exchanges in the training process of the inheritance network.
  • FIG. 10 shows a schematic diagram of the data flow in the training process of the inheritance network according to an embodiment of the present application.
  • FIG. 11 is a flowchart illustrating the training process of the attribute enhancement network according to an embodiment of the present application.
  • FIG. 12 shows a schematic diagram of the data flow in the training process of the attribute enhancement network according to an embodiment of the present application.
  • FIG. 13 shows a schematic diagram of an application environment of an embodiment of the present application.
  • FIG. 14 shows a functional block diagram of the configuration of a data processing device for facial image generation according to an embodiment of the present application.
  • FIG. 15 shows an example of a data processing device for facial image generation according to an embodiment of the present application as a hardware entity.
  • FIG. 16 shows a schematic diagram of a computer-readable recording medium according to an embodiment of the present application.
  • In the existing scheme, the output facial image is far from a real facial image.
  • In addition, training the encoding network and the decoding network in the general processing network requires collecting and building a real face database in order to provide supervision information for the output synthetic facial image.
  • For example, a real child facial image is used as supervision information for the composite child facial image that the processing network outputs based on the father's or mother's facial image, so that the parameters of the processing network can be adjusted and the trained processing network can output a composite facial image that is both similar to the input and close to a real image.
  • However, collecting and establishing such a database is very costly.
  • In view of this, in the embodiments of the present application, an inheritance network dedicated to facial image synthesis is proposed. Compared with general processing networks, it can output synthesized facial images that are closer to real images, and it can accurately control which facial features in the two input facial images are inherited by the synthesized facial image.
  • The embodiments of the present application further propose an attribute enhancement network, which can adjust the attributes (such as age and gender) of the synthesized facial image over a larger range on the basis of the synthesized facial image output by the inheritance network.
  • Furthermore, a training method for the inheritance network and the attribute enhancement network that does not require a face database with father-mother-child relationships is proposed. In the training process of the inheritance network and the attribute enhancement network according to the embodiments of the present application, there is no need to establish a face database with father, mother, and child relationships; any existing face database can be used directly to complete the training of the processing networks.
  • First facial image: in the application mode, an image input to the inheritance network, represented by I_MA;
  • Second facial image: in the application mode, another image input to the inheritance network, represented by I_FA;
  • Third facial image: in the application mode, the image output by the inheritance network, represented by I_o1;
  • Fourth facial image: in the application mode, the image output by the attribute enhancement network, represented by I_o2;
  • Fifth facial image: in the training mode, an image serving as an input source, represented by I_M;
  • Sixth facial image: in the training mode, another image serving as an input source, represented by I_F;
  • Seventh facial image: in the training mode, an image output by the inheritance network, represented by I′_M, with the fifth facial image I_M as its supervision image;
  • Eighth facial image: in the training mode, another image output by the inheritance network, represented by I′_F, with the sixth facial image I_F as its supervision image;
  • Ninth facial image: in the training mode, an image output by the attribute enhancement network, with the seventh facial image I′_M as its supervision image;
  • Tenth facial image: in the training mode, another image output by the attribute enhancement network, with the eighth facial image I′_F as its supervision image.
  • Hereinafter, a data processing method for facial image generation according to an embodiment of the present application will be described with reference to FIG. 1.
  • the method is executed by a server.
  • the data processing method includes the following steps.
  • In step S101, a first facial image (I_MA) and a second facial image (I_FA) are acquired.
  • In step S102, M first image blocks corresponding to facial features in the first facial image (I_MA) are acquired, and N second image blocks corresponding to facial features in the second facial image (I_FA) are acquired.
  • the facial features can be organs (such as eyebrows, eyes, nose, mouth, face profile), tissues or local features (such as features on the forehead, face, skin), and the like.
  • the M first image blocks respectively correspond to different facial features
  • the N second image blocks also correspond to different facial features respectively.
  • M and N are natural numbers.
  • the first facial image and the second facial image may be facial images of people of different genders, such as a male facial image and a female facial image.
  • the first facial image and the second facial image may be facial images of people of the same gender.
  • the first facial image and the second facial image may be real facial images taken by a camera.
  • the first facial image and the second facial image may also be composite images generated based on facial feature images selected from an existing facial feature library.
  • For example, the first facial image may be a synthetic image generated by randomly selecting a facial feature from a facial feature library and replacing the corresponding feature among a person's original facial features, and the second facial image may likewise be a composite image generated in a similar manner.
  • the first facial image may also be a composite image generated by randomly selecting and combining all facial features from a facial feature library, and the second facial image may also be a composite image generated in a similar manner.
  • Alternatively, the first facial image and the second facial image may also be cartoon facial images. It can be seen that, in the embodiments of the present application, the types of the first facial image and the second facial image are not particularly limited; any two facial images that can be used as input can be similarly applied to the embodiments of the present application and fall within the scope of this application.
  • For an input facial image, the position of each facial feature can first be located through facial calibration, and the facial image can then be decomposed into image blocks corresponding to each facial feature.
  • The total number of different facial features required to generate a new facial image is set in advance and denoted as L, where L is a natural number.
  • facial features can be divided into left eye and left eyebrow, right eye and right eyebrow, nose, mouth, and face profile.
  • In this case, the total number of different facial features required to generate a new facial image is five. If an input facial image is a complete frontal image, the number of image blocks obtained by decomposition will be consistent with this total number; in other words, all of the required different facial features can be detected from that facial image.
  • That is, the input facial image can be decomposed into five image blocks: the image block corresponding to the left eye and left eyebrow, the image block corresponding to the right eye and right eyebrow, the image block corresponding to the nose, the image block corresponding to the mouth, and the image block corresponding to the face profile.
  • this decomposition method is only an example, and any other decomposition methods are also feasible.
  • the input facial image can also be decomposed into image blocks corresponding to eyes, image blocks corresponding to eyebrows, image blocks corresponding to the nose, image blocks corresponding to the mouth, and image blocks corresponding to the face profile.
  • If an input facial image is not a complete frontal image, the number of image blocks decomposed from it will be less than the total number of different facial features required; in other words, some facial features may not be detected from that facial image. Since, in the subsequent steps, a new facial image is synthesized by selecting some facial features from the first facial image and some facial features from the second facial image, a single input facial image does not need to contain all the facial features required to generate a new facial image; it is sufficient that all the required facial features can be pieced together from the two input facial images.
  • the number M of first image blocks and the number N of second image blocks may both be equal to the total number L of different facial features required to generate a new facial image.
  • one of the number M of the first image block and the number N of the second image block may be equal to the total number L of different facial features required to generate a new facial image, and the other may be less than L.
  • the number M of the first image block and the number N of the second image block may both be less than L, and M and N may be equal or different.
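  • As an illustration of the decomposition described above, the following sketch crops one image block per facial-feature group from pre-computed landmark points. The landmark names, the grouping into five features, and the crop margin are hypothetical choices for illustration and are not specified by the patent.

```python
import numpy as np

# Hypothetical grouping into the L = 5 facial features used in the example above.
FEATURE_GROUPS = {
    "left_eye_brow":  ["left_eye", "left_eyebrow"],
    "right_eye_brow": ["right_eye", "right_eyebrow"],
    "nose":           ["nose"],
    "mouth":          ["mouth"],
    "profile":        ["jawline"],
}

def decompose_face(image: np.ndarray, landmarks: dict, margin: int = 8) -> dict:
    """Cut one image block per facial feature out of `image`.

    `landmarks` maps a part name (e.g. "left_eye") to a (K, 2) array of (x, y)
    points produced by any facial calibration / landmark detector. Features
    whose landmarks are missing are skipped, matching the case where fewer
    than L blocks can be detected in a non-frontal image.
    """
    h, w = image.shape[:2]
    blocks = {}
    for feature, parts in FEATURE_GROUPS.items():
        pts = [landmarks[p] for p in parts if p in landmarks]
        if not pts:                      # feature not visible in this image
            continue
        pts = np.concatenate(pts, axis=0)
        x0, y0 = np.maximum(pts.min(axis=0) - margin, 0).astype(int)
        x1, y1 = np.minimum(pts.max(axis=0) + margin, [w, h]).astype(int)
        blocks[feature] = image[y0:y1, x0:x1].copy()
    return blocks
```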
  • In step S103, the M first image blocks and the N second image blocks are transformed into the feature space to generate M first feature blocks and N second feature blocks.
  • The transformation from the image space to the feature space can be realized through a transformation network, such as a coding network.
  • As one possible implementation, the same coding network can be used for the image blocks of all facial features. Alternatively, as another possible implementation, because each facial feature has a distinct appearance, an exclusive coding network can be set for each facial feature so that exclusive features are extracted for it.
  • a coding network can be set up for the image block corresponding to each facial feature. For example, set a set of coding networks, where the coding network E1 is used for the image blocks corresponding to the left eye and left eyebrow, the coding network E2 is used for the image blocks corresponding to the right eye and the right eyebrow, and the coding network E3 is used for the image blocks corresponding to the nose.
  • the coding network E4 is used for the image block corresponding to the mouth
  • the coding network E5 is used for the image block corresponding to the face profile.
  • the parameters of the coding network E1 to E5 are different.
  • the M first image blocks are transformed into the feature space through the corresponding coding networks E1 to E5, and similarly, the N second image blocks are transformed into the feature space through the corresponding coding networks E1 to E5, respectively.
  • a two-dimensional image block can be transformed into a three-dimensional feature block with length, width and height.
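  • A minimal sketch of the per-feature coding networks E1 to E5 described above, written with PyTorch. The layer sizes, channel counts, and feature names are illustrative assumptions rather than values taken from the patent; the sketch is only meant to show that each facial feature has its own exclusive encoder turning a two-dimensional image block into a three-dimensional feature block.

```python
import torch
import torch.nn as nn

FEATURES = ["left_eye_brow", "right_eye_brow", "nose", "mouth", "profile"]

def make_encoder(out_channels: int = 64) -> nn.Module:
    # A small convolutional encoder: a 2-D image block in,
    # a 3-D feature block (channels x height x width) out.
    return nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(32, out_channels, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True),
    )

class FeatureEncoders(nn.Module):
    """One exclusive coding network (E1..E5) per facial feature."""
    def __init__(self):
        super().__init__()
        self.encoders = nn.ModuleDict({name: make_encoder() for name in FEATURES})

    def forward(self, image_blocks: dict) -> dict:
        # image_blocks: feature name -> tensor of shape (B, 3, H, W)
        return {name: self.encoders[name](block)
                for name, block in image_blocks.items()}
```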
  • In step S104, a part of the first feature blocks and a part of the second feature blocks are selected according to a specific control vector.
  • The specific control vector includes L information bits corresponding to the facial features. It should be noted here that the number of information bits is the same as the total number L of different facial features required to generate a new facial image described above; as described above, L is a natural number, and M ≤ L, N ≤ L.
  • In the example above, the control vector includes five information bits, and these five information bits correspond to the left eye and left eyebrow, the right eye and right eyebrow, the nose, the mouth, and the face profile, respectively.
  • the specific control vector can be manually set by the user, or it can be set automatically at random.
  • Specifically, the step of selecting a part of the first feature blocks and a part of the second feature blocks according to a specific control vector further includes: when an information bit in the specific control vector is a first value, selecting the feature block of the facial feature corresponding to that information bit from the M first feature blocks; and when the information bit in the specific control vector is a second value, selecting the feature block of the facial feature corresponding to that information bit from the N second feature blocks.
  • the selection is made according to each information bit in the control vector in turn, and then L feature blocks are obtained.
  • These feature blocks are mixed feature blocks composed of a part of the first feature block and a part of the second feature block.
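  • The selection rule of step S104 can be written directly as a loop over the control vector. The sketch below assumes the feature blocks are stored per facial feature, and that the first value is 1 (take the block from the first facial image) while the second value is 0 (take it from the second facial image); these encodings are illustrative assumptions.

```python
FEATURES = ["left_eye_brow", "right_eye_brow", "nose", "mouth", "profile"]

def select_feature_blocks(first_blocks: dict, second_blocks: dict,
                          control_vector: str, features=FEATURES) -> dict:
    """Mix feature blocks from two sources according to the control vector.

    control_vector is a string of L bits, one per facial feature, e.g. "10010":
    bit == "1" -> take the feature block from the first facial image,
    bit == "0" -> take the feature block from the second facial image.
    """
    assert len(control_vector) == len(features)
    mixed = {}
    for bit, name in zip(control_vector, features):
        source = first_blocks if bit == "1" else second_blocks
        mixed[name] = source[name]
    return mixed

# Example from the text: v = 10010 keeps the left eye/eyebrow and the mouth
# from the first facial image, and takes the rest from the second one.
# mixed = select_feature_blocks(first_feature_blocks, second_feature_blocks, "10010")
```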
  • In step S105, a first composite feature map is generated based on at least the selected part of the first feature blocks and part of the second feature blocks.
  • For example, when the control vector v is 10010, the first composite feature map can be generated from the first feature blocks corresponding to the left eye and left eyebrow and to the mouth, together with the second feature blocks corresponding to the right eye and right eyebrow, to the nose, and to the face profile. In other words, feature blocks from different sources are recombined in the feature space to form a new composite feature map containing all the facial features.
  • In addition, the attributes (e.g., age and gender) of the output third facial image can be controlled.
  • Specifically, the attribute information of the input first facial image and second facial image may be quite different. For example, the age of the first facial image may be very different from the age of the second facial image: the age of the first facial image may be 20 years old while the age of the second facial image is 60 years old.
  • Therefore, attribute features can be further superimposed in the feature space. For example, if the desired output is female, the attribute features of the female gender can be superimposed to remove male features such as beards; for age, the attribute feature of the average age of the two input images (40 years old in the above example) can be superimposed.
  • In this case, the step of generating the first synthetic feature map may further include the following steps.
  • First, the designated attribute information is expanded into attribute feature blocks in the feature space.
  • For example, the attribute information can be expanded into feature blocks having the same length and width as the selected feature blocks but a different height.
  • Then, a first composite feature map is generated based on the selected part of the first feature blocks, the part of the second feature blocks, and the attribute feature blocks.
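  • A sketch of the attribute expansion described above: the designated attribute information (here assumed to be a normalized age value and a gender flag) is tiled to the spatial size of the feature blocks and concatenated along the channel dimension, producing the composite map that the decoding network D would transform back into the third facial image. The two-channel attribute block, the common spatial size of the feature blocks, and the tensor layout are assumptions for illustration.

```python
import torch

def expand_attributes(age: float, gender: float, like: torch.Tensor) -> torch.Tensor:
    """Expand (age, gender) into an attribute feature block whose length and width
    match `like` (a tensor of shape (B, C, H, W)) but whose depth is its own."""
    b, _, h, w = like.shape
    attrs = torch.tensor([age, gender], dtype=like.dtype, device=like.device)
    return attrs.view(1, 2, 1, 1).expand(b, 2, h, w)

def build_first_composite_feature_map(mixed_blocks: dict, age: float, gender: float) -> torch.Tensor:
    # Assumes all selected feature blocks were encoded to a common (H, W) size,
    # so they can be concatenated along the channel dimension.
    feats = torch.cat(list(mixed_blocks.values()), dim=1)
    attr_block = expand_attributes(age, gender, feats)
    # Append the attribute feature block to obtain the first composite feature map.
    return torch.cat([feats, attr_block], dim=1)
```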
  • In step S106, the first synthetic feature map is inversely transformed back into the image space to generate a third facial image (I_o1).
  • the inverse transformation from feature space to image space can be realized through an inverse transformation network, such as decoding network D.
  • the inherited network may include the encoding network E1 to E5 and the decoding network D described above, and may be implemented by various neural networks. If the inherited network is expressed as a function f inh , then the input of this function includes the first facial image I MA and the second facial image I FA and the control vector v, and the output is the third facial image I o1 , the specific formula is as follows :
  • I_o1 = f_inh(I_MA, I_FA, v)    (1)
  • the input of the function further includes the age (y a ) and gender (y g ) of the third facial image desired to be output.
  • the specific formula is as follows:
  • I_o1 = f_inh(I_MA, I_FA, v, y_a, y_g)    (2)
  • Fig. 2 shows a data flow diagram of an inherited network according to an embodiment of the present application.
  • the first facial image I MA and the second facial image I FA as input sources are decomposed into image blocks corresponding to facial features and then transformed into feature blocks in the feature space via a set of coding networks E1 to E5.
  • the feature blocks are selected and exchanged according to the control vector v, then spliced with the attribute feature blocks, and finally transformed back to the image space via the decoding network D to generate the third facial image I o1 .
  • the third facial image is a composite facial image that inherits a part of facial features in the first facial image and a part of facial features in the second facial image.
  • the generated third facial image may be an offspring facial image assuming that the two persons are parents.
  • the generated third facial image may be a virtual facial image synthesized by combining the advantages of the facial features of the two persons.
  • the generated third facial image can be used to infer the facial image of a specific person. This is especially important in the identification of eyewitnesses in criminal investigations.
  • For example, in criminal investigation, the facial features described by an eyewitness may be combined to generate a low-quality synthetic facial image that does not resemble a real photo. By using this synthesized facial image as the first facial image, arbitrarily selecting a second facial image, and setting the specific control vector to 11111 (that is, all facial feature images are selected from the first facial image), a third facial image that is very similar to a real image can be output, which facilitates the identification of the suspect.
  • For the data processing method for facial image generation according to the embodiment of the present application, it can be seen from the processing steps described in FIG. 1 that, through the segmentation of facial feature images and their reorganization in the feature space, a third facial image can be generated that inherits a part of the facial features in the first facial image and a part of the facial features in the second facial image.
  • This makes the output third facial image close to a real image while ensuring the similarity between the output third facial image and the facial images serving as the input sources.
  • In other words, when the third facial image is viewed by a user, it is difficult to distinguish whether the image is a real image or a composite image.
  • Figure 3 shows the results of facial image generation under different control vectors. It can be seen from FIG. 3 that by setting different control vectors, the inheritance relationship between the facial features in the generated third facial image and the two facial images as the input source can be accurately controlled.
  • Figure 4 shows the facial image generation results with random factors added to the input facial image, that is, as described above, the input facial image is based on a person’s original facial features from the facial feature library The result of facial image generation in the case of randomly selecting and replacing a facial feature to generate a composite image.
  • the rows from top to bottom respectively show the results of adding random factors to the eyes and eyebrows, nose, mouth, and face.
  • the attributes of the third facial image can be specified and the harmony and naturalness of the third facial image can be further improved.
  • However, the main purpose of the inheritance network described above is to output a third facial image whose facial features are similar to those of the first facial image and the second facial image, so the superposition of the attribute feature blocks it includes is only a fine-tuning performed on the premise of ensuring this similarity.
  • the third facial image output by the inheritance network is similar to the first facial image and the second facial image as the input source in attributes such as age.
  • In order to adjust the attributes of the output facial image over a larger range, as another possible implementation manner, referring back to FIG. 1, the following steps may be further included after step S106.
  • In step S107, the third facial image is transformed into the feature space to generate a third feature map.
  • the transformation from image space to feature space can be realized through the encoding network E0.
  • the parameters of the coding network E0 here are not the same as the parameters of the coding networks E1 to E5 described above.
  • In step S108, the specified attribute information is expanded into an attribute feature map in the feature space.
  • For example, the attribute information can be expanded into a feature map with the same length and width as the three-dimensional third feature map but a different height.
  • In step S109, a second composite feature map is generated based on the attribute feature map and the third feature map.
  • In step S110, the second composite feature map is inversely transformed back into the image space to generate a fourth facial image.
  • the inverse transformation from feature space to image space can be realized through the decoding network D0.
  • the parameters of the decoding network D0 are not the same as the parameters of the decoding network D mentioned in step S105 above.
  • steps S107 to S110 are shown in dashed boxes in FIG. 1.
  • Similarly, the attribute enhancement network may include the encoding network E0 and the decoding network D0 described above, and may be implemented by various neural networks. If the attribute enhancement network is expressed as a function f_att, then the inputs of this function include the third facial image I_o1 and the desired age (y_a) and gender (y_g) of the fourth facial image to be output, and the output is the fourth facial image I_o2. The specific formula is as follows:
  • I_o2 = f_att(I_o1, y_a, y_g)    (3)
  • FIG. 5 shows a data flow diagram of an attribute enhancement network according to an embodiment of the present application.
  • As shown in FIG. 5, the third facial image I_o1 is transformed by the encoding network E0 into a third feature map Z_1 in the feature space; Z_1 is then concatenated with the attribute information y_a and y_g in the feature space, and the result is inversely transformed back into the image space by the decoding network D0 to obtain the fourth facial image I_o2.
  • the fourth facial image can greatly change in attributes. For example, based on the inputted third facial image of 20 years old, the fourth facial image of 5 years old may be output.
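  • Putting the two networks together, application-mode inference amounts to composing formulas (1) to (3). The sketch below assumes hypothetical `inheritance_net` and `attribute_net` modules exposing the interfaces described above; they are placeholders, not APIs defined by the patent.

```python
import torch

@torch.no_grad()
def generate(inheritance_net, attribute_net, face_a, face_b,
             control_vector="10010", age=5.0, gender=1.0):
    """face_a, face_b: (1, 3, H, W) image tensors of the two input sources.

    Returns the third facial image (inherits features from both inputs
    according to the control vector) and the fourth facial image (the same
    face with its attributes, e.g. age, shifted over a larger range).
    """
    third = inheritance_net(face_a, face_b, control_vector, age, gender)   # I_o1, formula (2)
    fourth = attribute_net(third, age, gender)                             # I_o2, formula (3)
    return third, fourth
```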
  • Figure 6 shows facial images of different ages generated under the specified control vector. It can be seen from Figure 6 that through the inheritance network and the attribute enhancement network, facial images of different age groups can be generated, and the facial differences of each age group are obvious.
  • Figure 7 shows facial images of different ages and different genders generated under a specified control vector. It can be seen from Figure 7 that, through the inheritance network and the attribute enhancement network, even under the same control vector, the differences in the facial features of the generated facial images due to different gender and age can be reflected, for example in the apple muscles (cheeks), eyebrows, nasolabial folds, lip color, and so on.
  • the specific process of the data processing method for generating a facial image according to an embodiment of the present application is described in detail above with reference to FIGS. 1 to 7.
  • As described above, the data processing method is implemented by the inheritance network, or by the inheritance network together with the attribute enhancement network.
  • The data processing method described above is the processing performed in the application process of the inheritance network and the attribute enhancement network.
  • the inheritance network and the attribute enhancement network may include an encoding network and a decoding network, and both the encoding network and the decoding network include multiple parameters to be determined. These parameters are determined through the training process to complete the construction of inheritance network and attribute enhancement network. In this way, the inheritance network and the attribute enhancement network can realize the function of facial image generation.
  • the inherited network can be obtained through the following training steps shown in FIG. 8.
  • In the training process, the facial images involved are referred to as the fifth to tenth facial images.
  • In step S801, L fifth image blocks corresponding to each facial feature in the fifth facial image (I_M) are acquired, and L sixth image blocks corresponding to each facial feature in the sixth facial image (I_F) are acquired.
  • As described above for the application process, the number of facial feature image blocks obtained from each of the two input facial images may be less than or equal to L, where L is the total number of different facial features required to generate a new facial image, as long as all the facial features required to generate a new facial image can be pieced together from the two input facial images.
  • However, to simplify the description, it is assumed here that the number of image blocks acquired from each facial image is L, where, as described above, L is the total number of different facial features required to generate a new facial image.
  • In step S802, a part of the fifth image blocks and a part of the sixth image blocks are selected according to a first control vector v_1 to generate a first composite image, and another part of the fifth image blocks and another part of the sixth image blocks are selected according to a second control vector v_2 to generate a second composite image.
  • The composite images obtained after the exchange of facial features can be further fused by a color correction method to avoid inconsistent color patches in the composite images.
  • As mentioned above, in the application process the first facial image and the second facial image may likewise be composite images generated based on facial feature images selected from an existing facial feature library.
  • However, in the application process, since the inheritance network has already been trained, those composite images may not need to undergo color correction processing.
  • In step S803, L seventh image blocks corresponding to each facial feature in the first composite image are acquired, and L eighth image blocks corresponding to each facial feature in the second composite image are acquired.
  • In step S804, the L seventh image blocks and the L eighth image blocks are input to the inheritance network.
  • In step S805, through the inheritance network, a seventh facial image (I′_M) generated based on a part of the seventh image blocks and a part of the eighth image blocks selected according to the first control vector is output, and an eighth facial image (I′_F) generated based on another part of the seventh image blocks and another part of the eighth image blocks selected according to the second control vector is output.
  • The fifth facial image is a supervision image used to provide supervision information for the seventh facial image, the sixth facial image is a supervision image used to provide supervision information for the eighth facial image, and the fifth to eighth facial images are taken as one group of inheritance training data.
  • the attributes of the desired output facial image are set to be the same as the attributes of the facial image as the input source, so as to facilitate the calculation of the subsequent loss function.
  • Compared with the application process, the training process of the inheritance network differs in that a facial feature exchange process is performed in advance before the facial images serving as input sources are input to the inheritance network.
  • The purpose of this is to provide supervision information for the facial images output by the inheritance network.
  • Specifically, the facial features are first exchanged once using a control vector, and the composite images after the exchange are provided to the inheritance network. If the parameter settings of the inheritance network are accurate, exchanging the facial features again with the same control vector should recover the original fifth facial image or sixth facial image.
  • FIG. 9 shows a schematic process of two facial feature exchanges in the training process of the inherited network.
  • the letter A represents the image block of each facial feature in the fifth facial image (I M ) as the input source
  • the letter B represents the image block of each facial feature in the sixth facial image (I F ) as the input source.
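  • The supervision trick of FIG. 9 relies on the exchange being its own inverse: if the second control vector is taken as the complement of the first (an assumption consistent with the A/B pattern in FIG. 9, though not stated explicitly in this section), the facial-feature exchange is a pairwise swap, and applying it twice with the same vector recovers the original images. The toy sketch below demonstrates this property with string stand-ins for the image blocks.

```python
FEATURES = ["left_eye_brow", "right_eye_brow", "nose", "mouth", "profile"]

def exchange(blocks_a: dict, blocks_b: dict, control_vector: str) -> tuple:
    """Swap per-feature blocks between two faces according to the control vector.

    bit == "1": each output keeps its own block for that feature;
    bit == "0": the blocks of that feature are swapped between the two faces.
    """
    out_a, out_b = {}, {}
    for bit, name in zip(control_vector, FEATURES):
        if bit == "1":
            out_a[name], out_b[name] = blocks_a[name], blocks_b[name]
        else:
            out_a[name], out_b[name] = blocks_b[name], blocks_a[name]
    return out_a, out_b

A = {f: f"A_{f}" for f in FEATURES}   # stand-ins for the image blocks of I_M
B = {f: f"B_{f}" for f in FEATURES}   # stand-ins for the image blocks of I_F
once_a, once_b = exchange(A, B, "10010")          # composite images fed to the inheritance network
twice_a, twice_b = exchange(once_a, once_b, "10010")
assert twice_a == A and twice_b == B              # originals recovered, so they can serve as supervision
```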
  • Since the fifth facial image I_M serves as the supervision image of the seventh facial image (I′_M) and the sixth facial image I_F serves as the supervision image of the eighth facial image (I′_F), there is no need to establish a face database with father-mother-child relationships; any existing face database can be used directly to complete the training process of the inheritance network.
  • a Generative Adversarial Network (GAN) method is adopted for learning.
  • A generative adversarial network includes a generative network and a discriminant network, and learns the data distribution through a game played between the generative network and the discriminant network.
  • The purpose of the generative network is to learn the real data distribution as closely as possible, and the purpose of the discriminant network is to correctly determine whether its input comes from the real data or from the generative network; during training, the generative network and the discriminant network are continuously optimized, each improving its own ability to generate or to discriminate.
  • The inheritance network can be regarded as the generative network here.
  • Here, "real" means that the facial image input to the discriminant network is a real image, and "fake" means that it is an image output by the inheritance network.
  • In step S806, at least one group of inheritance training data is input to the first discriminant network, where the first discriminant network is configured to output, when an image is input to it, the probability that the image is a real image.
  • In step S807, based on the first loss function, the inheritance network and the first discriminant network are alternately trained until the first loss function converges.
  • Fig. 10 shows a data flow diagram in the training process of the inherited network according to an embodiment of the present application.
  • In the training process, the two facial images serving as input sources are respectively used as the supervision images of the two facial images output by the inheritance network; therefore, to facilitate comparison, both output paths of the inheritance network are shown in Figure 10. In fact, as described above with reference to FIG. 2, each time two facial images are provided as input to the inheritance network, only one facial image is output.
  • The fifth facial image I_M is exchanged twice with the same control vector v_1 to obtain the seventh facial image I′_M, and I_M is used as the supervision image of I′_M.
  • The sixth facial image I_F is exchanged twice with the same control vector v_2 to obtain the eighth facial image I′_F, and I_F is used as the supervision image of I′_F.
  • The first loss function is determined based on the probability values output by the first discriminant network for the at least one group of inheritance training data, and on the pixel differences between the facial images in the at least one group of inheritance training data and the corresponding supervision images.
  • Specifically, the first loss function includes the sum of two parts: an adversarial loss and a pixel loss.
  • The adversarial loss makes the distribution of the facial images generated by the inheritance network closer to that of real images.
  • Here, D_I denotes the first discriminant network; D_I(I′_s) is the output (probability value) of the first discriminant network when a facial image I′_s output by the inheritance network is input to it; and D_I(I_s) is the output (probability value) of the first discriminant network when a real facial image I_s is input to it.
  • λ_gp is the gradient penalty hyperparameter of WGAN.
  • The pixel loss is used to ensure the similarity between the facial image generated by the inheritance network and the facial image serving as the input source; it is the pixel-level loss between the facial image generated by the inheritance network and the corresponding real facial image, that is, the sum of the absolute differences between the pixel values of the two images.
  • The first loss function can then be expressed as a weighted combination of the adversarial loss and the pixel loss, where λ_11 and λ_12 are weight coefficients.
  • During training, the inheritance network can be fixed first and the first discriminant network trained; at this stage it is desirable for the value of the first loss function to be as small as possible. Then the first discriminant network can be fixed and the inheritance network trained; at this stage it is desirable for the value of the first loss function to be as large as possible. After multiple rounds of training, when the first loss function no longer fluctuates much across different groups of inheritance training data, that is, when the first loss function converges, the training of the inheritance network is completed.
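  • A schematic of this alternating optimization in PyTorch-style code follows; `first_loss` and `loader` are placeholder helpers (they would assemble one group of inheritance training data and evaluate the first loss function on it), not functions defined by the patent, and only the alternation scheme itself is taken from the description above.

```python
import torch

def train_inheritance_gan(inheritance_net, discriminator_I, loader, first_loss,
                          epochs=10, lr=1e-4):
    """Alternately train the first discriminant network and the inheritance network."""
    opt_d = torch.optim.Adam(discriminator_I.parameters(), lr=lr)
    opt_g = torch.optim.Adam(inheritance_net.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in loader:                      # one group of inheritance training data
            # 1) Inheritance network fixed (only the discriminator's parameters
            #    are stepped): drive the first loss function down.
            loss_d = first_loss(inheritance_net, discriminator_I, batch)
            opt_d.zero_grad(); loss_d.backward(); opt_d.step()

            # 2) Discriminant network fixed: drive the first loss function up,
            #    i.e. minimize its negative with respect to the inheritance network.
            loss_g = -first_loss(inheritance_net, discriminator_I, batch)
            opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```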
  • In addition, the first loss function may be further determined based on at least one of the following: the differences between the attributes of the facial images in the at least one group of inheritance training data and the attributes of the corresponding supervision images, and the differences between the features of the facial images in the at least one group of inheritance training data and the features of the corresponding supervision images.
  • the first loss function may further include attribute loss.
  • the attribute loss is determined by the difference between the attributes of the facial image output by the inherited network and the attributes of the real facial image as the input source.
  • For example, the attribute loss can include loss terms for age and gender, calculated from the difference between the predicted and expected attribute values.
  • Here, D_a and D_g are networks that estimate the age and the gender of an image, respectively.
  • For example, ResNet can be used to pre-train a regression model of age and gender, so that when an image I′_s is input to the model, the age and gender information of the image is output.
  • D_a(I′_s) represents the age of the facial image I′_s as judged by D_a, and D_g(I′_s) represents the gender of the facial image I′_s as judged by D_g.
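  • The pre-trained attribute predictors D_a and D_g can be implemented, for instance, as a shared ResNet backbone with an age-regression head and a gender head; the torchvision ResNet-18 backbone and the head sizes below are illustrative assumptions.

```python
import torch.nn as nn
from torchvision import models

class AgeGenderNet(nn.Module):
    """Shared ResNet-18 backbone with an age head (D_a) and a gender head (D_g)."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()           # keep the 512-d pooled feature
        self.backbone = backbone
        self.age_head = nn.Linear(512, 1)     # D_a: predicted age (regression)
        self.gender_head = nn.Linear(512, 2)  # D_g: gender logits

    def forward(self, image):
        feat = self.backbone(image)
        return self.age_head(feat).squeeze(1), self.gender_head(feat)
```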
  • the first loss function may further include perceptual loss.
  • For the perceptual loss, the features of a 19-layer VGG network can be used; the perceptual loss is the distance between the VGG features of the facial image output by the inheritance network and the VGG features of the corresponding real facial image serving as the input source.
  • In this case, the first loss function can be expressed as a weighted combination of the adversarial loss, the pixel loss, the attribute loss, and the perceptual loss.
  • Here, λ_11, λ_12, λ_13, λ_14, and λ_15 are different weight coefficients, which can be assigned according to the importance of each loss term.
  • the attribute enhancement network can be obtained through the following training steps shown in FIG. 11.
  • Similarly, a generative adversarial network (GAN) method is also adopted for learning when training the attribute enhancement network.
  • the attribute enhancement network can be regarded as the generative network here.
  • the so-called true means that the output facial image is a real image; the so-called false means that the output facial image is an image output by the attribute enhancement network.
  • the fourth facial image is generated by inputting the third facial image to the attribute enhancement network, and the attribute enhancement network is obtained through the following training steps shown in FIG. 11.
  • In step S1101, the seventh facial image (I′_M) and the eighth facial image (I′_F) are input to the attribute enhancement network.
  • In step S1102, through the attribute enhancement network, a ninth facial image corresponding to the seventh facial image and a tenth facial image corresponding to the eighth facial image are output.
  • The seventh facial image is a supervision image used to provide supervision information for the ninth facial image, the eighth facial image is a supervision image used to provide supervision information for the tenth facial image, and the seventh to tenth facial images are taken as one group of attribute training data.
  • The generation process of the attribute enhancement network has the same form as formula (3) above: each of the seventh facial image and the eighth facial image, together with the specified attribute information, is input to the attribute enhancement network, which outputs the ninth facial image and the tenth facial image, respectively.
  • the attributes of the desired output facial image are set to be the same as the attributes of the facial image as the input source, so as to facilitate the calculation of the subsequent loss function.
  • In step S1103, at least one group of attribute training data is input to the second discriminant network, where the second discriminant network is configured to output, when an image is input to it, the probability that the image is a real image.
  • In step S1104, based on the second loss function, the attribute enhancement network and the second discriminant network are alternately trained until the second loss function converges.
  • Fig. 12 shows a data flow diagram in the training process of the attribute enhancement network according to an embodiment of the present application. Similar to Figure 10, Figure 12 also shows two outputs of the attribute enhancement network.
  • As shown in FIG. 12, the seventh facial image I′_M and the eighth facial image I′_F are input to the attribute enhancement network and transformed into the feature space to obtain feature maps Z_M and Z_F, respectively; these are concatenated with the attribute features in the feature space and inversely transformed back into the image space to obtain the ninth facial image and the tenth facial image, with the seventh facial image I′_M and the eighth facial image I′_F taken respectively as the supervision images of the ninth facial image and the tenth facial image.
  • The second loss function is determined based on the probability values output by the second discriminant network for the at least one group of attribute training data, and on the pixel differences between the facial images in the at least one group of attribute training data and the corresponding supervision images.
  • Specifically, the second loss function includes the sum of two parts: an adversarial loss and a pixel loss.
  • The adversarial loss makes the distribution of the facial images generated by the attribute enhancement network closer to that of real images.
  • One term of the adversarial loss is the mean of the logarithm of the output of the second discriminant network when the facial images output by the attribute enhancement network are input to it, and another term is the mean of its output when real facial images I_s from the real face database are input to it.
  • λ_gp is the gradient penalty hyperparameter of WGAN.
  • The pixel loss is used to ensure the similarity between the facial image generated by the attribute enhancement network and the facial image output by the inheritance network; it is the pixel-level loss between the facial image generated by the attribute enhancement network and the image output by the inheritance network, that is, the sum of the absolute differences between the pixel values of the two images.
  • The second loss function can then be expressed as a weighted combination of the adversarial loss and the pixel loss, where λ_21 and λ_22 are weight coefficients.
  • In addition, the second loss function may be further determined based on at least one of the following: the differences between the attributes of the facial images in the at least one group of attribute training data and the attributes of the corresponding supervision images, and the differences between the features of the facial images in the at least one group of attribute training data and the features of the corresponding supervision images.
  • the second loss function may further include attribute loss.
  • the attribute loss is determined by the difference between the attributes of the facial image output by the attribute enhancement network and the attributes of the facial image output by the inheriting network.
  • For example, the attribute loss can again include loss terms for age and gender.
  • Here, D_a and D_g are networks that estimate the age and the gender of an image, respectively.
  • As before, ResNet can be used to pre-train a regression model of age and gender, so that when an image is input to the model, the age and gender information of the image is output.
  • In this case, D_a and D_g judge the age and gender of the facial image output by the attribute enhancement network, and the expected values are the age and gender of the facial image output by the inheritance network. Since the age and gender of the facial image output by the inheritance network are set to be the same as those of the real facial image serving as the input source, the age and gender of the real facial image can be used directly as the expected values.
  • Similarly, the second loss function may further include a perceptual loss.
  • For example, the features of a 19-layer VGG network can be used to calculate the perceptual loss, that is, the distance between the VGG features of the facial image output by the attribute enhancement network and the VGG features of the facial image output by the inheritance network.
  • In this case, the second loss function can be expressed as a weighted combination of the adversarial loss, the pixel loss, the attribute loss, and the perceptual loss, where λ_21, λ_22, λ_23, λ_24, and λ_25 are different weight coefficients that can be assigned according to the importance of each loss term.
  • During training, the attribute enhancement network can be fixed first and the second discriminant network trained; at this stage it is desirable for the value of the second loss function to be as small as possible. Then the second discriminant network can be fixed and the attribute enhancement network trained; at this stage it is desirable for the value of the second loss function to be as large as possible. After multiple rounds of training, when the second loss function no longer fluctuates much across different groups of attribute training data, that is, when the second loss function converges, the training of the attribute enhancement network is completed.
  • In the application process, the attributes of the originally input facial image (such as age) can be changed greatly; however, in the training process of the attribute enhancement network, in order to provide supervision information, the attributes are set to be the same as those of the originally input facial image.
  • the separate training process for inheritance network and attribute enhancement network is described.
  • the two networks can also be jointly trained to find the global optimal solution.
  • Specifically, the inheritance network and the attribute enhancement network are further optimized through the following joint training steps: determining a total loss function based on the first loss function and the second loss function; and, based on the total loss function, alternately training the inheritance network and the attribute enhancement network on one side and the first discriminant network and the second discriminant network on the other, until the total loss function converges.
  • For example, the weighted sum of the first loss function and the second loss function can be used as the total loss function L, that is, L = λ_01·L_1 + λ_02·L_2, where λ_01 and λ_02 are different weight coefficients that can be assigned according to the importance of each loss function.
  • the inheritance network and the attribute enhancement network can be fixed first, and the first discriminant network and the second discriminant network can be trained. At this time, it is hoped that the value of the total loss function is as small as possible, and the parameters of the first discriminant network and the second discriminant network are uniformly adjusted. Then, the first discriminant network and the second discriminant network can be fixed, and the inheritance network and the attribute enhancement network can be trained. At this time, it is hoped that the value of the total loss function is as large as possible, and the parameters of the inheritance network and the attribute enhancement network are uniformly adjusted. After multiple rounds of training, when the total loss function converges, the joint training of the two networks is completed.
  • the server 10 is connected to a plurality of terminal devices 20 through a network 30.
  • the plurality of terminal devices 20 are devices that provide first facial images and second facial images as input sources.
  • the terminal may be a smart terminal, such as a smart phone, a PDA (personal digital assistant), a desktop computer, a notebook computer, a tablet computer, etc., or other types of terminals.
  • the server 10 is a device for training the inherited network and the attribute enhancement network described above based on the existing face database.
  • the server is also a device that applies the trained inheritance network and the attribute enhancement network to facial image generation.
  • the server 10 is connected to the terminal device 20, receives the first facial image and the second facial image from the terminal device 20, and generates the third facial image based on the trained inheritance network and the attribute enhancement network on the server 10 or The fourth facial image, and the generated facial image is transmitted to the terminal device 20.
  • the server 10 may be a data processing device described below.
  • the network 30 may be any type of wired or wireless network, such as the Internet. It should be appreciated that the number of terminal devices 20 shown in FIG. 13 is illustrative and not restrictive. Of course, the data processing device for facial image generation according to the embodiment of the present application may also be a stand-alone device that is not networked.
  • Fig. 14 is a diagram illustrating a data processing device for facial image generation according to an embodiment of the present application.
  • the data processing device 1400 includes: a dividing device 1401, a first transforming device 1402, a selecting device 1403, a first combining device 1404, and a first inverse transforming device 1405.
  • the segmentation device 1401 is configured to obtain M first image blocks corresponding to each facial feature in the input first facial image, and obtain N second image blocks corresponding to each facial feature in the input second facial image.
  • the first transformation device 1402 is configured to transform the M first image blocks and N second image blocks into a feature space to generate M first feature blocks and N second feature blocks.
  • the first transformation device 1402 may perform the transformation through a first transformation network (for example, a coding network).
  • the selecting device 1403 is used for selecting a part of the first characteristic block and a part of the second characteristic block according to a specific control vector.
  • the specific control vector includes L information bits corresponding to each facial feature
  • Specifically, the selection device 1403 is further configured to: when an information bit in the specific control vector is a first value, select the feature block of the facial feature corresponding to that information bit from the M first feature blocks; and when the information bit in the specific control vector is a second value, select the feature block of the facial feature corresponding to that information bit from the N second feature blocks.
  • L is a natural number, and M ⁇ L and N ⁇ L.
  • the first synthesis device 1404 is configured to generate a first synthesis feature map based on at least a part of the first feature block and a part of the second feature block selected.
  • the attributes (eg, age and gender) of the output third facial image can be controlled. For example, you can specify the gender of the third facial image you want to output.
  • In addition, the attribute information of the input first facial image and second facial image may be quite different. Therefore, as another possible implementation manner, the first synthesis device 1404 is further configured to: expand the specified attribute information into attribute feature blocks in the feature space; and generate the first composite feature map based on the selected part of the first feature blocks, the part of the second feature blocks, and the attribute feature blocks.
  • the first inverse transformation device 1405 is configured to inversely transform the first synthetic feature map back into the image space to generate a third facial image.
  • the first inverse transform device 1405 may perform the inverse transform through a first inverse transform network (for example, a decoding network). And, the first transformation network and the first inverse transformation network constitute a succession network.
  • In this way, the output third facial image can be made close to a real image while ensuring its similarity to the facial images serving as input sources. In other words, when the third facial image is viewed by a user, it is difficult to distinguish whether the image is a real image or a synthesized image.
  • Moreover, by setting the control vector, it is possible to precisely control which facial features of the two input facial images are inherited by the third facial image.
  • In addition, through the superposition of attribute feature blocks in the feature space, the attributes of the third facial image can be specified, and the harmony and naturalness of the third facial image can be further improved.
  • The main purpose of the inheritance network described above is to output a third facial image that is similar in facial features to the first facial image and the second facial image, so the superposition of the attribute feature blocks included therein is a fine-tuning performed under the premise of ensuring that similarity.
  • In other words, the third facial image output by the inheritance network is close to the first facial image and the second facial image serving as input sources in attributes such as age.
  • To adjust the attributes of the output facial image over a wider range, as another possible implementation, the data processing device 1400 may further include: a second transformation device 1406, an expansion device 1407, a second synthesis module 1408, and a second inverse transformation device 1409.
  • the second transformation device 1406 is used to transform the third facial image into a feature space to generate a third feature map.
  • the second transformation device may perform the transformation through a second transformation network (for example, a coding network), and the second transformation network here is different from the first transformation network above.
  • the expansion device 1407 is used to expand specific attribute information into an attribute feature map in the feature space.
  • the second synthesis module 1408 is configured to generate a second synthesized feature map based on the attribute feature map and the third feature map.
  • The second inverse transformation device 1409 is configured to inversely transform the second synthesized feature map back into the image space to generate a fourth facial image.
  • The second inverse transformation device may perform this inverse transform through a second inverse transformation network (for example, a decoding network), and the second inverse transformation network here is different from the first inverse transformation network above.
  • the second transformation network and the second inverse transformation network constitute an attribute enhancement network.
  • Because the second transformation device 1406, the expansion device 1407, the second synthesis module 1408, and the second inverse transformation device 1409 are optional, they are shown in a dashed frame in FIG. 14.
  • Compared with the third facial image, the fourth facial image can differ greatly in attributes. For example, based on an input third facial image of a 20-year-old, a fourth facial image of a 5-year-old may be output (a sketch of such an attribute enhancement network is given after this list).
  • As described above, the inheritance network and the attribute enhancement network may each include an encoding network and a decoding network, and both the encoding network and the decoding network include multiple parameters to be determined. These parameters are determined through the training process to complete the construction of the inheritance network and the attribute enhancement network. Only then can the inheritance network and the attribute enhancement network realize the function of facial image generation. In other words, before the inheritance network and the attribute enhancement network are applied, they must first be trained.
  • the data processing device 1400 further includes a training device 1410.
  • The training device 1410 is configured to train the inheritance network in the training mode. Specifically, the training device 1410 includes: a pre-exchange module, a first discrimination module, and a first training module (sketches of the pre-exchange step and of the training losses are given after this list).
  • The pre-exchange module is configured to obtain L fifth image blocks corresponding to the facial features in the fifth facial image (I_M), obtain L sixth image blocks corresponding to the facial features in the sixth facial image (I_F), select a part of the fifth image blocks and a part of the sixth image blocks according to the first control vector to generate a first composite image, and select another part of the fifth image blocks and another part of the sixth image blocks according to the second control vector to generate a second composite image.
  • In the training mode, the segmentation device is further configured to obtain L seventh image blocks corresponding to the facial features in the first composite image, obtain L eighth image blocks corresponding to the facial features in the second composite image, and input the L seventh image blocks and the L eighth image blocks into the inheritance network.
  • L is a natural number, and M ≤ L and N ≤ L.
  • The first discrimination module is configured to receive at least one set of inheritance training data and, through the first discriminant network, output a probability value that the input inheritance training data is a real image. The at least one set of inheritance training data includes the fifth to eighth facial images, where the seventh facial image (I'_M) is generated by the inheritance network by selecting a part of the seventh image blocks and a part of the eighth image blocks based on the first control vector, and the eighth facial image (I'_F) is generated by the inheritance network by selecting another part of the seventh image blocks and another part of the eighth image blocks based on the second control vector. The fifth facial image is a supervision image used to provide supervision information for the seventh facial image, and the sixth facial image is a supervision image used to provide supervision information for the eighth facial image.
  • The first training module is configured to alternately train the inheritance network and the first discriminant network based on the first loss function until the first loss function converges.
  • The first loss function is determined based on the probability values output by the first discriminant network for the at least one set of inheritance training data and the pixel differences between the facial images in the at least one set of inheritance training data and the corresponding supervision images.
  • Alternatively, the first loss function is further determined based on at least one of the following: the difference between the attributes of the facial images in the at least one set of inheritance training data and the attributes of the corresponding supervision images, and the difference between the features of the facial images in the at least one set of inheritance training data and the features of the corresponding supervision images.
  • the training device 1410 is also used to train the attribute enhancement network in the training mode.
  • the training device 1410 further includes: a second discrimination module and a second training module.
  • The second discrimination module is configured to receive at least one set of attribute training data and, through the second discriminant network, output a probability value that the input attribute training data is a real image. The at least one set of attribute training data includes the seventh to tenth facial images, where the ninth facial image is output by the attribute enhancement network based on the seventh facial image, and the tenth facial image is output by the attribute enhancement network based on the eighth facial image. The seventh facial image is a supervision image used to provide supervision information for the ninth facial image, and the eighth facial image is a supervision image used to provide supervision information for the tenth facial image.
  • the second training module is configured to alternately train the attribute enhancement network and the second discriminant network based on the second loss function until the second loss function converges.
  • The second loss function is determined based on the probability values output by the second discriminant network for the at least one set of attribute training data and the pixel differences between the facial images in the at least one set of attribute training data and the corresponding supervision images.
  • Alternatively, the second loss function is further determined based on at least one of the following: the difference between the attributes of the facial images in the at least one set of attribute training data and the attributes of the corresponding supervision images, and the difference between the features of the facial images in the at least one set of attribute training data and the features of the corresponding supervision images.
  • In addition, the training device may further include a joint training module, configured to determine a total loss function based on the first loss function and the second loss function, and to alternately train, based on the total loss function, the inheritance network and the attribute enhancement network against the first discriminant network and the second discriminant network until the total loss function converges.
  • An example of a data processing device for facial image generation according to an embodiment of the present application as a hardware entity is shown in FIG. 15.
  • the terminal device includes a processor 1501, a memory 1502, and at least one external communication interface 1503.
  • the processor 1501, the memory 1502, and the external communication interface 1503 are all connected via a bus 1504.
  • For the processor 1501 used for data processing, the processing may be implemented by a microprocessor, a central processing unit (CPU, Central Processing Unit), a digital signal processor (DSP, Digital Signal Processor), or a programmable logic array (FPGA, Field-Programmable Gate Array). The memory 1502 contains operation instructions, which may be computer-executable code, through which the steps of the data processing method for facial image generation according to the embodiments of the present application are implemented.
  • Fig. 16 shows a schematic diagram of a computer-readable recording medium according to an embodiment of the present application.
  • a computer-readable recording medium 1600 according to an embodiment of the present application stores computer program instructions 1601.
  • When the computer program instructions 1601 are executed by a processor, the data processing method for facial image generation according to the embodiments of the present application described with reference to the above drawings is performed.
  • An embodiment of the present application also provides a computer device, including a memory and a processor.
  • The memory stores a computer program that can be run on the processor.
  • When the processor executes the computer program, the data processing method for facial image generation described in the foregoing embodiments can be implemented.
  • the computer device can be the server described above or any device capable of data processing.
  • the data processing method, device and medium for facial image generation according to embodiments of the present application have been described in detail with reference to FIGS. 1 to 16.
  • In the data processing method, device, and medium for facial image generation according to the embodiments of the present application, through the segmentation of facial feature images and their recombination in the feature space, it is possible to generate a third facial image that inherits a part of the facial features of the first facial image and a part of the facial features of the second facial image.
  • When the third facial image is viewed by a user, it is difficult to distinguish whether the image is a real image or a synthesized image.
  • Moreover, in the inheritance network, by setting a control vector, it is possible to precisely control which facial features of the two input facial images are inherited by the third facial image.
  • Through the superposition of attribute features in the feature space, the attributes of the third facial image can be specified, and the harmony and naturalness of the third facial image can be further improved.
  • In addition, through the additional attribute enhancement network, the attributes of the generated facial image can be changed over a wider range.
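
For illustration only, the following is a minimal Python sketch of the selection performed by the selection device 1403 and the synthesis performed by the first synthesis device 1404: per-feature blocks from the two source images are chosen according to the control-vector bits and combined, optionally together with an attribute feature block, into the first synthesized feature map. The function names, the dict-based representation of feature blocks, and treating bit value 1 as the "first value" are assumptions made for the sketch, not definitions from the application.

```python
import numpy as np

def select_feature_blocks(first_blocks, second_blocks, control_vector):
    # first_blocks / second_blocks: dicts mapping a facial-feature name
    # (e.g. "left_eye_brow", "nose", "mouth") to a feature block produced
    # by the per-feature encoding networks.
    # control_vector: L bits; here 1 selects the block of the first image
    # and 0 selects the block of the second image (assumed convention).
    features = list(first_blocks.keys())
    assert len(features) == len(control_vector)
    return {
        name: first_blocks[name] if bit == 1 else second_blocks[name]
        for name, bit in zip(features, control_vector)
    }

def first_synthesized_feature_map(selected_blocks, attribute_vector=None):
    # Concatenate the selected blocks (assumed to share the same spatial
    # size) along the channel axis; if attribute information is given,
    # expand it to an attribute feature block of the same height and width
    # and append it before concatenation.
    blocks = list(selected_blocks.values())
    if attribute_vector is not None:
        h, w = blocks[0].shape[-2:]
        attr_block = np.tile(np.asarray(attribute_vector)[:, None, None], (1, h, w))
        blocks.append(attr_block)
    return np.concatenate(blocks, axis=0)
```

For example, with the feature order (left eye/brow, right eye/brow, nose, mouth, face contour) and a control vector of 1 0 0 1 0, the sketch takes the eye/brow and mouth blocks from the first image and the remaining blocks from the second image, matching the example given in the description.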
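
Similarly, a rough sketch of the optional attribute enhancement network formed by the second transformation network and the second inverse transformation network, assuming a PyTorch-style implementation that the application does not prescribe: the third facial image is encoded into a third feature map, the specified attribute information (e.g., age and gender) is expanded to an attribute feature map of the same spatial size, the two are concatenated into the second synthesized feature map, and the result is decoded into the fourth facial image. The layer counts and channel sizes are arbitrary placeholders.

```python
import torch
import torch.nn as nn

class AttributeEnhancementSketch(nn.Module):
    def __init__(self, attr_dim=2, feat_ch=64):
        super().__init__()
        # Second transformation network (image space -> feature space)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 4, stride=2, padding=1), nn.ReLU(),
        )
        # Second inverse transformation network (feature space -> image space)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat_ch + attr_dim, feat_ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(feat_ch, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, third_image, attributes):
        # attributes: (batch, attr_dim), e.g. normalized target age and gender
        z3 = self.encoder(third_image)                      # third feature map
        b, _, h, w = z3.shape
        attr_map = attributes[:, :, None, None].expand(b, attributes.shape[1], h, w)
        z = torch.cat([z3, attr_map], dim=1)                # second synthesized feature map
        return self.decoder(z)                              # fourth facial image
```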
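
The pre-exchange step used during training can also be sketched: facial-feature image blocks of the fifth and sixth facial images are swapped once according to complementary control vectors to build the two composite inputs, so that swapping the per-feature blocks of the composites again with the same control vector recovers the original images, which is what allows them to serve as supervision images. The helper below operates on dicts of image blocks; the names and the bit convention are assumptions for the sketch.

```python
def swap_image_blocks(blocks_a, blocks_b, control_vector):
    # One facial-feature exchange: for each feature, bit 1 keeps the block
    # of image A and bit 0 takes the block of image B (assumed convention).
    features = list(blocks_a.keys())
    return {
        name: blocks_a[name] if bit == 1 else blocks_b[name]
        for name, bit in zip(features, control_vector)
    }

def build_training_inputs(fifth_blocks, sixth_blocks, v1):
    # The second control vector is the bitwise complement of the first.
    v2 = [1 - b for b in v1]
    first_composite = swap_image_blocks(fifth_blocks, sixth_blocks, v1)
    second_composite = swap_image_blocks(fifth_blocks, sixth_blocks, v2)
    return first_composite, second_composite

# Tiny demonstration with symbolic blocks (A* from I_M, B* from I_F):
fifth = {"l_eye": "A1", "r_eye": "A2", "nose": "A3", "mouth": "A4", "contour": "A5"}
sixth = {"l_eye": "B1", "r_eye": "B2", "nose": "B3", "mouth": "B4", "contour": "B5"}
v1 = [0, 1, 0, 1, 0]
c1, c2 = build_training_inputs(fifth, sixth, v1)
assert swap_image_blocks(c1, c2, v1) == fifth                      # recovers I_M
assert swap_image_blocks(c1, c2, [1 - b for b in v1]) == sixth     # recovers I_F
```

The two assertions mirror the double exchange described for FIG. 9: applying the same control vector twice returns the original facial images, so no parent-child face database is needed for supervision.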
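
Finally, a hedged sketch of the generator-side losses used to train the inheritance network (and, analogously, the attribute enhancement network): an adversarial term pushing generated faces toward the distribution of real images plus an L1 pixel term against the supervision image, with optional attribute and perceptual terms and a weighted total loss for joint training. The non-saturating GAN formulation and the weight values below are placeholders; the application itself also describes a WGAN-style variant with a gradient penalty, which is not reproduced here.

```python
import torch
import torch.nn.functional as F

def inheritance_generator_loss(generated, supervision, disc_logits_fake,
                               lambda_adv=1.0, lambda_pix=10.0,
                               attribute_term=None, perceptual_term=None,
                               lambda_attr=1.0, lambda_per=1.0):
    # Adversarial part: the generator tries to make the discriminator
    # classify its output as real (placeholder formulation).
    adv = F.binary_cross_entropy_with_logits(
        disc_logits_fake, torch.ones_like(disc_logits_fake))
    # Pixel part: L1 distance to the supervision image (I_M for I'_M, etc.).
    pix = F.l1_loss(generated, supervision)
    loss = lambda_adv * adv + lambda_pix * pix
    if attribute_term is not None:    # e.g. age / gender regression error
        loss = loss + lambda_attr * attribute_term
    if perceptual_term is not None:   # e.g. distance between VGG features
        loss = loss + lambda_per * perceptual_term
    return loss

def joint_total_loss(l_inh, l_att, lambda_01=1.0, lambda_02=1.0):
    # Total loss for joint training: weighted sum of the inheritance-network
    # loss and the attribute-enhancement-network loss.
    return lambda_01 * l_inh + lambda_02 * l_att
```

In alternating training, the discriminant network and the generating network are updated in turn under this loss until it converges.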

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Image Processing (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are a data processing method, device, medium, and computer device for facial image generation. The data processing method includes: acquiring a first facial image (I_MA) and a second facial image (I_FA); acquiring M first image blocks corresponding to facial features in the first facial image (I_MA), and acquiring N second image blocks corresponding to facial features in the second facial image (I_FA); transforming the M first image blocks and the N second image blocks into a feature space to generate M first feature blocks and N second feature blocks; selecting a part of the first feature blocks and a part of the second feature blocks according to a specific control vector; generating a first synthesized feature map based at least on the selected part of the first feature blocks and the selected part of the second feature blocks; and inversely transforming the first synthesized feature map back into the image space to generate a third facial image, where M and N are natural numbers.

Description

用于面部图像生成的数据处理方法、设备和介质
本申请要求于2019年4月26日提交中国专利局、申请号为201910345276.6、申请名称为“用于面部图像生成的数据处理方法、设备和介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及图像处理领域,更具体地说,涉及用于面部图像生成的数据处理方法、设备、介质和计算机设备。
背景技术
面部图像生成技术是一个新兴的研究领域,这在子代人脸预测、刑事侦查中的罪犯图像恢复、构建虚拟人物等方面都有广阔的应用前景。例如,通过输入一张面部图像,可以生成与该张面部图像相像但不同的另一张全新的面部图像,作为目标图像。
现有的面部图像生成方案采用通用的处理网络来生成目标图像。例如,将一张面部图像输入至完成训练的编码网络和解码网络,然后输出目标图像。然而,这样的图像生成方案的问题在于,该通用的处理网络输出的合成后的面部图像的和谐度和自然度很差,难以让用户相信这是真实的面部图像。
发明内容
鉴于以上情形,本申请实施例提供了一种用于面部图像生成的数据处理方法、设备、介质和计算机设备,能够生成更接近真实面部图像的合成面部图像。
根据本申请的一个方面,提供了一种用于面部图像生成的数据处理方法,由服务器执行,包括:获取第一面部图像及第二面部图像;获取第一面部图像(I MA)中与面部特征对应的M个第一图像块,并获取第二面部图像(I FA)中与面部特征对应的N个第二图像块;将M个第一图像块和N个第二图像块变换到特征空间以生成M个第一特征块和N个第二特征块;根据特定的控制向量选择一部分第一特征块和一部分第二特征块;至少基于所选择的一部分第一特征块和一部分第二特征块,生成第一合成特征图;以及将所述第一合成特征图反变换回图像空间以生成第三面部图像,其中M和N为自然数。
根据本申请的另一方面,提供了一种用于面部图像生成的数据处理设备,包括: 分割装置,用于获取输入的第一面部图像中与面部特征对应的M个第一图像块,并获取输入的第二面部图像中与面部特征对应的N个第二图像块;第一变换装置,用于将M个第一图像块和N个第二图像块变换到特征空间以生成M个第一特征块和N个第二特征块;选择装置,用于根据特定的控制向量选择一部分第一特征块和一部分第二特征块;第一合成装置,用于至少基于所选择的一部分第一特征块和一部分第二特征块,生成第一合成特征图;以及第一反变换装置,用于将所述第一合成特征图反变换回图像空间以生成第三面部图像。
根据本申请的再一方面,提供了一种计算机可读记录介质,其上存储有计算机程序,用于当由处理器执行所述计算机程序时,执行上述实施例所述的用于面部图像生成的数据处理方法。
根据本申请的又一方面,提供了一种计算机设备,包括存储器和处理器,所述存储器用于存储计算机程序,所述处理器用于执行所述计算机程序以实现上述实施例所述的用于面部图像生成的数据处理方法。
附图简要说明
图1是图示根据本申请实施例的用于面部图像生成的数据处理方法的过程的流程图;
图2示出了根据本申请实施例的关于继承网络的数据流示意图;
图3示出了根据本申请实施例的不同控制向量下的面部图像生成结果;
图4示出了根据本申请实施例的当输入的面部图像中增加随机因素时的面部图像生成结果;
图5示出了根据本申请实施例的关于属性增强网络的数据流示意图;
图6示出了在指定的控制向量下生成的不同年龄阶段的面部图像;
图7示出了在指定的控制向量下生成的不同年龄和不同性别的面部图像;
图8是图示根据本申请实施例的继承网络的训练过程的流程图;
图9是图示继承网络的训练过程中两次面部特征交换的过程的示意图;
图10示出了根据本申请实施例的继承网络的训练过程中的数据流示意图;
图11是图示根据本申请实施例的属性增强网络的训练过程的流程图;
图12示出了根据本申请实施例的属性增强网络的训练过程中的数据流图;
图13示出了本申请实施例的应用环境的示意图;
图14示出了根据本申请实施例的用于面部图像生成的数据处理设备的配置的 功能性框图;
图15示出了根据本申请实施例的用于面部图像生成的数据处理设备作为硬件实体的一个示例;以及
图16示出了根据本申请实施例的计算机可读记录介质的示意图。
具体实施方式
下面将参照附图对本申请的各个实施方式进行描述。提供以下参照附图的描述,以帮助对由权利要求及其等价物所限定的本申请的示例实施方式的理解。其包括帮助理解的各种具体细节,但它们只能被看作是示例性的。因此,本领域技术人员将认识到,可对这里描述的实施方式进行各种改变和修改,而不脱离本申请的范围和精神。而且,为了使说明书更加清楚简洁,将省略对本领域熟知功能和构造的详细描述。
如上文在背景技术中所述,由于根据现有技术的面部生成方案采用的是通用处理网络,因此输出的面部图像与真实面部图像的差距较大。除此之外,在根据现有技术的面部生成方案中,训练通用处理网络中的编码网络和解码网络需要搜集和建立真实面部数据库,以便为输出的合成面部图像提供监督信息。例如,在子代人脸预测的应用场景中,需要搜集和建立存在父、母和孩子关系的人脸数据库。以真实的孩子面部图像作为处理网络基于父或母面部图像输出的合成孩子面部图像的监督信息,以便调节处理网络的参数,使得训练完成的处理网络能够输出与输入的面部图像相像且类似于真实图像的合成面部图像。然而,在实践中,搜集和建立这样的数据库需要较大的成本。
因此,在本申请实施例中,提出了一种专用于面部图像合成的继承网络,与通用处理网络相比,能够输出更接近于真实图像的合成面部图像,且能够精确地控制合成面部图像继承两个输入面部图像中的哪些面部特征。并且,本申请实施例还进一步提出了属性增强网络,能够在继承网络输出的合成面部图像的基础上,在较大范围中调节合成面部图像的属性(如,年龄、性别等)。此外,在本申请实施例中,提出了一种在没有父、母和孩子关系的人脸数据库的情况下的继承网络和属性增强网络的训练方式。在根据本申请实施例的继承网络和属性增强网络的训练过程中,不需要建立存在父、母和孩子关系的人脸数据库,而是直接利用任意的现有人脸数据库就可以完成处理网络的训练。
为了更好地理解本申请,在下文中将要提及的名称的具体含义定义如下:
第一面部图像:在应用模式下,向继承网络输入的一个图像,以I MA表示;
第二面部图像:在应用模式下,向继承网络输入的另一个图像,以I FA表示;
第三面部图像:在应用模式下,由继承网络输出的图像,以I o1表示;
第四面部图像:在应用模式下,进一步由继承网络输出的图像,以I o2表示;
第五面部图像:在训练模式下,向继承网络输入的一个图像,以I M表示;
第六面部图像:在训练模式下,向继承网络输入的另一个图像,以I F表示;
第七面部图像:在训练模式下,由继承网络输出的一个图像,以I' M表示,以第五面部图像I M作为监督图像;
第八面部图像:在训练模式下,由继承网络输出的一个图像,以I' F表示,以第六面部图像I F作为监督图像;
第九面部图像:在训练模式下,由属性增强网络输出的一个图像,以
Figure PCTCN2020082918-appb-000001
表示,以第七面部图像I' M作为监督图像;
第十面部图像:在训练模式下,由属性增强网络输出的一个图像,以
Figure PCTCN2020082918-appb-000002
表示,以第八面部图像I' F作为监督图像。
接下来,将参照附图详细描述根据本申请的各个实施例。首先,将参照图1描述根据本申请实施例的用于面部图像生成的数据处理方法,该方法由服务器执行。如图1所示,所述数据处理方法包括以下步骤。
在步骤S101,获取第一面部图像(I MA)及第二面部图像(I FA)。
然后,在步骤S102,获取第一面部图像(I MA)中与面部特征对应的M个第一图像块,并获取第二面部图像(I FA)中与面部特征对应的N个第二图像块。这里,面部特征可以是器官(如,眉毛、眼睛、鼻子、嘴巴、脸廓)、组织或局部特征(如额头、脸部、皮肤上的特征)等。M个第一图像块分别与不同的面部特征对应,且类似地,N个第二图像块也分别与不同的面部特征对应。其中,M和N为自然数。
例如,第一面部图像和第二面部图像可以是不同性别的人的面部图像,如一张男性面部图像和一张女性面部图像。或者,第一面部图像和第二面部图像可以是相同性别的人的面部图像。
此外,例如,第一面部图像和第二面部图像可以是由照相机拍摄的真实面部图像。或者,第一面部图像和第二面部图像也可以是基于从现有面部特征库中选择的面部特征图像而生成的合成图像。具体地,第一面部图像可以是在一个人原有面部特征的基础上从面部特征库中随机挑选并更换一个面部特征而生成的合成图像,且第二面部图像也可以是通过类似方式生成的合成图像。或者,第一面部图像也可以是全部面部特征从面部特征库中随机挑选并组合而生成的合成图像,且第二面部图像也可以是通过类似方式生成的合成图像。
再如,第一面部图像和第二面部图像也可以是卡通面部图像。可见,在本申请实施例中,并不特别地限定第一面部图像和第二面部图像的类型,任何能够作为输入的两张面部图像都可以类似地应用于本申请实施例,且包括在本申请的范围中。
对于输入的面部图像,可以先通过面部校准来定位各面部特征的位置,然后将面部图像分解为与各面部特征对应的图像块。预先设置生成新的面部图像所需的不同面部特征的总数,将其表示为L,L为自然数。例如,作为一种可能的实施方式,可以将面部特征分为左眼睛和左眉毛、右眼睛和右眉毛、鼻子、嘴巴以及脸廓。在这种情况下,生成新的面部图像所需的不同面部特征的总数为五。如果输入的一个面部图像为完整的正面图像,那么分解得到的图像块的数量将与上述不同面部特征的总数一致,换言之,从该面部图像中能够检测到所有需要的不同面部特征。在该实施方式中,可以将输入的面部图像分解为五个图像块,分别为:对应于左眼睛和左眉毛的图像块、对应于右眼睛和右眉毛的图像块、对应于鼻子的图像块、对应于嘴巴的图像块以及对应于脸廓的图像块。当然,这种分解方式仅为示例,任何其他的分解方式也是可行的。例如,也可以将输入的面部图像分解为对应于眼睛的图像块、对应于眉毛的图像块、对应于鼻子的图像块、对应于嘴巴的图像块以及对应于脸廓的图像块。然而,如果输入的一个面部图像为一定角度下的侧面图像,或者输入的一个面部图像为不完整的正面图像,那么从这个面部图像分解的图像块的数量将小于所需的不同面部特征的总数,换言之,从该面部图像可能检测不到有些面部特征。由于在后续步骤中可以通过选择第一面部图像中的一些面部特征和第二面部图像中的一些面部特征来合成新的面部图像,因此不需要在一个输入的面部图像中获得生成新的面部图像所需的所有面部特征,只需要从两个输入的面部图像中能够拼凑出生成新的面部图像所需的所有面部特征即可。
总结来说,第一图像块的数量M和第二图像块的数量N可以都等于生成新的面部图像所需的不同面部特征的总数L。或者,第一图像块的数量M和第二图像块的数量N中的一个可以等于生成新的面部图像所需的不同面部特征的总数L,而另一个可以小于L。或者,第一图像块的数量M和第二图像块的数量N可以都小于L,并且,M和N可以相等,也可以不等。
接下来,在步骤S103,将M个第一图像块和N个第二图像块变换到特征空间以生成M个第一特征块和N个第二特征块。
可以通过变换网络,如编码网络,来实现图像空间到特征空间的变换。可以对于不同面部特征的图像块设置相同的编码网络。或者,作为另一种可能的实施方式,由于每个面部特征表观上的差异,因此针对每个面部特征获取专属特征。具体来说, 可以为对应于每一面部特征的图像块设置一个编码网络。例如,设置一组编码网络,其中编码网络E1用于对应于左眼睛和左眉毛的图像块,编码网络E2用于对应于右眼睛和右眉毛的图像块,编码网络E3用于对应于鼻子的图像块,编码网络E4用于对应于嘴巴的图像块,且编码网络E5用于对应于脸廓的图像块。编码网络E1~E5的参数各不相同。将M个第一图像块分别通过对应的编码网络E1~E5变换到特征空间,并且同样地,将N个第二图像块分别通过对应的编码网络E1~E5变换到特征空间。例如,通过编码网络,可以将二维图像块变换为具有长宽高的三维特征块。
然后,在步骤S104,根据特定的控制向量选择一部分第一特征块和一部分第二特征块。
特定的控制向量包括与各面部特征对应的L个信息位。这里需要注意的是,信息位的数量与上文中所述的生成新的面部图像所需的不同面部特征的总数L相同,并且如上文中所述,L为自然数,且M≤L,N≤L。例如,在上文中描述的将面部特征分为左眼睛和左眉毛、右眼睛和右眉毛、鼻子、嘴巴和脸廓的情况下,所述控制向量包括五个信息位,且这五个信息位分别对应于左眼睛和左眉毛、右眼睛和右眉毛、鼻子、嘴巴和脸廓。并且,特定的控制向量可以由用户手动设置,也可以随机地自动设置。
具体来说,根据特定的控制向量选择一部分第一特征块和一部分第二特征块的步骤进一步包括:当所述特定的控制向量中的一个信息位为第一值时,从M个第一特征块中选择与该信息位对应的面部特征的特征块,而当所述特定的控制向量中的该信息位为第二值时,从N个第二特征块中选择与该信息位对应的面部特征的特征块。依次根据控制向量中的每一信息位进行选择,进而获得L个特征块,这些特征块是由一部分第一特征块和一部分第二特征块组成的混合特征块。
举例而言,假如控制向量v为10010,那么这意味着选择第一特征块中对应于左眼睛和左眉毛的特征块以及对应于嘴巴的特征块,并选择第二特征块中对应于右眼睛和右眉毛的特征块、对应于鼻子的特征块以及对应于脸廓的特征块。
接下来,在步骤S105,至少基于所选择的一部分第一特征块和一部分第二特征块,生成第一合成特征图。
例如,沿用上文中的例子,在控制向量v为10010的情况下,可以基于第一特征块中对应于左眼睛和左眉毛的特征块、对应于嘴巴的特征块以及第二特征块中对应于右眼睛和右眉毛的特征块、对应于鼻子的特征块、对应于脸廓的特征块来生成第一合成特征图,即:在特征空间中,将来自不同源的面部特征的特征块重新组合成一个新的具有各面部特征的合成特征图。
另外,可以对输出的第三面部图像的属性(如,年龄和性别)进行控制。例如,可以指定希望输出的第三面部图像的性别。并且,输入的第一面部图像和第二面部图像的属性信息可能存在较大差异。具体来说,第一面部图像的年龄与第二面部图像的年龄可能相差很大。例如,第一面部图像的年龄为20岁,而第二面部图像的年龄为60岁。为了对输出的第三面部图像的属性进行控制并防止最终生成的第三面部图像的不和谐,作为另一种可能的实施方式,在选择的特征块的基础上,进一步叠加属性特征。例如,如果希望输出的第三面部图像为女性面部图像,则可以进一步叠加性别为女的属性特征,以便去除诸如胡子之类的男性特征。或者,如果希望平衡输入面部图像的年龄差异,则可以进一步叠加平均年龄(在以上例子中,可以是40岁)的属性特征。
具体地,至少基于所选择的一部分第一特征块和一部分第二特征块,生成第一合成特征图的步骤可以进一步包括以下步骤。首先,将指定属性信息扩展为处于所述特征空间中的属性特征块。在上文中所述的将二维图像块变换为具有长宽高的三维特征块的情况下,可以将属性信息扩展为与特征块相同长宽但不同高度的特征块。然后,基于所选择的一部分第一特征块、一部分第二特征块以及属性特征块,生成第一合成特征图。
最后,在步骤S106,将所述第一合成特征图反变换回图像空间以生成第三面部图像(I o1)。可以通过反变换网络,如解码网络D,来实现特征空间到图像空间的反变换。
可以认为,上文中所述的基于M个第一图像块和N个第二图像块而生成第三面部图像是由一继承网络来实现的。所述继承网络可以包括上文中所述的编码网络E1~E5和解码网络D,且可以通过各种神经网络来实现。如果将该继承网络表示为一函数f inh,那么该函数的输入包括第一面部图像I MA和第二面部图像I FA以及控制向量v,且输出为第三面部图像I o1,具体公式如下:
I o1=f inh(I MA,I FA,v)   (1)
或者,在增加属性特征的情况下,该函数的输入还进一步包括希望输出的第三面部图像的年龄(y a)和性别(y g),具体公式如下:
I o1=f inh(I MA,I FA,v,y a,y g)   (2)
图2示出了根据本申请实施例的关于继承网络的数据流图。如图2所示,作为输入源的第一面部图像I MA和第二面部图像I FA在分解为与面部特征对应的图像块后经由一组编码网络E1~E5变换到特征空间的特征块,根据控制向量v选择并交换特征块,然后与属性特征块拼接,最后经由解码网络D变换回图像空间,以生成第三 面部图像I o1
第三面部图像是继承了第一面部图像中的一部分面部特征和第二面部图像中的一部分面部特征的合成面部图像。在第一面部图像和第二面部图像为不同性别的两人的面部图像时,生成的第三面部图像可以是假定该两人为父母时的子代面部图像。在第一面部图像和第二面部图像为相同性别的两人的面部图像时,生成的第三面部图像可以是集合该两人的面部特征优点而合成的假想面部图像。在第一面部图像和第二面部图像为拼凑了多个人的面部特征而生成的合成图像时,通过生成的第三面部图像可以推断特定人的面部图像。这在刑事侦查中的目击证人指认中尤其重要。例如,在目击证人从面部特征库中挑选出与嫌疑人类似的各面部特征后,将各面部特征组合以生成一低质量的、不像真实照片的合成面部图像。通过将该合成面部图像作为第一面部图像,同时任意选取一个第二面部图像,并将特定的控制向量设置为11111(即,全部选择第一面部图像中的面部特征图像),可以输出非常类似于真实图像的第三面部图像,以便于嫌疑人的确定。
在根据本申请实施例的用于面部图像生成的数据处理方法中,通过参照图1所述的各处理步骤可以看出,通过面部特征图像的分割,以及特征空间内的重组,能够生成继承了一部分第一面部图像中的面部特征和一部分第二面部图像中的面部特征的第三面部图像。与现有技术中使用通用处理网络的方案相比,能够在保证输出的第三面部图像与作为输入源的面部图像的相似性的同时,使得输出的第三面部图像接近于真实图像。换言之,当由用户观看该第三面部图像时,难以分辨该图像是真实图像还是合成图像。
并且,通过设置控制向量,能够精确地控制第三面部图像继承两个输入面部图像中的哪些面部特征。图3示出了不同控制向量下的面部图像生成结果。从图3可以看出,通过设置不同的控制向量,可以精确地控制生成的第三面部图像中的五官与作为输入源的两个面部图像的继承关系。
图4示出了在输入的面部图像中增加随机因素的面部图像生成结果,即如在上文中所述的那样,在输入的面部图像是在一个人原有面部特征的基础上从面部特征库中随机挑选并更换一个面部特征而生成的合成图像的情况下的面部图像生成结果。在图4中,从上到下的各行分别示出了在眼睛和眉毛、鼻子、嘴巴以及脸廓上增加随机因素的生成结果。
此外,通过特征空间内属性特征块的叠加,能够指定第三面部图像的属性并进一步提升第三面部图像的和谐度和自然度。
上文中所述的继承网络的主要目的在于,输出与第一面部图像和第二面部图像 在面部特征上相似的第三面部图像,因此其中包括的属性特征块的叠加是在保证相似度前提下的微调。换言之,继承网络输出的第三面部图像在诸如年龄之类的属性方面与作为输入源的第一面部图像和第二面部图像近似。
为了在更大范围中调节输出面部图像的属性,作为另一种可能的实施方式,返回参照图1,在步骤S105之后,还可以进一步包括以下步骤。
在步骤S107,将所述第三面部图像变换至特征空间以生成第三特征图。例如,可以通过编码网络E0来实现图像空间到特征空间的变换。当然,这里的编码网络E0的参数与上文中所述的编码网络E1~E5的参数并不相同。
然后,在步骤S108,将特定的属性信息扩展为所述特征空间中的属性特征图。例如,在步骤S107将二维图像变换为具有长宽高的三维特征图的情况下,可以将属性信息扩展为与三维特征图相同长宽但不同高度的特征图。
接下来,在步骤S109,基于所述属性特征图与所述第三特征图,生成第二合成特征图。
最后,在步骤S110,将第二合成特征图反变换回图像空间,以生成第四面部图像。例如,可以通过解码网络D0来实现特征空间到图像空间的反变换。这里,解码网络D0的参数与上文中的步骤S105中提及的解码网络D的参数也不相同。
由于步骤S107~S110的可选性,因此在图1中以虚线框示出。
可以认为,上文中所述的基于第三面部图像而生成所述第四面部图像是由属性增强网络来实现的。所述属性增强网络可以包括上文中所述的编码网络E0和解码网络D0,且可以通过各种神经网络来实现。如果将该属性增强网络表示为一函数f att,那么该函数的输入包括第三面部图像I o1、以及期望输出的第四面部图像的年龄(y a)和性别(y g),输出为第四面部图像I o2,具体公式如下:
I o2=f att(I o1,y a,y g)   (3)
图5示出了根据本申请实施例的关于属性增强网络的数据流图。如图5所示,第三面部图像I o1通过编码网络E0变换为特征空间中的第三特征图Z 1,然后将Z 1在特征空间中与属性信息y a和y g进行拼接,并通过解码网络D0反变换回图像空间,以得到第四面部图像I o2
与第三面部图像相比,第四面部图像能够在属性上发生很大改变。例如,基于输入的年龄为20岁的第三面部图像,可以输出年龄为5岁的第四面部图像。
图6示出了在指定的控制向量下生成的不同年龄阶段的面部图像。从图6中可 以看出,通过继承网络和属性增强网络,可以生成不同年龄段的面部图像,且每个年龄段的面部差异明显。
图7示出了在指定的控制向量下生成的不同年龄和不同性别的面部图像。从图7中可以看出,通过继承网络和属性增强网络,即使在相同的控制向量下,也可以体现出生成的面部图像由于性别和年龄不同在五官上的差异,如苹果肌、眉毛、法令纹、嘴唇颜色等。
在上文中参照图1到图7详细描述了根据本申请实施例的用于生成面部图像的数据处理方法的具体过程。所述数据处理方法是通过继承网络或者继承网络和属性增强网络来实现的。上文中描述的数据处理方法是在继承网络和属性增强网络的应用过程中执行的处理。然而,如上文中所述,继承网络和属性增强网络可以包括编码网络和解码网络,且编码网络和解码网络中都包括多个待确定的参数。通过训练过程来确定这些参数,从而完成继承网络和属性增强网络的构建。这样,继承网络和属性增强网络才能实现面部图像生成的功能。换言之,在应用继承网络和属性增强网络之前,首先要训练继承网络和属性增强网络。接下来,首先将参照图8描述继承网络的训练过程。所述继承网络可以通过图8中所示的以下训练步骤得到。
为了与上文中的应用过程中的第一到第四面部图像相区分,在下文中,将在训练过程中涉及的面部图像限定为第五至第十面部图像。
首先,在步骤S801,获取第五面部图像(I M)中与各面部特征对应的L个第五图像块,并获取第六面部图像(I F)中与各面部特征对应的L个第六图像块。
这里,需要特别指出的是,在上文中的应用过程中,由于基于作为输入源的两个面部图像仅需要生成一个新的面部图像,因此从输入的两个面部图像中获取的、与面部特征对应的图像块的数量可以小于或等于L,其中L为生成新的面部图像所需的不同面部特征的总数,只需要从两个输入的面部图像中能够拼凑出生成新的面部图像所需的所有面部特征即可。然而,与上文中的应用过程不同,在训练过程中,由于需要基于作为输入源的两个面部图像生成两个新的面部图像,因此从作为输入源的第五面部图像和第六面部图像中获取的图像块的数量均为L个,其中如上文中所述,L为生成新的面部图像所需的不同面部特征的总数。
然后,在步骤S802,根据第一控制向量v 1选择一部分第五图像块和一部分第六图像块以生成第一合成图像
Figure PCTCN2020082918-appb-000003
并根据第二控制向量v 2选择另一部分第五图像块和另一部分第六图像块以生成第二合成图像
Figure PCTCN2020082918-appb-000004
假设交换面部特征所对应的函数为f syn,那么交换面部特征的合成过程可以由如下公式表示:
Figure PCTCN2020082918-appb-000005
为了确保后续的训练效果,对于交换面部特征后的合成图像进一步通过颜色校正的方法进行融合,避免合成图像中出现不连贯的色块。这里,需要指出的是,在上文中描述的应用过程中,也提到了第一面部图像和第二面部图像可以为基于从现有面部特征库中选择的面部特征图像而生成的合成图像。然而,在应用过程中,由于继承网络已经训练完毕,因此合成图像可以不需要执行颜色校正处理。
接下来,在步骤S803,获取第一合成图像
Figure PCTCN2020082918-appb-000006
中与各面部特征对应的L个第七图像块,并获取第二合成图像
Figure PCTCN2020082918-appb-000007
中与各面部特征对应的L个第八图像块。
在步骤S804,将L个第七图像块和L个第八图像块输入到继承网络。
然后,在步骤S805,通过所述继承网络,输出基于第一控制向量选择的一部分第七图像块和一部分第八图像块而生成的第七面部图像(I' M),并输出基于第二控制向量选择的另一部分第七图像块和另一部分第八图像块而生成的第八面部图像(I' F)。其中第五面部图像是用于对第七面部图像提供监督信息的监督图像,第六面部图像是用于对第八面部图像提供监督信息的监督图像,并且将第五至第八面部图像作为一组继承训练数据。
假设继承网络所对应的函数为f inh,那么继承网络的生成过程可以由如下公式表示:
Figure PCTCN2020082918-appb-000008
其中
Figure PCTCN2020082918-appb-000009
Figure PCTCN2020082918-appb-000010
分别表示第五面部图像的属性和性别,
Figure PCTCN2020082918-appb-000011
Figure PCTCN2020082918-appb-000012
分别表示第六面部图像的属性和性别。在训练过程中,将希望输出的面部图像的属性设置为与作为输入源的面部图像的属性相同,以便于后续损失函数的计算。
从以上步骤可以看出,与继承网络的应用过程相比,继承网络的训练过程的不同之处在于,在将作为输入源的面部图像输入到继承网络之前,预先进行一次面部特征交换处理。这样做的目的在于为继承网络输出的面部图像提供监督信息。
具体来说,如果在将作为输入源的第五面部图像和第六面部图像提供至继承网络之前,先通过一个控制向量交换一次面部特征,并将面部特征交换后的合成图像提供至继承网络,那么如果继承网络的参数设置准确,通过使用同样的控制向量再交换一次面部特征,应该能够得到原始的第五面部图像或第六面部图像。
为了便于理解,图9示出了继承网络的训练过程中两次面部特征交换的示意性过程。在图9中,以字母A表示作为输入源的第五面部图像(I M)中各面部特征的 图像块,以字母B表示作为输入源的第六面部图像(I F)中各面部特征的图像块。对于第五面部图像(I M),如果以第一控制向量v 1=01010进行面部特征交换,然后以同样的第一控制向量v 1=01010再次执行面部特征交换,那么将得到与原始的第五面部图像(I M)相同的图像。类似地,对于第六面部图像(I F),如果以第二控制向量v 2=10101进行面部特征交换,然后以同样的第二控制向量v 2=10101再次执行面部特征交换,那么将得到与原始的第六面部图像(I F)相同的图像。注意,这里需要指出的是,第一控制向量v 1和第二控制向量v 2需要彼此相反。
因此,通过将第五面部图像(I M)作为继承网路输出的第七面部图像(I' M)的监督图像,并且将第六面部图像(I F)作为继承网路输出的第八面部图像(I' F)的监督图像,可以不需要建立存在父、母和孩子关系的人脸数据库,而是直接利用任意的已经存在的人脸数据库就能够完成继承网络的训练过程。
在根据本申请实施例的继承网络的训练过程中,采用生成式对抗网络(GAN)的方式来学习。生成式对抗网路包括生成网络和判别网络,通过生成网络与判别网络之间对弈的新方式来学习数据分布。生成网络的目的是尽量去学习真实的数据分布,而判别网络的目的是尽量正确判别输入数据是来自真实数据还是来自生成网络;在训练过程中,生成网络和判别网络需要不断优化,各自提高自己的生成能力和判别能力。
继承网络可以看作是这里的生成网络。此外,还需要针对继承网络输出的图像,设置一个判别网络,如第一判别网络,用于判断向其输入的图像的真伪。所谓真,是指输出的面部图像为真实图像;所谓伪,是指输出的面部图像为继承网络输出的图像。
因此,接下来,在步骤S806,将至少一组继承训练数据输入至第一判别网络,其中所述第一判别网络被设置为当向所述第一判别网络输入一图像时,输出该图像为真实图像的概率值。
最后,在步骤S807,基于第一损失函数,交替地训练所述继承网络和所述第一判别网络,直至所述第一损失函数收敛为止。
图10示出了根据本申请实施例的继承网络的训练过程中的数据流图。由于在训练过程中,如上文中所述,将输入源的两个面部图像分别作为继承网络的两个输出面部图像的监督图像,因此为了便于对照,在图10中同时示出了继承网络的两路输出。事实上,如上文中参照图2所述,每当向继承网络提供两个面部图像作为输入时,仅输出一个面部图像。
如图10所示,第五面部图像I M经过相同的控制向量v 1交换两次后得到第七面 部图像I' M,并以I M作为I' M的监督图像。类似地,第六面部图像I F经过相同的控制向量v 2交换两次后得到第八面部图像I' F,并以I F作为I' F的监督图像。
作为一种可能的实施方式,所述第一损失函数基于所述第一判别网络对于至少一组继承训练数据输出的概率值以及至少一组继承训练数据中面部图像与对应的监督图像之间的像素差异而确定。
具体来说,所述第一损失函数包括对抗损失和像素损失两部分之和。对抗损失
Figure PCTCN2020082918-appb-000013
使继承网络生成的面部图像的分布更接近于真实图像,且可以通过以下公式来计算:
Figure PCTCN2020082918-appb-000014
其中,D I表示第一判别网络,D I(I′ s)为向第一判别网络输入继承网络输出的图像时第一判别网络的输出(概率值),D I(I s)为向第一判别网络输入真实图像时第一判别网络的输出(概率值)。
Figure PCTCN2020082918-appb-000015
表示在输入面部图像I′ s时第一判别网络输出的均值,其中I′ s是继承网络输出的面部图像。
Figure PCTCN2020082918-appb-000016
表示在输入面部图像I s时第一判别网络输出的均值,其中I s是来自真实人脸数据库的面部图像。
此外,作为另一种可能的实施方式,为了使得第一损失函数更加稳定,也可以基于WGAN(Wasserstein GAN)的框架,在其中增加噪声分量,具体公式如下:
Figure PCTCN2020082918-appb-000017
其中λ gp为WGAN的超参数,
Figure PCTCN2020082918-appb-000018
为向第一判别网络输入噪声
Figure PCTCN2020082918-appb-000019
时第一判别网络的输出,
Figure PCTCN2020082918-appb-000020
表示对
Figure PCTCN2020082918-appb-000021
求梯度后的二范数。
像素损失
Figure PCTCN2020082918-appb-000022
用于确保继承网络生成的面部图像与作为输入源的面部图像的相似性,由继承网络生成的面部图像与真实面部图像之间像素级别的损失,即两张图像的像素值间的差异的绝对值之和来表示,具体公式如下:
Figure PCTCN2020082918-appb-000023
因此,第一损失函数可以表示如下:
Figure PCTCN2020082918-appb-000024
其中,λ 11和λ 12为权重系数。
基于第一损失函数交替地训练所述继承网络和所述第一判别网络。具体来说,可以先固定继承网络,并训练第一判别网络。此时,希望第一损失函数的值尽可能 地小。然后,可以再固定第一判别网络,并训练继承网络。此时,希望第一损失函数的值尽可能地大。在经过多轮训练后,当第一损失函数对于不同的继承训练数据的波动不大,即第一损失函数收敛时,完成继承网络的训练。
作为另一种可能的实施方式,除了上文中所述的对抗损失和像素损失之外,第一损失函数还可以进一步基于以下至少之一而确定:至少一组继承训练数据中面部图像的属性与对应的监督图像的属性之间的差异和至少一组继承训练数据中面部图像的特征与对应的监督图像的特征之间的差异。
具体来说,第一损失函数还可以进一步包括属性损失。属性损失由继承网络输出的面部图像的属性与作为输入源的真实面部图像的属性之间的差异来确定。年龄和性别的损失函数分别可以由以下公式来计算:
Figure PCTCN2020082918-appb-000025
Figure PCTCN2020082918-appb-000026
其中,D a和D g分别是判别一个图像的年龄和性别的网络。例如,可以使用ResNet预训练年龄和性别的回归模型,从而当向该模型输入一个图像I′ s时,可以输出该图像的年龄和性别信息。D a(I′ s)表示通过D a判断的面部图像(I′ s)的年龄,D g(I′ s)表示通过D g判断的面部图像(I′ s)的性别。
Figure PCTCN2020082918-appb-000027
表示作为输入源的真实面部图像的年龄,
Figure PCTCN2020082918-appb-000028
表示作为输入源的真实面部图像的性别。
此外,第一损失函数还可以进一步包括感知损失。例如,可以使用19层VGG特征来计算感知损失
Figure PCTCN2020082918-appb-000029
即继承网络输出的面部图像的VGG特征与作为输入源的真实面部图像的VGG特征的距离,具体公式如下:
Figure PCTCN2020082918-appb-000030
其中,
Figure PCTCN2020082918-appb-000031
Figure PCTCN2020082918-appb-000032
分别是指面部图像I s和I′ s在VGG19中第i个池化层前、第j个卷积层的特征。
例如,作为另一种可能的实施方式,第一损失函数也可以表示如下:
Figure PCTCN2020082918-appb-000033
其中,λ 11、λ 12、λ 13、λ 14和λ 15均为不同的权重系数,可以根据各损失函数的重要性来分配。
接下来,将参照图11描述属性增强网络的训练过程。所述属性增强网络可以通过图11中所示的以下训练步骤得到。
在根据本申请实施例的属性增强网络的训练过程中,也采用生成式对抗网络(GAN)的方式来学习。
属性增强网络可以看作是这里的生成网络。此外,还需要针对属性增强网络输出的图像,设置一个判别网络,如第一判别网络,用于判断向其输入的图像的真伪。所谓真,是指输出的面部图像为真实图像;所谓伪,是指输出的面部图像为属性增强网络输出的图像。
如上文中所述,通过将第三面部图像输入到属性增强网络来生成所述第四面部图像,并且所述属性增强网络通过图11中所示的以下训练步骤得到。
首先,在步骤S1101,将第七面部图像(I' M)和第八面部图像(I' F)输入至属性增强网络。
然后,在步骤S1102,通过属性增强网络,输出与第七面部图像对应的第九面部图像
Figure PCTCN2020082918-appb-000034
以及与第八面部图像对应的第十面部图像
Figure PCTCN2020082918-appb-000035
其中第七面部图像是用于对第九面部图像提供监督信息的监督图像,第八面部图像是用于对第十面部图像提供监督信息的监督图像,并且将第七至第十面部图像作为一组属性训练数据。
假设属性增强网络所对应的函数为f att,那么属性增强网络的生成过程可以由如下公式表示:
Figure PCTCN2020082918-appb-000036
其中
Figure PCTCN2020082918-appb-000037
Figure PCTCN2020082918-appb-000038
分别表示第五面部图像的属性和性别,
Figure PCTCN2020082918-appb-000039
Figure PCTCN2020082918-appb-000040
分别表示第六面部图像的属性和性别。在训练过程中,将希望输出的面部图像的属性设置为与作为输入源的面部图像的属性相同,以便于后续损失函数的计算。
接下来,在步骤S1103,将至少一组属性训练数据输入至第二判别网络,其中所述第二判别网络被设置为当向所述第二判别网络输入一图像时,输出该图像为真实图像的概率值。
最后,在步骤S1104,基于第二损失函数,交替地训练所述属性增强网络和所述第二判别网络,直至所述第二损失函数收敛为止。
图12示出了根据本申请实施例的属性增强网络的训练过程中的数据流图。与图10类似地,在图12中也同时示出了属性增强网络的两路输出。
如图12所示,将第七面部图像I' M和第八面部图像I' F输入到属性增强网络,变换到特征空间分别得到特征图Z M和Z F,在特征空间中与属性特征拼接并反变换回图像空间得到第九面部图像
Figure PCTCN2020082918-appb-000041
和第十面部图像
Figure PCTCN2020082918-appb-000042
并分别以第七面部图像I' M和第八面部图像I' F作为第九面部图像
Figure PCTCN2020082918-appb-000043
和第十面部图像
Figure PCTCN2020082918-appb-000044
的监督图像。
作为一种可能的实施方式,所述第二损失函数基于所述第二判别网络对于至少一组属性训练数据输出的概率值以及至少一组属性训练数据中面部图像与对应的监督图像之间的像素差异而确定。
具体来说,所述第二损失函数包括对抗损失和像素损失两部分之和。对抗损失
Figure PCTCN2020082918-appb-000045
使属性增强网络生成的面部图像的分布更接近于真实图像,且可以通过以下公式来计算:
Figure PCTCN2020082918-appb-000046
其中,
Figure PCTCN2020082918-appb-000047
表示第二判别网络,
Figure PCTCN2020082918-appb-000048
为向第二判别网络输入属性增强网络输出的图像时第二判别网络的输出(概率值),
Figure PCTCN2020082918-appb-000049
为向第二判别网络输入真实图像时第二判别网络的输出(概率值)。
Figure PCTCN2020082918-appb-000050
表示在输入面部图像
Figure PCTCN2020082918-appb-000051
时第二判别网络输出的对数的均值,其中
Figure PCTCN2020082918-appb-000052
是属性增强网络输出的面部图像。
Figure PCTCN2020082918-appb-000053
表示在输入面部图像I s时第二判别网络输出的对数的均值,其中I s是来自真实人脸数据库的面部图像。
此外,作为另一种可能的实施方式,为了使得第二损失函数更加稳定,也可以基于WGAN(Wasserstein GAN)的框架,在其中增加噪声分量,具体公式如下:
Figure PCTCN2020082918-appb-000054
其中λ gp为WGAN的超参数,
Figure PCTCN2020082918-appb-000055
为向第二判别网络输入噪声
Figure PCTCN2020082918-appb-000056
时第二判别网络的输出,
Figure PCTCN2020082918-appb-000057
表示对
Figure PCTCN2020082918-appb-000058
求梯度后的二范数。
像素损失
Figure PCTCN2020082918-appb-000059
用于确保属性增强网络生成的面部图像与继承网络输出的面部图像的相似性,由属性增强网络生成的面部图像与继承网络输出的图像之间像素级别的损失,即两张图像的像素值间的差异的绝对值之和来表示,具体公式如下:
Figure PCTCN2020082918-appb-000060
因此,第二损失函数可以表示如下:
Figure PCTCN2020082918-appb-000061
其中,λ 21和λ 22为权重系数。
作为另一种可能的实施方式,除了上文中所述的对抗损失和像素损失之外,第二损失函数还可以进一步基于以下至少之一而确定:至少一组属性训练数据中面部图像的属性与对应的监督图像的属性之间的差异和至少一组属性训练数据中面部 图像的特征与对应的监督图像的特征之间的差异。
具体来说,第二损失函数还可以进一步包括属性损失。属性损失由属性增强网络输出的面部图像的属性与继承网络输出的面部图像的属性之间的差异来确定。年龄和性别的损失函数分别可以由以下公式来计算:
Figure PCTCN2020082918-appb-000062
Figure PCTCN2020082918-appb-000063
其中,D a和D g分别是判别一个图像的年龄和性别的网络。例如,可以使用ResNet预训练年龄和性别的回归模型,从而当向该模型输入一个图像I′ s时,可以输出该图像的年龄和性别信息。
Figure PCTCN2020082918-appb-000064
表示通过D a判断的面部图像
Figure PCTCN2020082918-appb-000065
的年龄,
Figure PCTCN2020082918-appb-000066
表示通过D g判断的面部图像
Figure PCTCN2020082918-appb-000067
的性别。
Figure PCTCN2020082918-appb-000068
表示作为继承网络输出的面部图像的年龄,
Figure PCTCN2020082918-appb-000069
表示作为继承网络输出的面部图像的性别。由于继承网络输出的面部图像的年龄和性别与作为输入源的真实面部图像的年龄和性别相同,因此可以直接使用真实面部图像的年龄和性别作为这里的
Figure PCTCN2020082918-appb-000070
Figure PCTCN2020082918-appb-000071
此外,第一损失函数还可以进一步包括感知损失。例如,可以使用19层VGG特征来计算感知损失
Figure PCTCN2020082918-appb-000072
即属性增强网络输出的面部图像的VGG特征与继承网络输出的面部图像的VGG特征的距离,具体公式如下:
Figure PCTCN2020082918-appb-000073
其中,
Figure PCTCN2020082918-appb-000074
Figure PCTCN2020082918-appb-000075
是指面部图像
Figure PCTCN2020082918-appb-000076
和I′ s在VGG19中第i个池化层前、第j个卷积层的特征。
例如,作为另一种可能的实施方式,第二损失函数也可以表示如下:
Figure PCTCN2020082918-appb-000077
其中,λ 21、λ 22、λ 23、λ 24和λ 25均为不同的权重系数,可以根据各损失函数的重要性来分配。
基于第二损失函数交替地训练所述属性增强网络和所述第二判别网络。具体来说,可以先固定属性增强网络,并训练第二判别网络。此时,希望第二损失函数的值尽可能地小。然后,可以再固定第二判别网络,并训练属性增强网络。此时,希望第二损失函数的值尽可能地大。在经过多轮训练后,当第二损失函数对于不同的属性训练数据的波动不大,即第二损失函数收敛时,完成属性增强网络的训练。
这里,需要指出的是,尽管在属性增强网络的应用过程中,可以大幅地改变原 有输入面部图像的属性(如,年龄),但是在属性增强网络的训练过程中,为了能够提供监督信息,选择与最初输入的面部图像相同的属性。
在上文中,描述了针对继承网络和属性增强网络单独进行的训练过程。作为另一种可能的实施方式,除了继承网络和属性增强网络的单独训练之外,还可以对这两个网络进行联合训练,以寻求全局最优解。
具体来说,所述继承网络和所述属性增强网络通过以下联合训练步骤进一步优化:基于所述第一损失函数和所述第二损失函数,确定总损失函数;基于所述总损失函数,交替地训练所述继承网络和所述属性增强网络与所述第一判别网络和所述第二判别网络,直至所述总损失函数收敛为止。
具体来说,可以将第一损失函数和第二损失函数的加权和作为总损失函数L,具体公式如下:
L=λ 01L inh02L att   (23)
其中,λ 01和λ 02为不同的权重系数,可以根据各损失函数的重要性来分配。
在联合训练过程中,例如,可以先固定继承网络和属性增强网络,并训练第一判别网络和第二判别网络。此时,希望总损失函数的值尽可能地小,统一地调整第一判别网络和第二判别网络的参数。然后,可以再固定第一判别网络和第二判别网络,并训练继承网络和属性增强网络。此时,希望总损失函数的值尽可能地大,统一地调整继承网络和属性增强网络的参数。在经过多轮训练后,当总损失函数收敛时,完成两个网络的联合训练。
在上文中,已经参照图1到图12详细描述了根据本申请实施例的用于面部图像生成的数据处理方法。接下来,将描述根据本申请实施例的用于面部图像生成的数据处理设备。
首先,将简要描述本申请的实施例的应用环境。如图13所示,服务器10通过网络30连接到多个终端设备20。所述多个终端设备20是提供作为输入源的第一面部图像和第二面部图像的设备。所述终端可以是智能终端,例如智能电话、PDA(个人数字助理)、台式计算机、笔记本计算机、平板计算机等,也可以是其他类型的终端。所述服务器10为用于基于现有人脸数据库训练上文中所述的继承网络和属性增强网络的设备。并且,所述服务器也是将完成训练的继承网络和属性增强网络应用于面部图像生成的设备。具体来说,所述服务器10与终端设备20连接,从终端设备20接收第一面部图像和第二面部图像,基于服务器10上的训练好的继承网络和属性增强网络生成第三面部图像或第四面部图像,并将生成的面部图像传送到终端设备20。所述服务器10可以是下文中描述的数据处理设备。所述网络30可以 是任何类型的有线或无线网络,例如因特网。应当认识到,图13所示的终端设备20的数量是示意性的,而不是限制性的。当然,根据本申请实施例的用于面部图像生成的数据处理设备也可以是不联网的单机设备。
图14是图示根据本申请实施例的用于面部图像生成的数据处理设备。如图14所示,数据处理设备1400包括:分割装置1401、第一变换装置1402、选择装置1403、第一合成装置1404和第一反变换装置1405。
分割装置1401用于获取输入的第一面部图像中与各面部特征对应的M个第一图像块,并获取输入的第二面部图像中与各面部特征对应的N个第二图像块。
第一变换装置1402用于将M个第一图像块和N个第二图像块变换到特征空间以生成M个第一特征块和N个第二特征块。第一变换装置1402可以通过第一变换网络(如,编码网络)来执行该变换。
选择装置1403用于根据特定的控制向量选择一部分第一特征块和一部分第二特征块。
在本申请实施例中,所述特定的控制向量包括与各面部特征对应的L个信息位,并且所述选择装置1403进一步被配置为:当所述特定的控制向量中的一个信息位为第一值时,从M个第一特征块中选择与该信息位对应的面部特征的特征块,而当所述特定的控制向量中的该信息位为第二值时,从N个第二特征块中选择与该信息位对应的面部特征的特征块。其中,L为自然数,且M≤L且N≤L。
第一合成装置1404用于至少基于所选择的一部分第一特征块和一部分第二特征块,生成第一合成特征图。
另外,可以对输出的第三面部图像的属性(如,年龄和性别)进行控制。例如,可以指定希望输出的第三面部图像的性别。并且,输入的第一面部图像和第二面部图像的属性信息可能存在较大差异。因此,作为另一种可能的实施方式,所述第一合成装置140进一步被配置为:将指定属性信息扩展为处于所述特征空间中的属性特征块;以及基于所选择的一部分第一特征块、一部分第二特征块以及属性特征块,生成第一合成特征图。
第一反变换装置1405用于将所述第一合成特征图反变换回图像空间以生成第三面部图像。第一反变换装置1405可以通过第一反变换网络(如,解码网络)来执行该反变换。并且,第一变换网络和第一反变换网络构成继承网络。
在根据本申请实施例的用于面部图像生成的数据处理设备中,通过面部特征图像的分割,以及特征空间内的重组,能够生成继承了一部分第一面部图像中的面部特征和一部分第二面部图像中的面部特征的第三面部图像。与现有技术中使用通用 处理网络的方案相比,能够在保证输出的第三面部图像与作为输入源的面部图像的相似性的同时,使得输出的第三面部图像接近于真实图像。换言之,当由用户观看该第三面部图像时,难以分辨该图像是真实图像还是合成图像。并且,通过设置控制向量,能够精确地控制第三面部图像继承两个输入面部图像中的哪些面部特征。此外,通过特征空间内属性特征的叠加,能够指定第三面部图像的属性并进一步提升第三面部图像的和谐度和自然度。
上文中所述的继承网络的主要目的在于,输出与第一面部图像和第二面部图像在面部特征上相似的第三面部图像,因此其中包括的属性特征块的叠加是在保证相似度前提下的微调。换言之,继承网络输出的第三面部图像在诸如年龄之类的属性方面与作为输入源的第一面部图像和第二面部图像近似。
为了在更大范围中调节输出面部图像的属性,作为另一种可能的实施方式,数据处理设备1400可以进一步包括:第二变换装置1406、扩展装置1407、第二合成模块1408和第二反变换装置1409。
第二变换装置1406用于将所述第三面部图像变换至特征空间以生成第三特征图。第二变换装置可以通过第二变换网络(如,编码网络)来执行该变换,且这里的第二变换网络与上文中的第一变换网络不同。
扩展装置1407用于将特定的属性信息扩展为所述特征空间中的属性特征图。
第二合成模块1408用于基于所述属性特征图与所述第三特征图,生成第二合成特征图。
第二反变换装置1409用于将第二合成特征图反变换回图像空间,以生成第四面部图像。第二反变换装置可以通过第二反变换网络(如,解码网络)来执行该变换,且这里的第二反变换网络与上文中的第一反变换网络不同。且第二变换网络和第二反变换网络构成一属性增强网络。
由于第二变换装置1406、扩展装置1407、第二合成模块1408和第二反变换装置1409的可选性,因此在图14中以虚线框示出。
与第三面部图像相比,第四面部图像能够在属性上发生很大改变。例如,基于输入的年龄为20岁的第三面部图像,可以输出年龄为5岁的第四面部图像。
如上文中所述,继承网络和属性增强网络可以包括编码网络和解码网络,且编码网络和解码网络中都包括多个待确定的参数。通过训练过程来确定这些参数,从而完成继承网络和属性增强网络的构建。这样,继承网络和属性增强网络才能实现面部图像生成的功能。换言之,在应用继承网络和属性增强网络之前,首先要训练继承网络和属性增强网络。
因此所述数据处理设备1400进一步包括训练装置1410。
训练装置1410用于在训练模式下,对所述继承网络进行训练。具体来说,训练装置1410包括:预交换模块、第一判别模块和第一训练模块。
预交换模块用于获取第五面部图像(I M)中与各面部特征对应的L个第五图像块,获取第六面部图像(I F)中与各面部特征对应的L个第六图像块,根据第一控制向量选择一部分第五图像块和一部分第六图像块以生成第一合成图像
Figure PCTCN2020082918-appb-000078
并根据第二控制向量选择另一部分第五图像块和另一部分第六图像块以生成第二合成图像
Figure PCTCN2020082918-appb-000079
其中在训练模式下,所述分割装置进一步被配置为获取第一合成图像
Figure PCTCN2020082918-appb-000080
中与各面部特征对应的L个第七图像块,并获取第二合成图像
Figure PCTCN2020082918-appb-000081
中与各面部特征对应的L个第八图像块,并将L个第七图像块和L个第八图像块输入到继承网络。其中,L为自然数,且M≤L且N≤L。
第一判别模块用于接收至少一组继承训练数据,并通过第一判别网络,输出用于判别输入的继承训练数据为真实图像的概率值,其中所述至少一组继承训练数据包括第五至第八面部图像,所述第七面部图像(I' M)通过所述继承网络基于第一控制向量选择一部分第七图像块和一部分第八图像块而生成,所述第八面部图像(I' F)通过所述继承网络基于第二控制向量选择另一部分第七图像块和另一部分第八图像块而生成,其中第五面部图像是用于对第七面部图像提供监督信息的监督图像,第六面部图像是用于对第八面部图像提供监督信息的监督图像。
第一训练模块用于基于第一损失函数,交替地训练所述继承网络和所述第一判别网络,直至所述第一损失函数收敛为止。
其中,所述第一损失函数基于所述第一判别网络对于至少一组继承训练数据输出的概率值以及至少一组继承训练数据中面部图像与对应的监督图像之间的像素差异而确定。
或者,作为另一种可能的实施方式,所述第一损失函数进一步基于以下至少之一而确定:至少一组继承训练数据中面部图像的属性与对应的监督图像的属性之间的差异和至少一组继承训练数据中面部图像的特征与对应的监督图像的特征之间的差异。
此外,训练装置1410还用于在训练模式下,对所述属性增强网络进行训练。
具体来说,所述训练装置1410进一步包括:第二判别模块和第二训练模块。
第二判别模块用于接收至少一组属性训练数据,并通过第二判别网络,输出用于判别输入的属性训练数据为真实图像的概率值,其中所述至少一组属性训练数据包括第七至第十面部图像,所述第九面部图像
Figure PCTCN2020082918-appb-000082
通过所述属性增强网络基于第 七面部图像输出,所述第十面部图像
Figure PCTCN2020082918-appb-000083
通过所述属性增强网络基于第八面部图像输出,其中第七面部图像是用于对第九面部图像提供监督信息的监督图像,第八面部图像是用于对第十面部图像提供监督信息的监督图像。
第二训练模块用于基于第二损失函数,交替地训练所述属性增强网络和所述第二判别网络,直至所述第二损失函数收敛为止。
其中,所述第二损失函数基于所述第二判别网络对于至少一组属性训练数据输出的概率值以及至少一组属性训练数据中面部图像与对应的监督图像之间的像素差异而确定。
或者,作为另一种可能的实施方式,所述第二损失函数进一步基于以下至少之一而确定:至少一组属性训练数据中面部图像的属性与对应的监督图像的属性之间的差异和至少一组属性训练数据中面部图像的特征与对应的监督图像的特征之间的差异。
此外,所述训练装置还可以进一步包括:联合训练模块,用于基于所述第一损失函数和第二损失函数,确定总损失函数,并基于所述总损失函数,交替地训练所述继承网络和所述属性增强网络与第一判别网络和第二判别网络,直至所述总损失函数收敛为止。
由于根据本申请实施例的数据处理设备中各装置的具体操作与根据本申请实施例的数据处理方法中的各步骤完全对应,因此为了避免冗余起见,这里未对其细节展开赘述。本领域的技术人员可以理解,根据本申请实施例的数据处理方法中的各步骤可以类似地应用于根据本申请实施例的数据处理设备中的各装置。
根据本申请实施例的用于面部图像生成的数据处理设备作为硬件实体的一个示例如图15所示。所述终端设备包括处理器1501、存储器1502以及至少一个外部通信接口1503。所述处理器1501、存储器1502以及外部通信接口1503均通过总线1504连接。
对于用于数据处理的处理器1501而言,在执行处理时,可以采用微处理器、中央处理器(CPU,Central Processing Unit)、数字信号处理器(DSP,Digital Singnal Processor)或可编程逻辑阵列(FPGA,Field-Programmable Gate Array)实现;对于存储器1502来说,其包含操作指令,该操作指令可以为计算机可执行代码,通过所述操作指令来实现上述本申请实施例的用于面部图像生成的数据处理方法中的各个步骤。
图16示出了根据本申请实施例的计算机可读记录介质的示意图。如图16所示,根据本申请实施例的计算机可读记录介质1600上存储有计算机程序指令1601。当 所述计算机程序指令1601由处理器运行时,执行参照以上附图描述的根据本申请实施例的用于面部图像生成的数据处理方法。
本申请实施例还提供了一种计算机设备,包括存储器和处理器,该存储器上存储可在处理器上运行的计算机程序,该处理器执行该计算机程序时,可实现上述实施例所述的用于面部图像生成的数据处理方法。该计算机设备可以是上文所述的服务器或任何能进行数据处理的设备。
至此,已经参照图1到图16详细描述了根据本申请实施例的用于面部图像生成的数据处理方法、设备和介质。在根据本申请实施例的用于面部图像生成的数据处理方法、设备和介质中,通过面部特征图像的分割,以及特征空间内的重组,能够生成继承了一部分第一面部图像中的面部特征和一部分第二面部图像中的面部特征的第三面部图像。与现有技术中使用通用处理网络的方案相比,能够在保证输出的第三面部图像与作为输入源的面部图像的相似性的同时,使得输出的第三面部图像接近于真实图像。换言之,当由用户观看该第三面部图像时,难以分辨该图像是真实图像还是合成图像。
并且,在继承网络中,通过设置控制向量,能够精确地控制第三面部图像继承两个输入面部图像中的哪些面部特征。通过特征空间内属性特征的叠加,能够指定第三面部图像的属性并进一步提升第三面部图像的和谐度和自然度。此外,通过额外的属性增强网络,可以在更大范围中改变生成的面部图像的属性。并且,通过训练过程中的两次面部特征交换,可以不需要建立存在父、母和孩子关系的人脸数据库,而是直接利用任意的已经存在的人脸数据库就能够完成继承网络的训练过程,大幅地降低了成本和实现难度。
需要说明的是,在本说明书中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。
最后,还需要说明的是,上述一系列处理不仅包括以这里所述的顺序按时间序列执行的处理,而且包括并行或分别地、而不是按时间顺序执行的处理。
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到本申请实施例可借助软件加必需的硬件平台的方式来实现,当然也可以全部通过软件来实施。基于这样的理解,本申请实施例的技术方案对背景技术做出贡献的全部或者部分可 以以软件产品的形式体现出来,该计算机软件产品可以存储在存储介质中,如ROM/RAM、磁碟、光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例或者实施例的某些部分所述的方法。
以上对本申请实施例进行了详细介绍,本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的方法及其核心思想;同时,对于本领域的一般技术人员,依据本申请的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本申请的限制。

Claims (14)

  1. 一种用于面部图像生成的数据处理方法,由服务器执行,包括:
    获取第一面部图像(I MA)及第二面部图像(I FA);
    获取第一面部图像(I MA)中与面部特征对应的M个第一图像块,并获取第二面部图像(I FA)中与面部特征对应的N个第二图像块;
    将M个第一图像块和N个第二图像块变换到特征空间以生成M个第一特征块和N个第二特征块;
    根据特定的控制向量选择一部分第一特征块和一部分第二特征块;
    至少基于所选择的一部分第一特征块和一部分第二特征块,生成第一合成特征图;以及
    将所述第一合成特征图反变换回图像空间以生成第三面部图像(I O1),
    其中M和N为自然数。
  2. 根据权利要求1所述的方法,其中所述特定的控制向量包括与各面部特征对应的L个信息位,其中L为自然数,且M≤L,N≤L,并且
    根据特定的控制向量选择一部分第一特征块和一部分第二特征块的步骤包括:
    当所述特定的控制向量中的一个信息位为第一值时,从M个第一特征块中选择与该信息位对应的面部特征的特征块,而当所述特定的控制向量中的该信息位为第二值时,从N个第二特征块中选择与该信息位对应的面部特征的特征块。
  3. 根据权利要求1所述的方法,其中至少基于所选择的一部分第一特征块和一部分第二特征块,生成第一合成特征图的步骤包括:
    将指定属性信息扩展为处于所述特征空间中的属性特征块;以及
    基于所选择的一部分第一特征块、一部分第二特征块以及属性特征块,生成第一合成特征图。
  4. 根据权利要求1所述的方法,其中通过将M个第一图像块和N个第二图像块输入到继承网络来生成所述第三面部图像,并且
    所述继承网络通过以下训练步骤得到:
    获取第五面部图像(I M)中与各面部特征对应的L个第五图像块,并获取第六面部图像(I F)中与各面部特征对应的L个第六图像块,其中L为自然数,且M≤L且N≤L;
    根据第一控制向量选择一部分第五图像块和一部分第六图像块以生成第一合成图像
    Figure PCTCN2020082918-appb-100001
    并根据第二控制向量选择另一部分第五图像块和另一部分第六图像块以生成第二合成图像
    Figure PCTCN2020082918-appb-100002
    获取第一合成图像
    Figure PCTCN2020082918-appb-100003
    中与各面部特征对应的L个第七图像块,并获取第二合成图像
    Figure PCTCN2020082918-appb-100004
    中与各面部特征对应的L个第八图像块;
    将L个第七图像块和L个第八图像块输入到继承网络;
    通过所述继承网络,输出基于第一控制向量选择的一部分第七图像块和一部分第八图像块而生成的第七面部图像(I' M),并输出基于第二控制向量选择的另一部分第七图像块和另一部分第八图像块而生成的第八面部图像(I' F),其中第五面部图像是用于对第七面部图像提供监督信息的监督图像,第六面部图像是用于对第八面部图像提供监督信息的监督图像,并且将第五至第八面部图像作为一组继承训练数据;
    将至少一组继承训练数据输入至第一判别网络,其中所述第一判别网络被设置为当向所述第一判别网络输入一图像时,输出该图像为真实图像的概率值;以及
    基于第一损失函数,交替地训练所述继承网络和所述第一判别网络,直至所述第一损失函数收敛为止。
  5. 根据权利要求4所述的方法,其中所述第一损失函数基于所述第一判别网络对于至少一组继承训练数据输出的概率值以及至少一组继承训练数据中面部图像与对应的监督图像之间的像素差异而确定。
  6. 根据权利要求5所述的方法,其中所述第一损失函数进一步基于以下至少之一而确定:
    至少一组继承训练数据中面部图像的属性与对应的监督图像的属性之间的差异和至少一组继承训练数据中面部图像的特征与对应的监督图像的特征之间的差异。
  7. 根据权利要求4所述的方法,进一步包括:
    将所述第三面部图像(I O1)变换至特征空间以生成第三特征图;
    将特定的属性信息扩展为所述特征空间中的属性特征图;
    基于所述属性特征图与所述第三特征图,生成第二合成特征图;以及
    将第二合成特征图反变换回图像空间,以生成第四面部图像(I O2)。
  8. 根据权利要求7所述的方法,其中通过将第三面部图像输入到属性增强网络来生成所述第四面部图像,并且
    所述属性增强网络通过以下训练步骤得到:
    将第七面部图像(I' M)和第八面部图像(I' F)输入至属性增强网络;
    通过属性增强网络,输出与第七面部图像对应的第九面部图像
    Figure PCTCN2020082918-appb-100005
    以及与第八面部图像对应的第十面部图像
    Figure PCTCN2020082918-appb-100006
    其中第七面部图像是用于对第九面部图像提供监督信息的监督图像,第八面部图像是用于对第十面部图像提供监督信息的监督图像,并且将第七至第十面部图像作为一组属性训练数据;
    将至少一组属性训练数据输入至第二判别网络,其中所述第二判别网络被设置为当向所述第二判别网络输入一图像时,输出该图像为真实图像的概率值;
    基于第二损失函数,交替地训练所述属性增强网络和所述第二判别网络,直至所述第二损失函数收敛为止。
  9. 根据权利要求8所述的方法,其中所述第二损失函数基于所述第二判别网络对于至少一组属性训练数据输出的概率值以及至少一组属性训练数据中面部图像与对应的监督图像之间的像素差异而确定。
  10. 根据权利要求9所述的方法,其中所述第二损失函数进一步基于以下至少之一而确定:
    至少一组属性训练数据中面部图像的属性与对应的监督图像的属性之间的差异和至少一组属性训练数据中面部图像的特征与对应的监督图像的特征之间的差异。
  11. 根据权利要求8所述的方法,其中所述继承网络和所述属性增强网络通过以下联合训练步骤进一步优化:
    基于所述第一损失函数和所述第二损失函数,确定总损失函数;
    基于所述总损失函数,交替地训练所述继承网络和所述属性增强网络与第一判别网络和第二判别网络,直至所述总损失函数收敛为止。
  12. 一种用于面部图像生成的数据处理设备,包括:
    分割装置,用于获取输入的第一面部图像中与面部特征对应的M个第一图像块,并获取输入的第二面部图像中与面部特征对应的N个第二图像块;
    第一变换装置,用于将M个第一图像块和N个第二图像块变换到特征空间以生成M个第一特征块和N个第二特征块;
    选择装置,用于根据特定的控制向量选择一部分第一特征块和一部分第二特征块;
    第一合成装置,用于至少基于所选择的一部分第一特征块和一部分第二特征块,生成第一合成特征图;以及
    第一反变换装置,用于将所述第一合成特征图反变换回图像空间以生成第三面部图像。
  13. 一种计算机可读记录介质,在其上存储计算机程序,当由处理器执行所述计算机程序时,执行根据权利要求1至11中任意一项所述的方法。
  14. 一种计算机设备,包括存储器和处理器,所述存储器用于存储计算机程序,所述处理器用于执行所述计算机程序以实现权利要求1-11任一项所述的用于面部图像生成的数据处理方法。
PCT/CN2020/082918 2019-04-26 2020-04-02 用于面部图像生成的数据处理方法、设备和介质 WO2020216033A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP20796212.7A EP3961486A4 (en) 2019-04-26 2020-04-02 DATA PROCESSING METHOD AND DEVICE FOR FACIAL IMAGING AND MEDIUM
KR1020217020518A KR102602112B1 (ko) 2019-04-26 2020-04-02 얼굴 이미지 생성을 위한 데이터 프로세싱 방법 및 디바이스, 및 매체
JP2021534133A JP7246811B2 (ja) 2019-04-26 2020-04-02 顔画像生成用のデータ処理方法、データ処理機器、コンピュータプログラム、及びコンピュータ機器
US17/328,932 US11854247B2 (en) 2019-04-26 2021-05-24 Data processing method and device for generating face image and medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910345276.6 2019-04-26
CN201910345276.6A CN110084193B (zh) 2019-04-26 2019-04-26 用于面部图像生成的数据处理方法、设备和介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/328,932 Continuation US11854247B2 (en) 2019-04-26 2021-05-24 Data processing method and device for generating face image and medium

Publications (1)

Publication Number Publication Date
WO2020216033A1 true WO2020216033A1 (zh) 2020-10-29

Family

ID=67417067

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/082918 WO2020216033A1 (zh) 2019-04-26 2020-04-02 用于面部图像生成的数据处理方法、设备和介质

Country Status (6)

Country Link
US (1) US11854247B2 (zh)
EP (1) EP3961486A4 (zh)
JP (1) JP7246811B2 (zh)
KR (1) KR102602112B1 (zh)
CN (1) CN110084193B (zh)
WO (1) WO2020216033A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112613460A (zh) * 2020-12-30 2021-04-06 深圳威富优房客科技有限公司 人脸生成模型的建立方法和人脸生成方法

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084193B (zh) * 2019-04-26 2023-04-18 深圳市腾讯计算机系统有限公司 用于面部图像生成的数据处理方法、设备和介质
US11373352B1 (en) * 2021-03-04 2022-06-28 Meta Platforms, Inc. Motion transfer using machine-learning models
US11341701B1 (en) * 2021-05-06 2022-05-24 Motorola Solutions, Inc Method and apparatus for producing a composite image of a suspect
CN114708644B (zh) * 2022-06-02 2022-09-13 杭州魔点科技有限公司 一种基于家庭基因模板的人脸识别方法和系统
CN116012258B (zh) * 2023-02-14 2023-10-13 山东大学 一种基于循环生成对抗网络的图像和谐化方法
CN117078974B (zh) * 2023-09-22 2024-01-05 腾讯科技(深圳)有限公司 图像处理方法及装置、电子设备、存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1490764A (zh) * 2002-05-31 2004-04-21 欧姆龙株式会社 用于合成图像的方法、设备、系统、程序以及计算机可读介质
US20080199055A1 (en) * 2007-02-15 2008-08-21 Samsung Electronics Co., Ltd. Method and apparatus for extracting facial features from image containing face
CN103295210A (zh) * 2012-03-01 2013-09-11 汉王科技股份有限公司 婴儿图像合成方法及装置
US20170301121A1 (en) * 2013-05-02 2017-10-19 Emotient, Inc. Anonymization of facial images
CN108171124A (zh) * 2017-12-12 2018-06-15 南京邮电大学 一种相似样本特征拟合的人脸图像清晰化方法
CN110084193A (zh) * 2019-04-26 2019-08-02 深圳市腾讯计算机系统有限公司 用于面部图像生成的数据处理方法、设备和介质

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005107848A (ja) * 2003-09-30 2005-04-21 Lic Corporation:Kk 子供画像生成装置
KR20080106596A (ko) * 2007-03-22 2008-12-09 연세대학교 산학협력단 가상 얼굴 생성 방법
CN106682632B (zh) * 2016-12-30 2020-07-28 百度在线网络技术(北京)有限公司 用于处理人脸图像的方法和装置
US10430978B2 (en) * 2017-03-02 2019-10-01 Adobe Inc. Editing digital images utilizing a neural network with an in-network rendering layer
US10474881B2 (en) * 2017-03-15 2019-11-12 Nec Corporation Video retrieval system based on larger pose face frontalization
CN107273818B (zh) * 2017-05-25 2020-10-16 北京工业大学 遗传算法融合差分进化的选择性集成人脸识别方法
CN107578017B (zh) * 2017-09-08 2020-11-17 百度在线网络技术(北京)有限公司 用于生成图像的方法和装置
CN107609506B (zh) * 2017-09-08 2020-04-21 百度在线网络技术(北京)有限公司 用于生成图像的方法和装置
CN108288072A (zh) * 2018-01-26 2018-07-17 深圳市唯特视科技有限公司 一种基于生成对抗网络的面部表情合成方法
CN108510473A (zh) 2018-03-09 2018-09-07 天津工业大学 结合深度可分离卷积与通道加权的fcn视网膜图像血管分割
CN108510437B (zh) * 2018-04-04 2022-05-17 科大讯飞股份有限公司 一种虚拟形象生成方法、装置、设备以及可读存储介质
CN109508669B (zh) * 2018-11-09 2021-07-23 厦门大学 一种基于生成式对抗网络的人脸表情识别方法
CN109615582B (zh) * 2018-11-30 2023-09-01 北京工业大学 一种基于属性描述生成对抗网络的人脸图像超分辨率重建方法

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1490764A (zh) * 2002-05-31 2004-04-21 欧姆龙株式会社 用于合成图像的方法、设备、系统、程序以及计算机可读介质
US20080199055A1 (en) * 2007-02-15 2008-08-21 Samsung Electronics Co., Ltd. Method and apparatus for extracting facial features from image containing face
CN103295210A (zh) * 2012-03-01 2013-09-11 汉王科技股份有限公司 婴儿图像合成方法及装置
US20170301121A1 (en) * 2013-05-02 2017-10-19 Emotient, Inc. Anonymization of facial images
CN108171124A (zh) * 2017-12-12 2018-06-15 南京邮电大学 一种相似样本特征拟合的人脸图像清晰化方法
CN110084193A (zh) * 2019-04-26 2019-08-02 深圳市腾讯计算机系统有限公司 用于面部图像生成的数据处理方法、设备和介质

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
See also references of EP3961486A4 *
XIAO YANG ET AL.: "Recognizing Minimal Facial Sketch by Generating Photorealistic Faces With the Guidance of Descriptive Attributes", vol. Speech and Signal Processing (ICASSP), no. 2018 IEEE International Conference on Acoustics, XP033403904, ISSN: 2379-190X *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112613460A (zh) * 2020-12-30 2021-04-06 深圳威富优房客科技有限公司 人脸生成模型的建立方法和人脸生成方法

Also Published As

Publication number Publication date
US11854247B2 (en) 2023-12-26
EP3961486A4 (en) 2022-07-13
CN110084193A (zh) 2019-08-02
KR20210095696A (ko) 2021-08-02
US20210279515A1 (en) 2021-09-09
JP2022513858A (ja) 2022-02-09
KR102602112B1 (ko) 2023-11-13
EP3961486A1 (en) 2022-03-02
CN110084193B (zh) 2023-04-18
JP7246811B2 (ja) 2023-03-28

Similar Documents

Publication Publication Date Title
WO2020216033A1 (zh) 用于面部图像生成的数据处理方法、设备和介质
CN111754596B (zh) 编辑模型生成、人脸图像编辑方法、装置、设备及介质
US20220028139A1 (en) Attribute conditioned image generation
US20200402284A1 (en) Animating avatars from headset cameras
JP7144699B2 (ja) 信号変更装置、方法、及びプログラム
WO2021027759A1 (en) Facial image processing
WO2023050992A1 (zh) 用于人脸重建的网络训练方法、装置、设备及存储介质
JP2023548921A (ja) 画像の視線補正方法、装置、電子機器、コンピュータ可読記憶媒体及びコンピュータプログラム
CN111353546B (zh) 图像处理模型的训练方法、装置、计算机设备和存储介质
CN115565238B (zh) 换脸模型的训练方法、装置、设备、存储介质和程序产品
CN110288513A (zh) 用于改变人脸属性的方法、装置、设备和存储介质
US20220101121A1 (en) Latent-variable generative model with a noise contrastive prior
CN112101087A (zh) 一种面部图像身份去识别方法、装置及电子设备
WO2022166840A1 (zh) 人脸属性编辑模型的训练方法、人脸属性编辑方法及设备
US20220101122A1 (en) Energy-based variational autoencoders
Liu et al. Learning shape and texture progression for young child face aging
CN116825127A (zh) 基于神经场的语音驱动数字人生成方法
KR20210019182A (ko) 나이 변환된 얼굴을 갖는 직업영상 생성 장치 및 방법
CN117237521A (zh) 语音驱动人脸生成模型构建方法、目标人说话视频生成方法
CN116152631A (zh) 模型训练及图像处理方法、装置、设备及存储介质
CN113822790B (zh) 一种图像处理方法、装置、设备及计算机可读存储介质
CN115914505A (zh) 基于语音驱动数字人模型的视频生成方法及系统
US20220101145A1 (en) Training energy-based variational autoencoders
CN114943912A (zh) 视频换脸方法、装置及存储介质
KR102147061B1 (ko) 사용자의 주관적 선호도를 반영한 가상 얼굴 성형 장치 및 방법

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20796212

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021534133

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20217020518

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020796212

Country of ref document: EP

Effective date: 20211126