WO2021254499A1 - Editing model generation method and apparatus, face image editing method and apparatus, device and medium - Google Patents

Editing model generation method and apparatus, face image editing method and apparatus, device and medium

Info

Publication number
WO2021254499A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
generator
training
discriminator
editing
Prior art date
Application number
PCT/CN2021/101007
Other languages
English (en)
Chinese (zh)
Inventor
吴臻志
祝夭龙
Original Assignee
北京灵汐科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京灵汐科技有限公司
Publication of WO2021254499A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/001 Texturing; Colouring; Generation of texture or colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • the embodiment of the present invention relates to the field of artificial intelligence, in particular to the editing of face images and the generation of corresponding editing models.
  • a Generative Adversarial Network including a generator and a discriminator can be used to generate real face images.
  • the generator is used to generate the face image
  • the discriminator is used to determine whether a generated face image is real or fake.
  • training the generative adversarial network actually means training the generator and the discriminator in the network.
  • the discriminator is usually trained faster. Soon after training begins, the discriminator can already judge whether a generated image is real or fake fairly accurately, but at that point the generator has not yet learned how to generate real images, so the images it generates cannot pass the discriminator's judgment, causing the training of the entire generative adversarial network to fail.
  • as a result, the authenticity of images generated by the trained generative adversarial network cannot be guaranteed, and neither can the editing effect of an image editing model built from that network.
  • the embodiment of the present invention provides an editing model generation and face image editing method, device, equipment and medium, which can improve the training consistency of the generator and the discriminator, and improve the authenticity of the generated image.
  • an embodiment of the present invention provides an editing model generation method, including: performing iterative training on a generative adversarial network, the generative adversarial network including a generator and a discriminator; in the iterative training, updating the generative adversarial network according to gradient update configuration information of the discriminator, wherein the gradient update configuration information is determined by a Lipschitz constraint condition; and, when it is determined that the generative adversarial network satisfies a training end condition, generating an image editing model according to the generator in the trained generative adversarial network.
  • an embodiment of the present invention provides a face image editing method, including: acquiring a face image to be edited; and inputting the face image to be edited into an image editing model to obtain the edited face image output by the image editing model.
  • the image editing model is generated by the above-mentioned editing model generation method.
  • an embodiment of the present invention also provides an editing model generation device, including: a network training module for iteratively training a generative adversarial network, the generative adversarial network including a generator and a discriminator; a network update module for updating the generative adversarial network in the iterative training according to gradient update configuration information of the discriminator, the gradient update configuration information being determined by a Lipschitz constraint condition; and a model generation module for generating an image editing model according to the generator in the trained generative adversarial network when it is determined that the generative adversarial network satisfies the training end condition.
  • an embodiment of the present invention also provides a face image editing device, including: a face image acquisition module for acquiring a face image to be edited; and a face image editing module for inputting the face image to be edited into an image editing model to obtain the edited face image output by the image editing model; wherein the image editing model is generated by the above-mentioned editing model generation method.
  • an embodiment of the present invention also provides a computer device, including a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein, when the processor executes the program, the editing model generation method or the face image editing method according to any embodiment of the present invention is implemented.
  • an embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the editing model generation method or the face image editing method described in any embodiment of the present invention is implemented.
  • when training a generative adversarial network including a generator and a discriminator, the embodiment of the present invention limits the learning rate of the parameter items of the discriminator according to a Lipschitz constraint, so as to slow down the training speed of the discriminator and improve the training consistency of the discriminator and the generator. This ensures the accuracy of the discriminator's identification of real and fake images while enabling the generator to quickly learn how to generate real images, thereby improving the authenticity of the editing effect of the image editing model built from the generator.
  • FIG. 1A is a flowchart of an editing model generation method in Embodiment 1 of the present invention.
  • FIG. 1B is a schematic diagram of an application scenario of training a generative confrontation network in Embodiment 1 of the present invention.
  • FIG. 2 is a flowchart of an editing model generation method in the second embodiment of the present invention.
  • FIG. 3A is a flowchart of an editing model generation method in Embodiment 3 of the present invention.
  • FIG. 3B is a schematic diagram of an application scenario of a self-supervised training convolutional neural network in Embodiment 3 of the present invention.
  • FIG. 4A is a flowchart of a face image editing method in the fourth embodiment of the present invention.
  • FIG. 4B is a schematic diagram of face image editing in the fourth embodiment of the present invention.
  • Figure 5 is a schematic structural diagram of an editing model generating device in the fifth embodiment of the present invention.
  • Fig. 6 is a schematic structural diagram of a face image editing device in the sixth embodiment of the present invention.
  • Fig. 7 is a schematic structural diagram of a computer device in the seventh embodiment of the present invention.
  • FIG. 1A is a flowchart of a method for generating an editing model in Embodiment 1 of the present invention.
  • This embodiment may be suitable for training a generative adversarial network and generating an image editing model according to the generator in the trained generative adversarial network.
  • the method can be executed by the editing model generation device provided in the embodiment of the present invention, and the device can be implemented in software and/or hardware, and generally can be integrated in computer equipment. As shown in FIG. 1A, the method of this embodiment specifically includes:
  • S110 Perform iterative training on a generative adversarial network (GAN), which includes a generator and a discriminator.
  • the generator to be trained and the discriminator to be trained constitute a GAN.
  • the generator and the discriminator are actually trained at the same time.
  • samples are used to train the generative confrontation network.
  • training the generative adversarial network includes: inputting samples of real images or noise images to the generative adversarial network and iteratively training it.
  • the samples include noisy images and real images.
  • the noise image may be a random noise image
  • the real image may include images with real attributes such as real people, real animals, or real scenes.
  • the real image may include a real face image, for example, a face photo.
  • multiple samples can be grouped into a sample group, and multiple rounds of iterative training can be performed on the generative adversarial network. Each round of training can use a set number of samples, and the set number can be selected according to the actual situation, for example eight, which is not limited in the embodiment of the present invention.
  • a set number of samples can be determined as a sample group, and in one round of training, one sample group is used to train the generative adversarial network.
  • the generative adversarial network includes a generator 101 and a discriminator 102.
  • random noise images or real images can be input into the generator 101 as samples, and the generated images output by the generator 101 can be obtained.
  • the generated image output by the generator 101 and the corresponding sample (random noise image or real image) can be input to the discriminator 102 to obtain the discrimination result output by the discriminator 102; updating the parameter items of the generator 101 and the discriminator 102 based on the discrimination result constitutes the training process.
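  • As a minimal, hedged sketch of this training flow (a PyTorch-style illustration; the architectures, sizes and learning rates below are placeholders, not taken from the patent):

```python
import torch
import torch.nn as nn

# Illustrative stand-ins for generator 101 and discriminator 102 (flattened 28x28 images).
generator = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_round(real_images):                  # one round on a sample group (e.g. 8 samples)
    batch = real_images.size(0)
    noise = torch.randn(batch, 64)             # random noise images used as samples
    fake_images = generator(noise)

    # Discriminator step: judge real samples as real and generated samples as fake.
    d_loss = bce(discriminator(real_images), torch.ones(batch, 1)) + \
             bce(discriminator(fake_images.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make the discriminator judge the generated images as real.
    g_loss = bce(discriminator(fake_images), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```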
  • the generator 101 can be used to edit any image and output a generated image.
  • the discriminator 102 is used to determine whether the generated image output by the generator 101 meets real conditions (or rules). It should be noted that the discriminator 102 is not used to determine whether the generated image is accurate, that is, whether the original image has been edited into the required image effect, but to determine the degree of authenticity of the generated image, that is, whether it is real or fake.
  • for example, when the generated image is a face image, the authenticity of the generated image can be determined according to the positional relationship between the nose and the mouth, and the corresponding real condition may be that the nose is located directly above the mouth.
  • if the nose is not located directly above the mouth in the generated image output by the generator 101, the generated image is determined to be fake; if the nose is located directly above the mouth, the generated image is determined to be real.
  • the real condition can be used by the discriminator 102 to determine whether the generated image output by the generator is real.
  • the discriminator 102 can learn real features to determine whether the image is true or false.
  • the gradient update configuration information is used to determine the learning rate with which the parameter items learn from each sample, where the learning rate measures the rate of change of a parameter item. Updating the generative adversarial network actually means updating the parameter items of the generator and/or the parameter items of the discriminator. In this step, the learning rate of each parameter item can be determined according to the gradient update configuration information of the discriminator, so as to update the generative adversarial network based on the learning rate.
  • the target learning rate of each parameter item may be determined according to the gradient update configuration information, where the target learning rate indicates the fastest achievable learning rate of each parameter item, or is used to judge whether the current learning rate is appropriate. After the target learning rate is determined, the generative adversarial network can be updated according to the current learning rate and the target learning rate.
  • specifically, the value of a parameter item when entering the current round of training can be taken as its pre-update value, and the value of the parameter item determined by the current round of training can be taken as its proposed (to-be-updated) value; the current learning rate of the parameter item is then calculated from the to-be-updated value and the pre-update value, and it is judged whether the current learning rate matches the target learning rate.
  • if the current learning rate matches the target learning rate, the to-be-updated value is used to update the parameter item, that is, the to-be-updated value becomes the final value of the parameter item in the current round of training and the pre-update value of the parameter item in the next round of training; if the current learning rate does not match the target learning rate, the target value of the parameter item is determined according to the target learning rate, and the target value is used to update the parameter item, that is, the target value becomes the final value of the parameter item in the current round of training and the pre-update value of the parameter item in the next round of training.
  • the target learning rate can be determined according to Lipschitz constraints.
  • if a function f(x) satisfies the Lipschitz condition, then the function f(x) is uniformly continuous.
  • the Lipschitz constraint limits the rate of change of the function f(x), that is, it requires the change of f(x) not to exceed a certain constant: its slope must be less than the Lipschitz constant L, and the learning rate can be determined based on this Lipschitz constant L.
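  • For reference, the Lipschitz condition referred to here is the standard one (stated by the editor, not quoted from the patent): a function f is L-Lipschitz when

```latex
|f(x_1) - f(x_2)| \le L\,|x_1 - x_2| \quad \text{for all } x_1, x_2,
```

  so the slope of f is bounded by the constant L, which is what the learning-rate limit described below is based on.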
  • in this way, the learning rate of each parameter item in the discriminator can be reduced, so as to ensure the accuracy of the discriminator's identification of real and fake images while allowing the generator to quickly learn how to generate real images, so that the generator can then be effectively applied to building an image editing model for real images.
  • updating the generative adversarial network according to the gradient update configuration information of the discriminator includes: determining, according to the gradient update configuration information of the discriminator, the maximum parameter learning rate threshold corresponding to each of one or more feature extraction layers included in the discriminator; and, for each feature extraction layer in the discriminator, updating the parameter items of that feature extraction layer according to its maximum parameter learning rate threshold, so that the update rate of the parameter items associated with the feature extraction layer matches the maximum parameter learning rate threshold corresponding to that feature extraction layer.
  • the maximum parameter learning rate threshold is used to determine the maximum learning rate of the parameter item.
  • here, the parameter item refers to a parameter item of the generative adversarial network, and specifically can refer to one or more parameter items corresponding to one or more feature extraction layers in the discriminator.
  • the feature extraction layer is used to extract feature information from the input and output it.
  • the discriminator can be a learning model of any depth, and is usually a structure including multiple feature extraction layers.
  • the learning rate of the parameter item to be updated relative to the value before the update needs to be less than or equal to the maximum learning rate determined according to the maximum threshold of the parameter learning rate.
  • the maximum threshold of the parameter learning rate can be configured for some or all of the parameter items; or, the parameter item that requires the configuration of the maximum threshold of the parameter learning rate can be customized according to the actual situation.
  • the embodiment of the present invention does not make a specific limitation.
  • updating the parameter items of a feature extraction layer in the discriminator can specifically be: determining, according to the gradient update configuration information, the target learning rate of each of the one or more parameter items associated with the feature extraction layer; for each parameter item, obtaining its to-be-updated value; calculating the learning rate of the parameter item from the to-be-updated value and the pre-update value; and determining the size relationship between this learning rate and the target learning rate of the parameter item.
  • when the learning rate is less than or equal to the target learning rate, it is determined that the learning rate matches the target learning rate, and the parameter item is updated with its to-be-updated value; when the learning rate is greater than the target learning rate, it is determined that the learning rate does not match the target learning rate, the target value of the parameter item is calculated according to the target learning rate, and the parameter item is updated with the target value.
  • the target value of the parameter item can be calculated based on a gradient-descent-style update of the following form: θ_1 = θ_0 - α · (∂/∂θ_0) J(θ_0, θ_1), where α is the learning rate, J(θ_0, θ_1) is the fitting function, θ_0 is the value of the parameter item before the update, and θ_1 is the target value of the parameter item.
  • the value of α can be determined according to the value of the constant L in the aforementioned Lipschitz constraint condition; for example, it can be equal to the aforementioned target learning rate.
  • in this way, the maximum learning rate of each parameter item can be limited and the learning rate of each parameter item of the discriminator can be slowed down, which effectively improves the learning consistency of the discriminator and the generator in the generative adversarial network: the generator can quickly learn how to generate real images while the accuracy of the discriminator's identification of real and fake images is ensured, so that the generator can be effectively applied to building an image editing model for real images.
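  • A hedged sketch of this per-layer limiting, assuming the "learning rate" of a parameter item is measured as its relative change within one round and the threshold comes from the Lipschitz constant L (an interpretation of the passage, not code from the patent):

```python
import torch

def clamped_update(param, proposed, max_rate):
    """Limit how fast one parameter item of a discriminator feature extraction layer may change.

    param     -- value before the update (theta_0)
    proposed  -- value produced by the current round of training (to-be-updated value)
    max_rate  -- maximum parameter learning rate threshold for this layer
    """
    current_rate = (proposed - param).abs() / (param.abs() + 1e-12)   # current learning rate
    # Where the change is too fast, shrink it so that the rate matches the threshold.
    scale = torch.where(current_rate > max_rate,
                        max_rate / (current_rate + 1e-12),
                        torch.ones_like(current_rate))
    return param + (proposed - param) * scale

# Example: clamp every parameter item of one feature extraction layer.
layer = torch.nn.Linear(128, 128)
with torch.no_grad():
    for p in layer.parameters():
        proposed = p - 0.1 * torch.randn_like(p)      # stand-in for the to-be-updated value
        p.copy_(clamped_update(p, proposed, max_rate=0.05))
```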
  • the training end condition is used to judge whether the training of the generative confrontation network is completed.
  • the loss function will converge to a set value
  • the training end condition can be configured to be that the calculated value of the loss function is less than the set value, or that the update rate of the loss function is less than the set threshold, etc.
  • the generator in it can generate real images relatively accurately.
  • the image editing model can be obtained by adjusting the generator.
  • the real image can be edited using the image editing model, and the edited image output correspondingly is the real image.
  • the editing mode of the image editing model may include changes in attributes such as the position, size, brightness, and color of pixels in the image.
  • the editing method of the image editing model does not change the true nature of the image, and usually the image obtained after editing the real image is still the real image.
  • the editing method includes editing at least one of the skin color, age, gender, and organ region of the human face. For example, edit the skin color of the face from yellow to white; edit the age feature of the face from 50 to 10; edit the gender feature of the face from male to female; edit the single eyelid of the face to double eyelid, etc.
  • the generator includes an encoder and a decoder.
  • the generator usually contains multiple intermediate layers, and the intermediate results corresponding to these intermediate layers can affect the final output of the generator, that is, the final image editing effect.
  • the outputs of one or more specific intermediate layers can be taken from the generator as a hidden space (latent space); the hidden space can be adjusted and fed into the cascaded structure behind it in the generator to achieve the image editing effect. That is, the image editing model can be generated by adjusting the parameters of the hidden space of the generator.
  • the gender characteristics of the face image can be adjusted by editing the hidden space.
  • a female face is input, and a male face is output.
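  • As a toy sketch of this kind of hidden-space editing (the encoder/decoder split and the learned attribute direction are assumptions for illustration, not the patent's concrete model):

```python
import torch

@torch.no_grad()
def edit_attribute(encoder, decoder, image, direction, strength=1.0):
    """Edit one attribute (e.g. gender) by shifting the hidden-space code along a learned
    direction vector before passing it to the cascaded structure behind the hidden space."""
    z = encoder(image)                   # hidden-space (latent) representation
    z_edited = z + strength * direction  # adjust the hidden space
    return decoder(z_edited)             # decode to the edited image
```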
  • the hidden space can be selected according to the specific structure of the generator.
  • the generator includes an encoder and a decoder, and the hidden space is a neural network layer in the decoder. Editing the hidden space may be: obtaining the parameter items of the pre-trained image editing model, and updating the parameter items of the hidden space of the generator.
  • image editing samples can be used to continue training the generator to generate an image editing model.
  • the image editing sample includes the real image before editing and the real image after editing.
  • the image editing sample may include a face image before editing and a face image after editing.
  • the correlation between the face image after editing and the face image before editing can be selected according to the actual situation.
  • the correlation may include gender, age, skin color, etc., which is not limited in the embodiment of the present invention.
  • a pre-trained standard encoder can be used to replace the encoder in the generator to extract effective features from the input image.
  • the standard encoder is used to learn how to extract features that can characterize the input image from the input image.
  • the input size of the decoder in the generator matches the output size of the standard encoder, where the size can be the dimension of the vector.
  • by inputting a sample group including noise images and/or real images to a generative adversarial network including a discriminator and a generator, the generative adversarial network is trained, and the learning rate of the parameter items of the discriminator is limited according to a Lipschitz constraint.
  • this slows down the learning rate of each parameter item of the discriminator and can effectively improve the consistency of the training rates of the discriminator and the generator in the generative adversarial network.
  • FIG. 2 is a flowchart of a method for generating an editing model in Embodiment 2 of the present invention. This embodiment is embodied on the basis of Embodiment 1 described above.
  • the method of this embodiment specifically includes:
  • S210 Perform iterative training on the generative adversarial network, which includes a generator and a discriminator.
  • the loss function configuration information can be used to add the Euclidean distance norm to the initial loss function, and the elements included in the Euclidean distance norm are the parameter items of the encoder in the generator.
  • the training process of the generative adversarial network is actually the process of solving the algorithm that maps the input to the output, and solving the algorithm is actually determining the numerical value of each parameter item in the algorithm.
  • the algorithm has an objective function, and the solution process of the algorithm is an optimization process of the objective function.
  • the loss function can be used as the objective function.
  • the loss function is used to express the degree to which the predicted value of the generative confrontation network is different from the true value. For example, the smaller the value of the loss function, the better the performance of the corresponding generative adversarial network.
  • different models use different loss functions.
  • the loss function is used as the training target of the generative confrontation network.
  • the loss function can take the standard min-max form of a generative adversarial network (a reconstruction of this form is sketched below).
  • the discriminator D is trained with maximizing log D(m) as the training goal, so as to continuously improve the accuracy with which the discriminator judges whether the generated image output by the generator is real, while the generator G is trained with minimizing log(1 - D(G(n))) as the training goal, so as to continuously reduce the difference between the generated image output by the generator and the real image.
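  • A reconstruction of that standard objective (the exact formula is not reproduced in this text, so this is the editor's rendering of the usual GAN min-max loss):

```latex
\min_{G}\max_{D} V(D,G)
  = \mathbb{E}_{m\sim p_{\mathrm{data}}}\bigl[\log D(m)\bigr]
  + \mathbb{E}_{n\sim p_{n}}\bigl[\log\bigl(1 - D(G(n))\bigr)\bigr]
```

  where m is a real sample and n is a noise sample.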
  • in this way, the effect of adversarial training between the discriminator and the generator can be achieved.
  • the Euclidean distance norm can be added as a constraint condition on the basis of the initial loss function. Since the Euclidean distance norm can be decomposed into a combination of two low-dimensional parameter matrices, adding the Euclidean distance norm as a constraint condition can effectively reduce the dimension of the parameter matrix and the sample requirement.
  • the training of generative confrontation network may also have the problem of over-fitting.
  • that is, the trained generative adversarial network has a good generation effect and discrimination accuracy only for certain types of real images, while its generation effect and discrimination accuracy for unknown types of real images are poor.
  • therefore, adding the Euclidean distance norm as a constraint condition on the basis of the initial loss function can also be considered, so that the distribution mapped to the hidden space is more even, thereby reducing the coupling between feature vectors and correspondingly improving the generalization ability of the generative adversarial network.
  • the loss function configuration information is used to add the Euclidean distance norm on the basis of the initial loss function.
  • the Euclidean distance norm can also be called the regularization term, or the L2 norm, and is obtained from the sum of the squares of the elements.
  • adding the Euclidean distance norm is equivalent to adding a constraint to the initial loss function. In effect, it heavily penalizes large-valued weight vectors in favour of more dispersed weight vectors, so as to achieve a more uniform weight distribution and avoid the weights being concentrated in a small number of vectors, which makes the generative adversarial network closer to a low-dimensional model. The lower the dimensionality, the smaller the amount of data needed for training. Therefore, adding the Euclidean distance norm to the initial loss function as a constraint condition can reduce the amount of data used in training the generative adversarial network, thereby reducing the complexity of the training.
  • the updated loss function can take the form of the initial loss function plus a penalty term λ‖θ_g‖_F, where θ_g represents the parameter item matrix of the hidden space of the generator G, which can specifically be the parameter item matrix of the hidden space of the encoder in the generator G.
  • λ is the penalty coefficient, which is used to adjust the training complexity of the generative adversarial network and can be set according to the actual situation.
  • ‖·‖_F stands for the norm operation, and ‖θ_g‖_F represents the Euclidean distance norm of the parameter matrix θ_g of the hidden space.
  • the elements included in the Euclidean distance norm may therefore be the parameter item matrix θ_g of the hidden space of the generator G, and specifically may be the parameter items of the encoder in the generator.
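  • A brief sketch of adding such a penalty term in code (the choice of λ and taking the encoder's parameters as θ_g are illustrative assumptions):

```python
import torch

def euclidean_norm_penalty(encoder, lam=1e-4):
    """Penalty term: the penalty coefficient lam times the squared Euclidean distance
    (Frobenius) norm of the encoder's parameter items."""
    return lam * sum(p.pow(2).sum() for p in encoder.parameters())

# Usage (illustrative): total loss = initial GAN loss + regularization term.
# loss_total = gan_loss + euclidean_norm_penalty(generator_encoder)
```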
  • the stability condition is used to judge whether the loss function tends to be stable or convergent.
  • the stable condition is used to determine whether the change rate of the loss function in adjacent training rounds is less than a set change rate threshold, and the size of the change rate threshold can be limited according to actual conditions. It can be understood that the value of the loss function changes very little with the number of training rounds, which indicates that the loss function is stable.
  • the rate of change of the loss function may be: the difference between the current value of the loss function calculated in the current round of training and the historical value calculated in the previous round of training, taken as a ratio of the current value of the loss function.
  • the stable condition may be to determine whether the number of training rounds exceeds the set round number threshold. If the number of training rounds of the generative confrontation network is sufficient, it can be determined that the training of the generative confrontation network is completed.
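  • A small sketch of these two stopping checks (the thresholds are placeholders):

```python
def training_finished(curr_loss, prev_loss, round_idx,
                      rate_threshold=1e-3, max_rounds=100_000):
    """Stable condition: the relative change of the loss between adjacent rounds is below a
    threshold, or the number of training rounds exceeds the configured round-number threshold."""
    change_rate = abs(curr_loss - prev_loss) / (abs(curr_loss) + 1e-12)
    return change_rate < rate_threshold or round_idx >= max_rounds
```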
  • in this way, the weight distribution of the vectors can be made more uniform and the weights can be prevented from concentrating on a few vectors, which not only reduces the amount of data used in training the generative adversarial network and the computational complexity, but also improves the generalization ability of the trained network, that is, it expands the range of real images to which the trained generative adversarial network is applicable, thereby ensuring its accuracy and editing effect on images of unknown categories.
  • FIG. 3A is a flowchart of a method for generating an editing model in Embodiment 3 of the present invention. This embodiment is embodied on the basis of the above-mentioned embodiment.
  • the method of this embodiment specifically includes:
  • the generative confrontation network includes a generator and a discriminator.
  • the image feature detection model is obtained by training based on image feature samples.
  • the image feature sample may include two regional image blocks in the same image and relationship data between the two regional image blocks.
  • the image feature detection model may include two convolutional neural networks sharing weights, a feature vector splicer, and a fully connected network classifier.
  • the convolutional neural network is used to extract the feature information of the regional image block and form a feature vector
  • the feature vector splicer is used to synthesize the feature vector generated by each convolutional neural network into the target feature vector
  • the fully connected network classifier is used to classify the target feature vector and output the relationship data between the regional image blocks.
  • the image feature detection model is used to extract features from the image, and can be a pre-trained deep learning model. Specifically, the image feature detection model can learn to extract the features of image blocks in different regions and the relationship between image blocks in different regions in a self-supervised manner.
  • the regional image blocks are partial image regions in the same image, and the two regional image blocks do not overlap with each other.
  • the number and specific settings of regional image blocks in the same image can be selected according to actual conditions.
  • the target object is detected in the image, and the target object is divided into nine equal parts (for example, in the form of nine square grids).
  • the embodiment of the present invention does not make a specific limitation.
  • the relationship data is used to describe the relationship between the two regional image blocks.
  • the relationship data may be at least one of the position relationship, size relationship, shape relationship, and color relationship of the regional image blocks.
  • the relationship data includes a position relationship.
  • the positional relationship in the relationship data may include upper left, upper middle, upper right, middle left, middle, middle right, lower left, lower middle, and lower right.
  • the feature information of the regional image block is used to represent the regional image block in the form of data, for example, a feature vector.
  • the feature information represents regional image blocks from different dimensions, and the feature vector can be used to represent the corresponding dimensional information.
  • the convolutional neural network and the feature vector stitcher are used to map the original image data to the hidden space, and the fully connected network classifier is used to map the learned distributed feature representation to the sample label space, so that the classification of the sample can be determined based on the sample label.
  • the convolutional neural network can use the PixelShuffle method to upsample the feature map, reducing the artifact effects caused by transposed convolution or ordinary linear-interpolation upsampling, which can improve the authenticity of the generated image output by a generator built on this convolutional neural network structure.
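  • For illustration, a PixelShuffle-based 2x upsampling block in PyTorch (channel counts are placeholders):

```python
import torch
import torch.nn as nn

# A convolution expands the channels by r^2, then PixelShuffle rearranges them into
# spatial resolution, which tends to reduce checkerboard artifacts.
upsample = nn.Sequential(
    nn.Conv2d(64, 64 * 4, kernel_size=3, padding=1),
    nn.PixelShuffle(upscale_factor=2),
)
x = torch.randn(1, 64, 16, 16)
y = upsample(x)   # shape: (1, 64, 32, 32)
```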
  • the image feature detection model may include a first convolutional neural network 301 and a second convolutional neural network 302 that share weights, a feature vector splicer 303, and a fully connected network classifier 304.
  • the convolutional neural network used to construct the generator may be any one of the first convolutional neural network 301 and the second convolutional neural network 302.
  • the specific operation of the image feature detection model may include: dividing the face image into at least two regional image blocks (for example, a mouth region image block and a right-eye region image block). In this embodiment, the mouth region image block can be input into the first convolutional neural network 301 for feature extraction to obtain the first feature vector output by the first convolutional neural network 301, and the right-eye region image block can be input into the second convolutional neural network 302 for feature extraction to obtain the second feature vector output by the second convolutional neural network 302; the first feature vector and the second feature vector are input to the feature vector splicer 303 for splicing to obtain the spliced feature vector output by the feature vector splicer 303; and the spliced feature vector is input to the fully connected network classifier 304 for classification to obtain the relationship data between the mouth region image block and the right-eye region image block.
  • for example, the fully connected network classifier 304 can determine that the right-eye region image block is located at the upper right of the mouth region image block.
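  • A hedged PyTorch sketch of this structure, i.e. two weight-sharing convolutional networks, a feature vector splicer (concatenation), and a fully connected classifier over the relative-position classes (layer sizes and the nine-class assumption are illustrative):

```python
import torch
import torch.nn as nn

class RelativePositionModel(nn.Module):
    """Two weight-sharing CNNs + feature vector splicer + fully connected network classifier."""
    def __init__(self, num_relations=9):
        super().__init__()
        # One CNN instance applied to both patches, so the weights are shared.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),        # -> 64-d feature vector
        )
        self.classifier = nn.Sequential(                  # fully connected network classifier
            nn.Linear(64 * 2, 128), nn.ReLU(),
            nn.Linear(128, num_relations),
        )

    def forward(self, patch_a, patch_b):
        feat_a = self.cnn(patch_a)                        # e.g. mouth-region image block
        feat_b = self.cnn(patch_b)                        # e.g. right-eye-region image block
        spliced = torch.cat([feat_a, feat_b], dim=1)      # feature vector splicer
        return self.classifier(spliced)                   # relationship (position) logits

model = RelativePositionModel()
logits = model(torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64))
```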
  • the image feature sample may include two face organ region image blocks in the same face image and the relationship data between the two face organ region image blocks.
  • the facial organ region image blocks may be image blocks divided according to facial organs, for example, a nose region image block and a mouth region image block.
  • the relationship data between the image blocks of the face organ region may indicate the relative positional relationship of the two face organ region image blocks in the face image. For example, if the nose area image block is in the middle and the mouth area image block is in the middle and bottom, the relational data may be that the nose area image block is located above the mouth area image block.
  • by using the facial organ region image blocks in a face image as the image feature sample, the feature information used to distinguish the facial organs can be accurately extracted from the face image and learned. This helps to accurately identify the various organs of the face image to be edited, so that the authenticity of face editing can be improved.
  • the decoder of the generator may include a convolutional neural network.
  • specifically, the convolutional neural network in the pre-trained image feature detection model can be used as the convolutional neural network in the decoder of the generator; or, the parameter items of the convolutional neural network in the pre-trained image feature detection model can be migrated to the convolutional neural network in the decoder of the generator.
  • the convolutional neural network in the pre-trained image feature detection model may be additionally added to the existing feature extraction network of the decoder of the generator.
  • the convolutional neural network and other feature extraction networks can share weights
  • the output feature vector of the convolutional neural network and the output feature vector of other feature extraction layers can be spliced
  • and the spliced feature vector can be input to the module that originally receives the output feature vector of the feature extraction layer, for example, a fully connected network classifier.
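  • A short sketch of migrating those pre-trained parameter items into the decoder's convolutional network (assuming matching parameter names and shapes, which is an assumption of this illustration):

```python
def migrate_cnn_weights(pretrained_cnn, decoder_cnn):
    """Copy parameter items from the pre-trained image-feature-detection CNN into the
    decoder's CNN; only entries whose names and shapes match are transferred."""
    src = pretrained_cnn.state_dict()
    dst = decoder_cnn.state_dict()
    transferred = {k: v for k, v in src.items() if k in dst and v.shape == dst[k].shape}
    dst.update(transferred)
    decoder_cnn.load_state_dict(dst)
    return len(transferred)
```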
  • S340 Generate an image editing model according to the updated generator.
  • the updated generator uses a convolutional neural network that is trained with a self-supervised learning method.
  • a small number of samples can be used to complete the training of the convolutional neural network, which effectively reduces the demand for the generator's training samples and improves the training speed .
  • the generator in the generative adversarial network is updated with the convolutional neural network trained in advance by the self-supervised learning method, and the image editing model is built from the updated generator; this allows features to be extracted effectively from the input image of the image editing model, reduces the demand for labeled samples and the training sample size of the image editing model, thereby increasing the generation speed of the image editing model and reducing its labeling labor cost.
  • Fig. 4A is a flowchart of a method for editing a face image in the fourth embodiment of the present invention.
  • This embodiment is applicable to a situation where an image editing model is used to edit a face image.
  • the method may be executed by the face image editing apparatus provided by the embodiment of the present invention, and the apparatus may be implemented in software and/or hardware, and may generally be integrated into computer equipment.
  • the method of this embodiment specifically includes:
  • the human face image is a real image including the human face. For example, photos taken by users themselves. It should be noted that the face images of cartoon characters are not real images.
  • S420 Input the face image to be edited into the image editing model to obtain the edited face image output by the image editing model.
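  • A minimal inference sketch for this step (file names, the 256x256 input size and the saved-model format are illustrative, not specified by the patent):

```python
import torch
from PIL import Image
from torchvision import transforms

to_tensor = transforms.Compose([transforms.Resize((256, 256)), transforms.ToTensor()])
to_image = transforms.ToPILImage()

# Assumes the whole model object was saved with torch.save(model); "editing_model.pt" is a placeholder.
editing_model = torch.load("editing_model.pt", map_location="cpu")
editing_model.eval()

face = to_tensor(Image.open("face_to_edit.jpg")).unsqueeze(0)       # face image to be edited
with torch.no_grad():
    edited = editing_model(face)                                    # edited face image
to_image(edited.squeeze(0).clamp(0, 1)).save("face_edited.jpg")
```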
  • the image editing model is generated by the editing model generation method in any of the foregoing embodiments of the present invention.
  • the generator in the image editing model or the decoder in the generator is derived from the generative confrontation network obtained by the method for generating the editing model in any of the foregoing embodiments of the present invention.
  • the generative adversarial network includes a generator and a discriminator, and Lipschitz constraints are used to determine the gradient update configuration information of the discriminator so as to slow down the learning rate of each parameter item of the discriminator, making the rate at which the discriminator learns how to distinguish real images from fake images and the rate at which the generator learns how to generate real images as consistent as possible, thereby ensuring both the discrimination accuracy of the generative adversarial network for real images and the authenticity of the generated images.
  • the first image on the left is a standard test image commonly used in textbooks, and this image can be used as a real face image.
  • the second image in the middle is a video frame in a dynamic video.
  • the third image on the right is the edited image formed by editing the first image to simulate the mouth opening action of the intermediate video frame.
  • the embodiment of the present invention uses Lipschitz constraints to constrain the gradient update configuration information of the discriminator in the training of the generative adversarial network, which can effectively slow down the learning rate of each parameter item of the discriminator and improve the training consistency of the discriminator and the generator, guaranteeing the authenticity of the generated image output by the generator while ensuring the discrimination accuracy of the finally trained discriminator. In this way, when the editing model built from the generator of the finally trained generative adversarial network is used to obtain an edited image of a real face image, the authenticity of the image editing effect can be effectively ensured, further improving the user experience.
  • Fig. 5 is a schematic diagram of an editing model generating device in the fifth embodiment of the present invention.
  • the fifth embodiment is a corresponding device that implements the editing model generation method provided in the foregoing embodiment of the present invention.
  • the device can be implemented in software and/or hardware, and generally can be integrated with computer equipment.
  • the apparatus of this embodiment may include:
  • the network training module 510 is used for iterative training of a generative confrontation network, the generative confrontation network including a generator and a discriminator;
  • the network update module 520 is configured to update the generative confrontation network according to the gradient update configuration information of the discriminator in the iterative training, and the gradient update configuration information is determined by Lipschitz constraint conditions;
  • the model generation module 530 is configured to generate an image editing model according to the generator in the trained generative confrontation network when it is determined that the generative confrontation network meets the training end condition.
  • the embodiment of the present invention uses real images and/or noise images as samples input to the generative adversarial network to iteratively train a generative adversarial network including a generator and a discriminator, and limits the parameter items of the discriminator according to Lipschitz constraints in order to improve the learning consistency of the discriminator and the generator. This ensures the accuracy of the discriminator's identification of real and fake images while ensuring the authenticity of the generated image output by the generator, so that the generator can be effectively applied to building an image editing model for real images, and the authenticity of the effect of editing face images based on the built image editing model is improved.
  • the model generation module 530 includes a loss function calculation unit for: adding the Euclidean distance norm as a constraint condition on the basis of the initial loss function according to the loss function configuration information, to obtain the loss function of the generative adversarial network, where the elements included in the Euclidean distance norm are the parameter items of the encoder in the generator; and, when it is determined that the loss function meets the convergence condition, determining that the generative adversarial network meets the training end condition and generating the image editing model according to the generator in the trained generative adversarial network.
  • the network update module 520 includes a discriminator parameter item update unit, which is used to: determine, according to the gradient update configuration information of the discriminator, the maximum parameter learning rate threshold corresponding to each of the one or more feature extraction layers included in the discriminator; and, for each feature extraction layer in the discriminator, update the parameter items of that feature extraction layer according to its maximum parameter learning rate threshold, so that the update rate of the parameter items associated with the feature extraction layer matches the maximum parameter learning rate threshold corresponding to that feature extraction layer.
  • the model generation module 530 includes a self-supervised generation unit, which is used to: update the generator in the trained generative adversarial network, and specifically the decoder in the generator, based on the convolutional neural network of the pre-trained image feature detection model; and generate the image editing model based on the updated generator.
  • the image feature detection model is obtained by training based on image feature samples, and the image feature samples include two regional image blocks in the same image and the relationship data between the two regional image blocks.
  • the image feature detection model can include two convolutional neural networks that share weights, a feature vector splicer, and a fully connected network classifier.
  • the convolutional neural network is used to extract the feature information of the regional image block and form a feature vector; the feature vector splicer is used to synthesize the feature vector generated by each convolutional neural network into the target feature vector; the fully connected network classifier is used to perform the target feature vector Categorize and output the relationship data between image blocks in each area.
  • the image feature sample includes two face organ region image blocks in the same face image and the relationship data between the two face organ region image blocks.
  • the network training module 510 includes a training unit for inputting samples including real images and/or noisy images into the generative confrontation network, and performs a round of training on the generative confrontation network.
  • the above-mentioned editing model generating device can execute the editing model generating method provided by any one of the embodiments of the present invention, and achieve the same beneficial effects.
  • Fig. 6 is a schematic diagram of a face image editing device in the sixth embodiment of the present invention.
  • the sixth embodiment is a corresponding device that implements the face image editing method provided in the foregoing embodiment of the present invention.
  • the device can be implemented in software and/or hardware, and generally can be integrated with computer equipment.
  • the apparatus of this embodiment may include:
  • the face image acquisition module 610 is used to acquire the face image to be edited
  • the face image editing module 620 is configured to input the face image to be edited into the image editing model to obtain the edited face image output by the image editing model, wherein the image editing model is generated by the editing model generation method of any of the foregoing embodiments of the present invention.
  • the gradient update configuration information of the discriminator in the generative adversarial network is determined according to Lipschitz constraints, and the learning rate of the parameter items of the discriminator is restricted based on the gradient update configuration information, which can improve the training consistency of the discriminator and the generator in the generative adversarial network and ensure the authenticity of the image editing effect of the editing model built from the generator of the finally trained generative adversarial network, effectively improving the user experience.
  • the aforementioned facial image editing device can execute the facial image editing method provided by any one of the embodiments of the present invention, and achieve the same beneficial effects.
  • FIG. 7 is a schematic structural diagram of a computer device according to Embodiment 7 of the present invention.
  • Figure 7 shows a block diagram of an exemplary computer device 12 suitable for implementing embodiments of the present invention.
  • the computer device 12 shown in FIG. 7 is only an example, and should not bring any limitation to the function and application scope of the embodiment of the present invention.
  • the computer device 12 is represented in the form of a general-purpose computing device.
  • the components of the computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting different system components (including the system memory 28 and the processing unit 16).
  • the bus 18 represents one or more of several types of bus structures, including a memory bus, a peripheral bus, or any bus structure using multiple bus structures.
  • these bus structures include but are not limited to the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
  • the computer device 12 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by the computer device 12, including volatile and nonvolatile media, removable and non-removable media.
  • the system memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32.
  • the computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media.
  • the storage system 34 may be used to read and write non-removable, non-volatile magnetic media (not shown in FIG. 7 and generally referred to as a "hard drive").
  • a disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (such as a Compact Disc Read-Only Memory (CD-ROM)) can be provided.
  • each drive can be connected to the bus 18 through one or more data media interfaces.
  • the system memory 28 may store at least one program product, the program product having a set (for example, at least one) program modules, and these program modules are configured to perform the functions of the various embodiments of the present invention.
  • the program/utility tool 40 having a set of (at least one) program module 42 may be stored in the system memory 28, for example.
  • the program module 42 includes, but is not limited to, an operating system, one or more application programs, other program modules, and program data. Each of these examples or some combination may include the implementation of a network environment.
  • the program module 42 generally executes the functions and/or methods in the described embodiments of the present invention.
  • the computer device 12 can also communicate with one or more external devices 14 (such as keyboards, pointing devices, displays 24, etc.), and can also communicate with one or more devices that enable users to interact with the computer device 12, and/or communicate with Any device (such as a network card, modem, etc.) that enables the computer device 12 to communicate with one or more other computing devices. This communication can be performed through an input/output (Input/Output, I/O) interface 22.
  • the computer device 12 may also communicate with one or more networks (for example, a local area network (LAN) or a wide area network (WAN)) through the network adapter 20. As shown in the figure, the network adapter 20 communicates with the other modules of the computer device 12 through the bus 18. It should be understood that, although not shown in FIG. 7, other hardware and/or software modules may be used in conjunction with the computer device 12.
  • the processing unit 16 executes various functional applications and data processing by running the program module 42 stored in the system memory 28, for example, to implement an editing model generation method and/or a face image editing method provided by any embodiment of the present invention .
  • the eighth embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored.
  • when the program is executed by a processor, it implements the editing model generation method provided in any embodiment of this application, or implements the face image editing method provided in any embodiment of this application.
  • the computer storage medium of the embodiment of the present invention may adopt any combination of one or more computer-readable media.
  • the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer disk, a hard disk, a RAM, a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash memory, an optical fiber, a portable CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, apparatus, or device.
  • the computer-readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and computer-readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium.
  • the computer-readable medium may send, propagate or transmit the program for use by or in combination with the instruction execution system, apparatus, or device .
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, radio frequency, etc., or any suitable combination of the foregoing.
  • the computer program code used to perform the operations of the present invention can be written in one or more programming languages or a combination thereof.
  • the programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer may be connected to a user computer through any kind of network including a LAN or WAN, or may be connected to an external computer (for example, using an Internet service provider to connect through the Internet).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to an editing model generation method and apparatus, a face image editing method and apparatus, a device, and a medium. The editing model generation method comprises: performing iterative training on a generative adversarial network comprising a generator and a discriminator (S110); in the iterative training, updating the generative adversarial network according to gradient update configuration information of the discriminator, the gradient update configuration information being determined by means of a Lipschitz constraint (S120); and, when it is determined that the generative adversarial network satisfies a training end condition, generating an image editing model according to the generator in the trained generative adversarial network (S130).
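
The abstract describes training a generative adversarial network in which the discriminator's gradient update is bounded by a Lipschitz constraint, and then exporting the trained generator as the image editing model. The PyTorch sketch below illustrates one common way such a constraint can be enforced, namely a WGAN-style gradient penalty on the discriminator; the network shapes, optimizer settings, and penalty weight are illustrative assumptions and are not taken from the publication.

```python
# Minimal sketch (assumed, not the publication's reference implementation):
# adversarial training with a gradient penalty that keeps the discriminator
# approximately 1-Lipschitz, after which the trained generator is reused.
import torch
from torch import nn

LATENT_DIM = 128
IMG_DIM = 64 * 64  # flattened grayscale face image, purely illustrative

generator = nn.Sequential(nn.Linear(LATENT_DIM, 256), nn.ReLU(),
                          nn.Linear(256, IMG_DIM), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(IMG_DIM, 256), nn.ReLU(),
                              nn.Linear(256, 1))
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.5, 0.9))
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4, betas=(0.5, 0.9))

def gradient_penalty(real, fake, weight=10.0):
    # Penalize deviations of the discriminator's gradient norm from 1 on
    # points interpolated between real and generated samples.
    eps = torch.rand(real.size(0), 1)
    interp = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    scores = discriminator(interp)
    grads = torch.autograd.grad(outputs=scores, inputs=interp,
                                grad_outputs=torch.ones_like(scores),
                                create_graph=True)[0]
    return weight * ((grads.norm(2, dim=1) - 1.0) ** 2).mean()

def train_step(real_faces):
    # Discriminator update, including the Lipschitz-motivated penalty term.
    z = torch.randn(real_faces.size(0), LATENT_DIM)
    fake_faces = generator(z).detach()
    d_loss = (discriminator(fake_faces).mean()
              - discriminator(real_faces).mean()
              + gradient_penalty(real_faces, fake_faces))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator update: raise the discriminator's score on generated faces.
    z = torch.randn(real_faces.size(0), LATENT_DIM)
    g_loss = -discriminator(generator(z)).mean()
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```

Once a training end condition is met, the discriminator would be discarded and the generator alone retained as the basis of the image editing model; the specific end condition and the editing interface built around the generator are not detailed here.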
PCT/CN2021/101007 2020-06-19 2021-06-18 Procédé et appareil de génération de modèle d'édition, procédé et appareil d'édition d'image faciale, dispositif et support WO2021254499A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010568177.7A CN111754596B (zh) 2020-06-19 2020-06-19 编辑模型生成、人脸图像编辑方法、装置、设备及介质
CN202010568177.7 2020-06-19

Publications (1)

Publication Number Publication Date
WO2021254499A1 true WO2021254499A1 (fr) 2021-12-23

Family

ID=72675543

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/101007 WO2021254499A1 (fr) 2020-06-19 2021-06-18 Procédé et appareil de génération de modèle d'édition, procédé et appareil d'édition d'image faciale, dispositif et support

Country Status (2)

Country Link
CN (1) CN111754596B (fr)
WO (1) WO2021254499A1 (fr)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114359667A (zh) * 2021-12-30 2022-04-15 西安交通大学 一种基于生成式对抗网络的强度相干识别方法及设备
CN114359034A (zh) * 2021-12-24 2022-04-15 北京航空航天大学 一种基于手绘的人脸图片生成方法及系统
CN114549287A (zh) * 2022-01-27 2022-05-27 西北大学 一种人脸任意属性编辑模型构建方法及系统
CN114663539A (zh) * 2022-03-09 2022-06-24 东南大学 一种基于音频驱动的口罩下2d人脸还原技术
CN114724214A (zh) * 2022-03-31 2022-07-08 华南理工大学 一种基于面部动作单元的微表情编辑方法及系统
CN116187294A (zh) * 2023-04-24 2023-05-30 开元华创科技(集团)有限公司 信息化检测实验室电子文件快速生成方法及系统
CN116415687A (zh) * 2022-12-29 2023-07-11 江苏东蓝信息技术有限公司 一种基于深度学习的人工智能网络优化训练系统及方法
WO2023143126A1 (fr) * 2022-01-30 2023-08-03 北京字跳网络技术有限公司 Procédé et appareil de traitement d'image, dispositif électronique et support de stockage
CN117853638A (zh) * 2024-03-07 2024-04-09 厦门大学 基于文本驱动的端到端的3d人脸快速生成与编辑方法
WO2024108472A1 (fr) * 2022-11-24 2024-05-30 北京京东方技术开发有限公司 Procédé et appareil d'apprentissage de modèle, procédé de traitement d'image de texte, dispositif et support

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111754596B (zh) * 2020-06-19 2023-09-19 北京灵汐科技有限公司 编辑模型生成、人脸图像编辑方法、装置、设备及介质
CN112232281B (zh) * 2020-11-04 2024-06-11 深圳大学 一种人脸属性编辑方法、装置、智能终端及存储介质
CN112651915B (zh) * 2020-12-25 2023-08-29 百果园技术(新加坡)有限公司 一种人脸图像合成方法、系统、电子设备及存储介质
CN112668529A (zh) * 2020-12-31 2021-04-16 神思电子技术股份有限公司 一种菜品样本图像增强识别方法
CN112819689A (zh) * 2021-02-02 2021-05-18 百果园技术(新加坡)有限公司 人脸属性编辑模型的训练方法、人脸属性编辑方法及设备
CN113158977B (zh) * 2021-05-12 2022-07-29 河南师范大学 改进FANnet生成网络的图像字符编辑方法

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170316281A1 (en) * 2016-04-28 2017-11-02 Microsoft Technology Licensing, Llc Neural network image classifier
CN107943784A (zh) * 2017-11-02 2018-04-20 南华大学 基于生成对抗网络的关系抽取方法
CN108564119A (zh) * 2018-04-04 2018-09-21 华中科技大学 一种任意姿态行人图片生成方法
CN110197514A (zh) * 2019-06-13 2019-09-03 南京农业大学 一种基于生成式对抗网络的蘑菇表型图像生成方法
CN110689480A (zh) * 2019-09-27 2020-01-14 腾讯科技(深圳)有限公司 一种图像变换方法及装置
CN111754596A (zh) * 2020-06-19 2020-10-09 北京灵汐科技有限公司 编辑模型生成、人脸图像编辑方法、装置、设备及介质

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108537152B (zh) * 2018-03-27 2022-01-25 百度在线网络技术(北京)有限公司 用于检测活体的方法和装置
CN109308450A (zh) * 2018-08-08 2019-02-05 杰创智能科技股份有限公司 一种基于生成对抗网络的脸部变化预测方法
CN110457994B (zh) * 2019-06-26 2024-05-10 平安科技(深圳)有限公司 人脸图像生成方法及装置、存储介质、计算机设备
CN110659582A (zh) * 2019-08-29 2020-01-07 深圳云天励飞技术有限公司 图像转换模型训练方法、异质人脸识别方法、装置及设备
CN110889370B (zh) * 2019-11-26 2023-10-24 上海大学 基于条件生成对抗网络的端对端的侧面人脸合成正脸的系统及方法
CN111275784B (zh) * 2020-01-20 2023-06-13 北京百度网讯科技有限公司 生成图像的方法和装置
CN111275613A (zh) * 2020-02-27 2020-06-12 辽宁工程技术大学 一种引入注意力机制生成对抗网络人脸属性编辑方法

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114359034A (zh) * 2021-12-24 2022-04-15 北京航空航天大学 一种基于手绘的人脸图片生成方法及系统
CN114359034B (zh) * 2021-12-24 2023-08-08 北京航空航天大学 一种基于手绘的人脸图片生成方法及系统
CN114359667A (zh) * 2021-12-30 2022-04-15 西安交通大学 一种基于生成式对抗网络的强度相干识别方法及设备
CN114359667B (zh) * 2021-12-30 2024-01-30 西安交通大学 一种基于生成式对抗网络的强度相干识别方法及设备
CN114549287A (zh) * 2022-01-27 2022-05-27 西北大学 一种人脸任意属性编辑模型构建方法及系统
CN114549287B (zh) * 2022-01-27 2024-03-01 西北大学 一种人脸任意属性编辑模型构建方法及系统
WO2023143126A1 (fr) * 2022-01-30 2023-08-03 北京字跳网络技术有限公司 Procédé et appareil de traitement d'image, dispositif électronique et support de stockage
CN114663539A (zh) * 2022-03-09 2022-06-24 东南大学 一种基于音频驱动的口罩下2d人脸还原技术
CN114663539B (zh) * 2022-03-09 2023-03-14 东南大学 一种基于音频驱动的口罩下2d人脸还原技术
CN114724214A (zh) * 2022-03-31 2022-07-08 华南理工大学 一种基于面部动作单元的微表情编辑方法及系统
CN114724214B (zh) * 2022-03-31 2024-05-14 华南理工大学 一种基于面部动作单元的微表情编辑方法及系统
WO2024108472A1 (fr) * 2022-11-24 2024-05-30 北京京东方技术开发有限公司 Procédé et appareil d'apprentissage de modèle, procédé de traitement d'image de texte, dispositif et support
CN116415687A (zh) * 2022-12-29 2023-07-11 江苏东蓝信息技术有限公司 一种基于深度学习的人工智能网络优化训练系统及方法
CN116415687B (zh) * 2022-12-29 2023-11-21 江苏东蓝信息技术有限公司 一种基于深度学习的人工智能网络优化训练系统及方法
CN116187294B (zh) * 2023-04-24 2023-07-07 开元华创科技(集团)有限公司 信息化检测实验室电子文件快速生成方法及系统
CN116187294A (zh) * 2023-04-24 2023-05-30 开元华创科技(集团)有限公司 信息化检测实验室电子文件快速生成方法及系统
CN117853638A (zh) * 2024-03-07 2024-04-09 厦门大学 基于文本驱动的端到端的3d人脸快速生成与编辑方法

Also Published As

Publication number Publication date
CN111754596A (zh) 2020-10-09
CN111754596B (zh) 2023-09-19

Similar Documents

Publication Publication Date Title
WO2021254499A1 (fr) Procédé et appareil de génération de modèle d'édition, procédé et appareil d'édition d'image faciale, dispositif et support
US11481869B2 (en) Cross-domain image translation
US20240144566A1 (en) Image classification through label progression
US11508169B2 (en) System and method for synthetic image generation with localized editing
CN110785767B (zh) 紧凑的无语言面部表情嵌入和新颖三元组的训练方案
WO2023082882A1 (fr) Procédé et dispositif de reconnaissance d'action de chute de piéton basés sur une estimation de pose
WO2020216033A1 (fr) Procédé et dispositif de traitement de données pour génération d'images faciales, et support
KR102427484B1 (ko) 이미지 생성 시스템 및 이를 이용한 이미지 생성 방법
CN114240735B (zh) 任意风格迁移方法、系统、存储介质、计算机设备及终端
CN111091010A (zh) 相似度确定、网络训练、查找方法及装置和存储介质
CN115393486B (zh) 虚拟形象的生成方法、装置、设备及存储介质
CN113177572A (zh) 用于从传感器自动学习的方法和计算机可读介质
Liu et al. Learning explicit shape and motion evolution maps for skeleton-based human action recognition
CN112801107A (zh) 一种图像分割方法和电子设备
CN113850714A (zh) 图像风格转换模型的训练、图像风格转换方法及相关装置
CN113902989A (zh) 直播场景检测方法、存储介质及电子设备
CN115019053A (zh) 一种用于点云分类分割的动态图语义特征提取方法
Chen et al. A unified framework for generative data augmentation: A comprehensive survey
WO2023240583A1 (fr) Procédé et appareil de génération de connaissances correspondantes inter-médias
KR20210063171A (ko) 이미지 변환 장치 및 이미지 변환 방법
KR102536808B1 (ko) 멀티뷰 이미지 기반 관심 물체 인식 방법, 서버 및 컴퓨터 프로그램
US20240087265A1 (en) Multidimentional image editing from an input image
WO2023178801A1 (fr) Procédé et appareil de description d'image, dispositif informatique et support de stockage
WO2023226783A1 (fr) Procédé et appareil de traitement de données
Chen et al. Unsupervised Learning: Deep Generative Model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21825129

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 31/03/2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21825129

Country of ref document: EP

Kind code of ref document: A1