WO2021088556A1 - Image processing method, apparatus, device, and storage medium - Google Patents

Image processing method, apparatus, device, and storage medium

Info

Publication number
WO2021088556A1
WO2021088556A1 (PCT/CN2020/117455; CN2020117455W)
Authority
WO
WIPO (PCT)
Prior art keywords
image
encoding
expression
input image
encoding result
Prior art date
Application number
PCT/CN2020/117455
Other languages
English (en)
French (fr)
Inventor
孙天宇
黄浩智
刘威
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2021088556A1 publication Critical patent/WO2021088556A1/zh
Priority to US17/497,883 (published as US20220028031A1)

Classifications

    • G PHYSICS; G06 COMPUTING; CALCULATING OR COUNTING
    • G06T 3/04 Context-preserving transformations, e.g. by using an importance map (under G06T 3/00 Geometric image transformations in the plane of the image; G06T Image data processing or generation, in general)
    • G06T 9/00 Image coding
    • G06T 11/00 2D [Two Dimensional] image generation
    • G06N 3/045 Combinations of networks (under G06N 3/04 Architecture, e.g. interconnection topology; G06N 3/02 Neural networks; G06N 3/00 Computing arrangements based on biological models)
    • G06N 3/047 Probabilistic or stochastic networks
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting (under G06F 18/21 Design or setup of recognition systems or techniques; G06F 18/00 Pattern recognition)
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 10/95 Hardware or software architectures specially adapted for image or video understanding, structured as a network, e.g. client-server architectures
    • G06V 40/168 Feature extraction; Face representation (under G06V 40/16 Human faces, e.g. facial parts, sketches or expressions)
    • G06V 40/174 Facial expression recognition

Definitions

  • The embodiments of this application relate to the field of computer vision within artificial intelligence technology, and in particular to an image processing method, apparatus, device, and storage medium.
  • Facial expression editing, also known as facial expression transformation, refers to adjusting the expression in a face image to obtain another image. For example, the expression in the original image is a smile, while the expression in the target image obtained after facial expression editing is crying.
  • However, the solutions provided by the related art for realizing facial expression transformation have limited expression transformation capability.
  • The embodiments of the present application provide an image processing method, apparatus, device, and storage medium, which can generate an output image with a large expression difference from the input image, improving the expression transformation capability.
  • The technical solution is as follows:
  • An embodiment of the present application provides an image processing method, applied to a computer device, the method including:
  • encoding an input image based on an attention mechanism to obtain an encoding tensor set and an attention map set of the input image, where the encoding tensor set includes n encoding tensors, the attention map set includes n attention maps, and n is an integer greater than 1;
  • obtaining an encoding result of the input image according to the encoding tensor set and the attention map set, the encoding result of the input image recording the identity features of the face in the input image;
  • encoding an expression image to obtain an encoding result of the expression image, the encoding result of the expression image recording the expression features of the face in the expression image;
  • generating an output image according to the encoding result of the input image and the encoding result of the expression image, where the output image has the identity features of the input image and the expression features of the expression image.
  • An embodiment of the present application also provides an image processing apparatus, which includes:
  • a first encoding module, configured to encode an input image based on an attention mechanism to obtain an encoding tensor set and an attention map set of the input image, where the encoding tensor set includes n encoding tensors, the attention map set includes n attention maps, and n is an integer greater than 1;
  • the first encoding module being further configured to obtain an encoding result of the input image according to the encoding tensor set and the attention map set, the encoding result of the input image recording the identity features of the face in the input image;
  • a second encoding module, configured to encode an expression image to obtain an encoding result of the expression image, the encoding result of the expression image recording the expression features of the face in the expression image;
  • an image generation module, configured to generate an output image according to the encoding result of the input image and the encoding result of the expression image, where the output image has the identity features of the input image and the expression features of the expression image.
  • An embodiment of the present application provides a computer device, the computer device including a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the above image processing method.
  • An embodiment of the present application provides a computer-readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the above image processing method.
  • An embodiment of the present application provides a computer program product which, when run on a computer device, causes the computer device to execute the above image processing method.
  • By encoding the input image based on the attention mechanism, the encoding tensor set and the attention map set of the input image are obtained, and the encoding result of the input image is then obtained from these two sets; the expression image is encoded separately to obtain its own encoding result. Since the encoding result of the input image records the identity features of the face in the input image, and the encoding result of the expression image records the expression features of the face in the expression image, generating the output image from the two encoding results gives the output image the identity features of the input image and the expression features of the expression image. The expression features of the output image are thus determined by the expression image rather than by the input image, so an output image with a large expression difference from the input image can be generated, improving the expression transformation capability.
  • Fig. 1 is a flowchart of an image processing method provided by an embodiment of the present application
  • Fig. 2 is a schematic diagram of a facial expression editing model provided by an embodiment of the present application.
  • Fig. 3 is a schematic diagram of a facial expression editing model provided by another embodiment of the present application.
  • Fig. 4 is a block diagram of an image processing device provided by an embodiment of the present application.
  • Fig. 5 is a block diagram of an image processing device provided by another embodiment of the present application.
  • Fig. 6 is a structural block diagram of a computer device provided by an embodiment of the present application.
  • AI (Artificial Intelligence) is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.
  • In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning, and decision-making.
  • Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, with both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, and machine learning/deep learning.
  • Computer Vision (CV) is a science that studies how to make machines "see"; more specifically, it uses cameras and computers, instead of human eyes, to identify, track, and measure targets, and further performs graphics processing so that the result becomes an image more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and technologies, attempting to establish artificial intelligence systems that can obtain information from images or multi-dimensional data. Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and also includes common biometric recognition technologies such as face recognition and fingerprint recognition.
  • ML (Machine Learning) is the core of artificial intelligence and the fundamental way to make computers intelligent; its applications cover all fields of artificial intelligence. Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
  • With the research and progress of artificial intelligence technology, it has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, intelligent medical care, and intelligent customer service. It is believed that, with the development of technology, artificial intelligence will be applied in more fields and deliver increasingly important value.
  • In the related art, expressions are mainly encoded by means of spatial transformation, and the target image is obtained by spatially transforming the original image. Because the expression features rely on spatial transformation to be encoded into the target image, pixel units that do not appear in the original image cannot be generated; for example, if there are no teeth in the original image, there are no teeth in the target image. As a result, a target image with a large expression difference from the original image cannot be generated, and the expression transformation capability is limited.
  • The solution provided by the embodiments of this application involves the computer vision technology of artificial intelligence and provides an image processing method: the input image is encoded based on an attention mechanism to obtain an encoding tensor set and an attention map set of the input image, and the encoding result of the input image is then obtained from these two sets; the expression image is encoded to obtain its own encoding result. Since the encoding result of the input image records the identity features of the face in the input image, and the encoding result of the expression image records the expression features of the face in the expression image, generating the output image from the two encoding results gives the output image the identity features of the input image and the expression features of the expression image. In this way, the expression features of the output image are determined by the expression image rather than by the input image, so an output image with a large expression difference from the input image can be generated, improving the expression transformation capability.
  • In the method provided by the embodiments of this application, the subject executing each step may be a computer device, which can be any electronic device with processing and storage capabilities, such as a mobile phone, tablet computer, game device, multimedia playback device, electronic photo frame, wearable device, or PC (Personal Computer), and may also be a server. For ease of description, in the following method embodiments the executing subject of each step is introduced as a computer device, but this does not constitute a limitation.
  • FIG. 1 shows a flowchart of an image processing method provided by an embodiment of the present application. The method may include the following steps (101-104):
  • Step 101: Encode the input image based on the attention mechanism to obtain an encoding tensor set and an attention map set of the input image.
  • In the embodiments of this application, the input image is a human face image, that is, an image containing a human face. The input image is subjected to multi-channel encoding based on the attention mechanism to obtain its encoding tensor set and attention map set, where the encoding tensor set includes n encoding tensors, the attention map set includes n attention maps, and n is an integer greater than 1. The value of n can be set in advance according to actual requirements, for example, n = 8.
  • A first encoder can encode the input image to obtain the encoding tensor set and the attention map set of the input image. The first encoder is used to extract the image features of the input image and encode those features to obtain the above two sets. In the embodiments of this application, the first encoder is an encoder with "imagination": by embedding an imagination module in the first encoder, the first encoder can generate multiple encoding tensors of the input image. These encoding tensors encode multiple possible expressions of the input image, which is clearly superior to the related art in terms of the diversity of pixel units. Meanwhile, in the imagination module, the embedded attention mechanism allows the expression encoded by each subunit to be intuitively understood through visual observation.
  • The attention mechanism in deep learning, analogous to the attention of the human eye, is a pixel-based information enhancement mechanism for specific targets in a feature map. It is a mechanism that can enhance the target information in the feature map: after the feature map is processed based on the attention mechanism, the target information in it is enhanced. The attention-enhanced feature map realizes target-based voxel-level information enhancement.
  • Suppose the input image is a 256×256×3 image, where 256×256 is the resolution of the input image and 3 is the number of RGB channels. Then, with n = 8, the encoding tensor set obtained after the input image is encoded by the first encoder includes eight 256×256×3 encoding tensors, and the attention map set includes eight 256×256×1 attention maps, with the eight encoding tensors and the eight attention maps in one-to-one correspondence.
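  • The shape bookkeeping above can be sketched in a few lines of PyTorch. The stub below is illustrative only (the patent does not disclose the encoder's layers); it merely shows an encoder head that emits n encoding tensors and n sigmoid-activated attention maps with the stated shapes:

```python
import torch
import torch.nn as nn

class FirstEncoderStub(nn.Module):
    """Toy stand-in for the first (attention-based) encoder.

    Emits n encoding tensors of shape (B, n, 3, H, W) and n attention maps
    of shape (B, n, 1, H, W), matching the text (n=8, H=W=256). The single
    conv backbone is an assumption, not the patented architecture."""

    def __init__(self, n=8):
        super().__init__()
        self.n = n
        self.features = nn.Conv2d(3, 64, kernel_size=3, padding=1)
        self.to_tensors = nn.Conv2d(64, n * 3, kernel_size=3, padding=1)
        self.to_attention = nn.Conv2d(64, n, kernel_size=3, padding=1)

    def forward(self, x):                                  # x: (B, 3, H, W)
        h = torch.relu(self.features(x))
        e = self.to_tensors(h).view(-1, self.n, 3, *x.shape[2:])
        a = torch.sigmoid(self.to_attention(h)).unsqueeze(2)  # (B, n, 1, H, W)
        return e, a

x = torch.randn(1, 3, 256, 256)
e, a = FirstEncoderStub()(x)
print(e.shape, a.shape)  # (1, 8, 3, 256, 256) and (1, 8, 1, 256, 256)
```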
  • Optionally, the first encoder can adopt a U-Net network structure. The U-Net network is an image segmentation model based on CNN (Convolutional Neural Networks); it mainly includes convolutional layers, max pooling layers (downsampling), deconvolution layers (upsampling), and ReLU (Rectified Linear Unit) layers. Of course, in some other embodiments, the first encoder may also adopt other network architectures, which is not limited in the embodiments of the present application.
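  • For orientation only, a minimal U-Net-style block using exactly the layer types named above (convolution, max pooling for downsampling, deconvolution for upsampling, ReLU, plus a skip connection) might look as follows; the channel counts are assumptions, not taken from the patent:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal U-Net-style block: conv -> max-pool (down) -> deconv (up)
    with one skip connection. A sketch, not the patented encoder."""

    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)                         # downsampling
        self.mid = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)   # upsampling
        self.out = nn.Conv2d(32, 3, 3, padding=1)           # after skip concat

    def forward(self, x):
        skip = self.enc(x)                    # (B, 16, H, W)
        h = self.mid(self.pool(skip))         # (B, 32, H/2, W/2)
        h = self.up(h)                        # (B, 16, H, W)
        return self.out(torch.cat([h, skip], dim=1))
```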
  • Step 102: Obtain the encoding result of the input image according to the encoding tensor set and the attention map set; the encoding result of the input image records the identity features of the face in the input image.
  • The identity features are feature information used to distinguish the faces of different people. In the embodiments of this application, in addition to the identity features of the face in the input image, the encoding result of the input image also records the appearance features of that face, so that the final output image has the same identity and appearance features as the input image. The appearance features are feature information reflecting the outward appearance attributes of the face.
  • Optionally, for each corresponding pair of encoding tensor and attention map in the encoding tensor set and the attention map set, the encoding tensor and the attention map are multiplied to obtain n processed encoding tensors; the encoding result of the input image includes these n processed encoding tensors.
  • Suppose the input image is x, its encoding tensor set is E, and its attention map set is A; E and A both contain n elements, where n is an integer greater than 1. Per the multiplication just described, the encoding result E_S(x) of the input image x can be expressed as E_S(x) = {e_i · a_i, 1 ≤ i ≤ n}, where e_i is the i-th encoding tensor in E and a_i is the i-th attention map in A.
  • With eight 256×256×3 encoding tensors and eight 256×256×1 attention maps, the encoding result of the input image obtained by the above operation includes eight 256×256×3 processed encoding tensors.
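  • In tensor terms this step is a single broadcast multiplication. Continuing the shapes from the encoder stub above (the names `e` and `a` come from that sketch, not from the patent):

```python
# e: (B, n, 3, H, W) encoding tensors; a: (B, n, 1, H, W) attention maps.
# Broadcasting over the channel axis yields the n processed encoding
# tensors E_S(x) = {e_i * a_i, 1 <= i <= n} in one operation.
processed = e * a  # (B, n, 3, H, W)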
  • Step 103: Encode the expression image to obtain the encoding result of the expression image; the encoding result of the expression image records the expression features of the face in the expression image.
  • An expression image is a face image used to provide expression features. In the embodiments of this application, the expression image is encoded to extract the expression features of the face in it, so that the final output image has the same identity and appearance features as the input image and the same expression features as the expression image. That is, the expression of the face in the expression image is transferred to the input image while the identity and appearance features of the face in the input image are maintained, yielding the final output image.
  • The encoding result of the expression image includes a displacement map set containing n displacement maps, where the i-th displacement map is used to spatially transform the i-th processed encoding tensor. Suppose the expression image is y; the encoding result E_T(y) of the expression image y is expressed as E_T(y) = {O_i, 1 ≤ i ≤ n}, where O_i is the i-th displacement map in the displacement map set O.
  • Exemplarily, the encoding result of the expression image may include eight 256×256×2 displacement maps. Taking the i-th 256×256×2 displacement map as an example, it consists of two 256×256 maps; if the element values of the pixel at position (x, y) in these two maps are x′ and y′, this indicates that the pixel at position (x, y) in the i-th processed encoding tensor is moved to (x′, y′).
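  • One common way to apply such a displacement map is a backward warp with PyTorch's grid_sample, which samples the source tensor at the given coordinates after normalizing them to [-1, 1]. The patent does not fix this convention, so the sketch below is just one plausible reading of the (x′, y′) semantics:

```python
import torch
import torch.nn.functional as F

def warp_with_displacement(tensor, disp):
    """Spatially transform one processed encoding tensor with one
    displacement map, realized as a backward warp (an assumption; the
    patent only says pixels are moved to (x', y')).

    tensor: (B, 3, H, W); disp: (B, 2, H, W) absolute pixel coords."""
    B, _, H, W = tensor.shape
    # Normalize absolute pixel coordinates into grid_sample's [-1, 1] range.
    gx = disp[:, 0] / (W - 1) * 2 - 1
    gy = disp[:, 1] / (H - 1) * 2 - 1
    grid = torch.stack([gx, gy], dim=-1)        # (B, H, W, 2), (x, y) order
    return F.grid_sample(tensor, grid, align_corners=True)
```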
  • A second encoder can encode the expression image to obtain the encoding result of the expression image. The second encoder is used to extract the image features of the expression image and encode those features to obtain a displacement map set as the encoding result. The network structures of the second encoder and the first encoder may be the same or different, which is not limited in the embodiments of the present application.
  • Step 104: Generate an output image according to the encoding result of the input image and the encoding result of the expression image, where the output image has the identity features of the input image and the expression features of the expression image.
  • The identity features carried in the encoding result of the input image are mixed with the expression features carried in the encoding result of the expression image, and the output image is then reconstructed, so that the output image has the identity features of the input image and the expression features of the expression image.
  • Optionally, for each corresponding pair of processed encoding tensor and displacement map, the displacement map is used to spatially transform the processed encoding tensor, yielding n transformed encoding tensors; the n transformed encoding tensors are then decoded to generate the output image. Considering that directly concatenating the expression image with the processed encoding tensors might leak the identity features of the expression image into the final output image, in the embodiments of this application the final training goal of the second encoder is set to learning a suitable displacement map: the processed encoding tensors are spatially transformed through this displacement map to obtain the transformed encoding tensors, which are then decoded to generate the output image, so that the output image records only the identity features of the input image and not the identity features of the expression image.
  • Optionally, the transformed encoding tensor set F is expressed as F = ST(E_S(x), O), where ST denotes the spatial transformation. After the spatial transformation, F includes n transformed encoding tensors; finally, a decoder decodes the n transformed encoding tensors to generate the output image R: R = D_R({F_i, 1 ≤ i ≤ n}), where F_i is the i-th transformed encoding tensor and D_R denotes the decoding process.
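  • Putting steps 102-104 together, a hedged end-to-end sketch (reusing warp_with_displacement from above, with a placeholder decoder rather than the patented network) could read:

```python
import torch
import torch.nn as nn

# Placeholder decoder D_R for n = 8 warped tensors of 3 channels each;
# the layer choices are assumptions, not taken from the patent.
decoder = nn.Sequential(nn.Conv2d(8 * 3, 64, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(64, 3, 3, padding=1), nn.Tanh())

def generate_output(processed, disp_maps):
    # processed: (B, n, 3, H, W) from step 102; disp_maps: (B, n, 2, H, W)
    # from step 103. Warp each tensor with its displacement map (F = ST(...)),
    # then decode the set: R = D_R({F_i, 1 <= i <= n}).
    warped = [warp_with_displacement(processed[:, i], disp_maps[:, i])
              for i in range(processed.shape[1])]
    F_set = torch.cat(warped, dim=1)           # (B, n*3, H, W)
    return decoder(F_set)                      # output image R
```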
  • In an exemplary embodiment, steps 101 to 104 above can be completed by a pre-trained facial expression editing model: the facial expression editing model is called, and it generates the output image from the input image and the expression image. FIG. 2 exemplarily shows a schematic diagram of a facial expression editing model, which includes a first encoder 21, a second encoder 22, and a decoder 23.
  • The first encoder 21 is used to encode the input image x based on the attention mechanism to obtain the encoding tensor set and the attention map set of the input image x, and to obtain the encoding result of the input image x from those two sets. The second encoder 22 is used to encode the expression image y to obtain the encoding result of the expression image y. The decoder 23 is used to generate the output image R from the encoding result of the input image x and the encoding result of the expression image y.
  • For example, as shown in FIG. 2, the input image x is a girl with a smiling expression and the expression image y is a boy with a sad expression. After the expression transformation described above, the final output image R has the identity features of the input image x and the expression features of the expression image y; that is, the output image R is a girl with a sad expression.
  • Optionally, after the expression transformation generates the output image, a video or dynamic picture including the output image may be generated. For example, the expression transformation described above can be performed on multiple input images to generate multiple corresponding output images, which are then combined into a video or a dynamic picture.
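  • Assuming the per-frame outputs have been converted to (H, W, 3) uint8 arrays, assembling them into a dynamic picture can be as simple as the snippet below (imageio is an assumption; any GIF or video writer works):

```python
import imageio

# `frames` is assumed to be a list of (H, W, 3) uint8 numpy arrays, one per
# transformed input image; `duration` is seconds per frame.
imageio.mimsave("expression_edit.gif", frames, duration=0.1)
```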
  • In summary, in the technical solution provided by the embodiments of this application, the input image is encoded based on the attention mechanism to obtain its encoding tensor set and attention map set, from which the encoding result of the input image is obtained; the expression image is encoded to obtain its own encoding result. Since the encoding result of the input image records the identity features of the face in the input image, and the encoding result of the expression image records the expression features of the face in the expression image, generating the output image from the two encoding results gives the output image the identity features of the input image and the expression features of the expression image. The expression features of the output image are thus determined by the expression image rather than by the input image, so an output image with a large expression difference from the input image can be generated, improving the expression transformation capability.
  • In an exemplary embodiment, the facial expression editing model introduced above is a model constructed based on a generative adversarial network. As shown in FIG. 3, the facial expression editing model 30 includes a generator 31 and a discriminator 32, and the generator 31 includes the first encoder 21, the second encoder 22, and the decoder 23. The training process of the facial expression editing model 30 is as follows:
  • 1. Obtain at least one training sample; each training sample is an image pair including an original image and a target image, where the original image and the target image are two images of the same face with different expressions. The generator is used to generate the output image corresponding to the original image from the original image and the target image; the discriminator is used to discriminate whether the target image and the output image corresponding to the original image are images generated by the generator.
  • 2. Train the facial expression editing model with the training samples.
  • In the embodiments of this application, the facial expression editing model is constructed on a generative adversarial network, and each image pair in the training samples consists of two images of the same face with different expressions. The generator performs the facial expression transformation described above to produce the output image corresponding to the original image, and the discriminator uses adversarial learning to adjust and optimize the parameters of the generator, so that the output image generated by the generator becomes as close as possible to the target image.
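  • A hedged sketch of one such adversarial training step, using the least-squares objective named below, is shown here; `generator` and `discriminator` are placeholders for the patent's networks, and the generator term shows only the adversarial part of the full loss:

```python
import torch

def train_step(generator, discriminator, g_opt, d_opt, original, target):
    # Discriminator: push real target images toward 1, generated toward 0
    # (least-squares GAN objective).
    fake = generator(original, target)
    d_loss = 0.5 * ((discriminator(target) - 1) ** 2).mean() \
           + 0.5 * (discriminator(fake.detach()) ** 2).mean()
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator: fool the discriminator. The full objective also adds the
    # first-order distance, perceptual, and overlap-penalty terms per the text.
    g_loss = 0.5 * ((discriminator(fake) - 1) ** 2).mean()
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```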
  • Optionally, the above generative adversarial network may be an LSGAN (Least Squares Generative Adversarial Network).
  • Optionally, the loss function L_total of the facial expression editing model combines a first-order distance loss (the Manhattan distance at the pixel level) with L_LSGAN, L_P, and L_O, which denote the least-squares GAN loss, the perceptual loss, and the overlap penalty loss, weighted by λ_LSGAN, λ_P, and λ_O respectively. The weights corresponding to the three losses can be preset according to actual conditions, which is not limited in the embodiments of the present application. In the overlap penalty loss, A denotes the attention map set and σ(a) denotes the sigmoid function applied to a.
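  • The total-loss formula itself appears only as an image in the source; one plausible reconstruction consistent with the surrounding description (writing the first-order pixel-level Manhattan term as an L1 norm between the output R and the target image t, where both symbols and the implicit unit weight of that term are assumptions) is:

```latex
L_{\mathrm{total}} = \lVert R - t \rVert_{1}
  + \lambda_{\mathrm{LSGAN}}\, L_{\mathrm{LSGAN}}
  + \lambda_{P}\, L_{P}
  + \lambda_{O}\, L_{O}
```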
  • In the embodiments of this application, to make full use of the channel width, an overlap penalty loss L_O is introduced to encourage different encoding tensors to encode different parts of the image.
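  • The exact form of L_O is likewise given only as a formula image in the source, so the following is just one plausible realization of "encourage different encoding tensors to encode different parts": penalize pixels where the sigmoid-activated attention maps overlap.

```python
import torch

def overlap_penalty(a_logits):
    """Illustrative guess at L_O, not the patented formula.

    a_logits: (B, n, 1, H, W) pre-sigmoid attention maps."""
    a = torch.sigmoid(a_logits)
    total = a.sum(dim=1)                    # (B, 1, H, W) summed activation
    return torch.relu(total - 1.0).mean()   # penalize overlap beyond 1
```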
  • In summary, in the technical solution provided by the embodiments of this application, the generator is trained with a generative adversarial network in a self-supervised mode: no additional annotation is needed, and the rules of facial expression transformation are learned from unlabeled data, which helps reduce the training cost of the model and improve its training efficiency.
  • FIG. 4 shows a block diagram of an image processing apparatus provided by an embodiment of the present application. The apparatus has the function of realizing the above method examples; the function can be realized by hardware or by hardware executing corresponding software. The apparatus can be a computer device or can be set in a computer device. The apparatus 400 may include a first encoding module 410, a second encoding module 420, and an image generation module 430.
  • The first encoding module 410 is configured to encode an input image based on the attention mechanism to obtain an encoding tensor set and an attention map set of the input image, where the encoding tensor set includes n encoding tensors, the attention map set includes n attention maps, and n is an integer greater than one.
  • The first encoding module 410 is further configured to obtain an encoding result of the input image according to the encoding tensor set and the attention map set, the encoding result of the input image recording the identity features of the face in the input image.
  • The second encoding module 420 is configured to encode an expression image to obtain an encoding result of the expression image, the encoding result of the expression image recording the expression features of the face in the expression image.
  • The image generation module 430 is configured to generate an output image according to the encoding result of the input image and the encoding result of the expression image, where the output image has the identity features of the input image and the expression features of the expression image.
  • In an exemplary embodiment, the first encoding module 410 is configured to, for each corresponding pair of encoding tensor and attention map in the encoding tensor set and the attention map set, multiply the encoding tensor and the attention map to obtain n processed encoding tensors; the encoding result of the input image includes the n processed encoding tensors.
  • In an exemplary embodiment, the encoding result of the expression image includes n displacement maps, and the image generation module 430 is configured to: for each corresponding pair of processed encoding tensor and displacement map, spatially transform the processed encoding tensor with the displacement map to obtain n transformed encoding tensors; and decode the n transformed encoding tensors to generate the output image.
  • In an exemplary embodiment, the apparatus 400 further includes a model calling module 440, configured to call a facial expression editing model including a first encoder, a second encoder, and a decoder, where:
  • the first encoder is configured to encode the input image based on the attention mechanism to obtain the encoding tensor set and the attention map set of the input image, and to obtain the encoding result of the input image according to the encoding tensor set and the attention map set;
  • the second encoder is configured to encode the expression image to obtain the encoding result of the expression image;
  • the decoder is configured to generate the output image according to the encoding result of the input image and the encoding result of the expression image.
  • In an exemplary embodiment, the facial expression editing model is a model constructed based on a generative adversarial network; it includes a generator and a discriminator, and the generator includes the first encoder, the second encoder, and the decoder. The training process of the facial expression editing model is as follows:
  • obtain at least one training sample, each training sample being an image pair including an original image and a target image, where the original image and the target image are two images of the same face with different expressions; the generator is used to generate an output image corresponding to the original image according to the original image and the target image, and the discriminator is used to discriminate whether the target image and the output image corresponding to the original image are images generated by the generator;
  • train the facial expression editing model with the training samples.
  • In an exemplary embodiment, the loss function L_total of the facial expression editing model combines a first-order distance loss with L_LSGAN, L_P, and L_O, which denote the least-squares GAN loss, the perceptual loss, and the overlap penalty loss, weighted by λ_LSGAN, λ_P, and λ_O respectively; in the overlap penalty loss, A denotes the attention map set and σ(a) denotes the sigmoid function applied to a.
  • In an exemplary embodiment, the apparatus 400 further includes an image processing module 450, configured to generate a video or dynamic picture including the output image.
  • In summary, in the technical solution provided by the embodiments of this application, the input image is encoded based on the attention mechanism to obtain its encoding tensor set and attention map set, from which the encoding result of the input image is obtained; the expression image is encoded to obtain its own encoding result. Since the encoding result of the input image records the identity features of the face in the input image, and the encoding result of the expression image records the expression features of the face in the expression image, generating the output image from the two encoding results gives the output image the identity features of the input image and the expression features of the expression image. The expression features of the output image are thus determined by the expression image rather than by the input image, so an output image with a large expression difference from the input image can be generated, improving the expression transformation capability.
  • FIG. 6 shows a schematic structural diagram of a computer device provided by an embodiment of the present application. Specifically:
  • the computer device 600 includes a CPU (Central Processing Unit) 601, a system memory 604 including a RAM (Random Access Memory) 602 and a ROM (Read-Only Memory) 603, and a system bus 605 connecting the system memory 604 and the central processing unit 601. The computer device 600 also includes a basic I/O (Input/Output) system 606 that helps transfer information between the devices in the computer, and a mass storage device 607 for storing an operating system 613, application programs 614, and other program modules 615.
  • the basic input/output system 606 includes a display 608 for displaying information and an input device 609 such as a mouse and a keyboard for the user to input information.
  • the display 608 and the input device 609 are both connected to the central processing unit 601 through the input and output controller 610 connected to the system bus 605.
  • the basic input/output system 606 may also include an input and output controller 610 for receiving and processing input from multiple other devices such as a keyboard, a mouse, or an electronic stylus. Similarly, the input and output controller 610 also provides output to a display screen, a printer, or other types of output devices.
  • the mass storage device 607 is connected to the central processing unit 601 through a mass storage controller (not shown) connected to the system bus 605.
  • the mass storage device 607 and its associated computer readable medium provide non-volatile storage for the computer device 600. That is, the mass storage device 607 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM (Compact Disc Read-Only Memory) drive.
  • the computer-readable media may include computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media include RAM (Random Access Memory), ROM (Read-Only Memory), EPROM (Erasable Programmable Read-Only Memory), flash memory or other solid-state storage technologies, CD-ROM (Compact Disc Read-Only Memory) or other optical storage, tape cartridges, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will know that the computer storage media are not limited to the above.
  • the aforementioned system memory 604 and mass storage device 607 may be collectively referred to as a memory.
  • According to various embodiments of the present application, the computer device 600 may also be connected, through a network such as the Internet, to a remote computer on the network and run there. That is, the computer device 600 can be connected to the network 612 through the network interface unit 611 connected to the system bus 605, or the network interface unit 611 can be used to connect to other types of networks or remote computer systems (not shown).
  • The memory also stores at least one instruction, at least one program, a code set, or an instruction set, configured to be executed by one or more processors to realize the above image processing method.
  • In an exemplary embodiment, a computer-readable storage medium is also provided, which stores at least one instruction, at least one program, a code set, or an instruction set that, when executed by the processor of a terminal, realizes the above image processing method.
  • Optionally, the computer-readable storage medium may include ROM, RAM, SSD (Solid State Drive), an optical disc, etc. The random access memory may include ReRAM (Resistance Random Access Memory) and DRAM (Dynamic Random Access Memory).
  • In an exemplary embodiment, a computer program product is also provided, which, when executed by the processor of a terminal, implements the above image processing method.
  • the "plurality” mentioned herein refers to two or more.
  • "And/or" describes the association relationship of the associated objects, indicating that three types of relationships may exist; for example, "A and/or B" can mean: A exists alone, A and B exist at the same time, or B exists alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it.
  • The step numbers described herein only exemplarily show one possible execution order among the steps. In some other embodiments, the steps may also be executed out of the numbered order; for example, two differently numbered steps may be executed simultaneously, or two differently numbered steps may be executed in an order opposite to that shown in the figure, which is not limited in the embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Processing (AREA)

Abstract

This application discloses an image processing method, apparatus, device, and storage medium, relating to the field of computer vision in artificial intelligence. The method includes: encoding an input image based on an attention mechanism to obtain an encoding tensor set and an attention map set of the input image; obtaining an encoding result of the input image according to the encoding tensor set and the attention map set, the encoding result of the input image recording identity features of the face in the input image; encoding an expression image to obtain an encoding result of the expression image, the encoding result of the expression image recording expression features of the face in the expression image; and generating an output image according to the encoding result of the input image and the encoding result of the expression image, where the output image has the identity features of the input image and the expression features of the expression image. This application can generate an output image with a large expression difference from the input image, improving the expression transformation capability.

Description

Image processing method, apparatus, device, and storage medium
This application claims priority to Chinese Patent Application No. 201911072470.8, entitled "Image processing method, apparatus, device, and storage medium" and filed on November 5, 2019, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The embodiments of this application relate to the field of computer vision within artificial intelligence technology, and in particular to an image processing method, apparatus, device, and storage medium.
BACKGROUND
Facial expression editing (also known as facial expression transformation) refers to adjusting the expression in a face image to obtain another image. For example, the expression in the original image is a smile, while the expression in the target image obtained after facial expression editing is crying. However, the solutions provided by the related art for realizing facial expression transformation have limited expression transformation capability.
SUMMARY
The embodiments of this application provide an image processing method, apparatus, device, and storage medium, which can generate an output image with a large expression difference from the input image, improving the expression transformation capability. The technical solution is as follows:
In one aspect, an embodiment of this application provides an image processing method, applied to a computer device, the method including:
encoding an input image based on an attention mechanism to obtain an encoding tensor set and an attention map set of the input image, where the encoding tensor set includes n encoding tensors, the attention map set includes n attention maps, and n is an integer greater than 1;
obtaining an encoding result of the input image according to the encoding tensor set and the attention map set, the encoding result of the input image recording identity features of the face in the input image;
encoding an expression image to obtain an encoding result of the expression image, the encoding result of the expression image recording expression features of the face in the expression image;
generating an output image according to the encoding result of the input image and the encoding result of the expression image, where the output image has the identity features of the input image and the expression features of the expression image.
In another aspect, an embodiment of this application provides an image processing apparatus, the apparatus including:
a first encoding module, configured to encode an input image based on an attention mechanism to obtain an encoding tensor set and an attention map set of the input image, where the encoding tensor set includes n encoding tensors, the attention map set includes n attention maps, and n is an integer greater than 1;
the first encoding module being further configured to obtain an encoding result of the input image according to the encoding tensor set and the attention map set, the encoding result of the input image recording identity features of the face in the input image;
a second encoding module, configured to encode an expression image to obtain an encoding result of the expression image, the encoding result of the expression image recording expression features of the face in the expression image;
an image generation module, configured to generate an output image according to the encoding result of the input image and the encoding result of the expression image, where the output image has the identity features of the input image and the expression features of the expression image.
In yet another aspect, an embodiment of this application provides a computer device including a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the above image processing method.
In still another aspect, an embodiment of this application provides a computer-readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the above image processing method.
In a further aspect, an embodiment of this application provides a computer program product which, when run on a computer device, causes the computer device to execute the above image processing method.
The technical solutions provided by the embodiments of this application can bring the following beneficial effects:
By encoding the input image based on the attention mechanism, the encoding tensor set and the attention map set of the input image are obtained, and the encoding result of the input image is then obtained from these two sets; the expression image is encoded separately to obtain its own encoding result. Since the encoding result of the input image records the identity features of the face in the input image, and the encoding result of the expression image records the expression features of the face in the expression image, generating the output image from the two encoding results gives the output image the identity features of the input image and the expression features of the expression image. The expression features of the output image are thus determined by the expression image rather than by the input image, so an output image with a large expression difference from the input image can be generated, improving the expression transformation capability.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions in the embodiments of this application more clearly, the accompanying drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of an image processing method provided by an embodiment of this application;
FIG. 2 is a schematic diagram of a facial expression editing model provided by an embodiment of this application;
FIG. 3 is a schematic diagram of a facial expression editing model provided by another embodiment of this application;
FIG. 4 is a block diagram of an image processing apparatus provided by an embodiment of this application;
FIG. 5 is a block diagram of an image processing apparatus provided by another embodiment of this application;
FIG. 6 is a structural block diagram of a computer device provided by an embodiment of this application.
DETAILED DESCRIPTION
To make the objectives, technical solutions, and advantages of this application clearer, the implementations of this application are described in further detail below with reference to the accompanying drawings.
AI (Artificial Intelligence) is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, with both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer Vision (CV) is a science that studies how to make machines "see"; more specifically, it uses cameras and computers, instead of human eyes, to identify, track, and measure targets, and further performs graphics processing so that the result becomes an image more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and technologies, attempting to establish artificial intelligence systems that can obtain information from images or multi-dimensional data. Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and also includes common biometric recognition technologies such as face recognition and fingerprint recognition.
ML (Machine Learning) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specializes in studying how computers simulate or realize human learning behaviors to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; its applications cover all fields of artificial intelligence. Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
With the research and progress of artificial intelligence technology, it has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, intelligent medical care, and intelligent customer service. It is believed that, with the development of technology, artificial intelligence will be applied in more fields and deliver increasingly important value.
In the related art, expressions are mainly encoded by means of spatial transformation, and the target image is obtained by spatially transforming the original image. Because the expression features rely on spatial transformation to be encoded into the target image, pixel units that do not appear in the original image cannot be generated; for example, if there are no teeth in the original image, there are no teeth in the target image. As a result, a target image with a large expression difference from the original image cannot be generated, and the expression transformation capability is limited.
The solution provided by the embodiments of this application involves the computer vision technology of artificial intelligence and provides an image processing method: the input image is encoded based on an attention mechanism to obtain an encoding tensor set and an attention map set of the input image, and the encoding result of the input image is then obtained from these two sets; the expression image is encoded to obtain its own encoding result. Since the encoding result of the input image records the identity features of the face in the input image, and the encoding result of the expression image records the expression features of the face in the expression image, generating the output image from the two encoding results gives the output image the identity features of the input image and the expression features of the expression image. In this way, the expression features of the output image are determined by the expression image rather than by the input image, so an output image with a large expression difference from the input image can be generated, improving the expression transformation capability.
In the method provided by the embodiments of this application, the subject executing each step may be a computer device, which can be any electronic device with processing and storage capabilities, such as a mobile phone, tablet computer, game device, multimedia playback device, electronic photo frame, wearable device, or PC (Personal Computer), and may also be a server. For ease of description, in the following method embodiments the executing subject of each step is introduced as a computer device, but this does not constitute a limitation.
Referring to FIG. 1, which shows a flowchart of an image processing method provided by an embodiment of this application, the method may include the following steps (101-104):
Step 101: Encode the input image based on the attention mechanism to obtain an encoding tensor set and an attention map set of the input image.
In the embodiments of this application, the input image is a human face image, that is, an image containing a human face. The input image is subjected to multi-channel encoding based on the attention mechanism to obtain its encoding tensor set and attention map set, where the encoding tensor set includes n encoding tensors, the attention map set includes n attention maps, and n is an integer greater than 1. The value of n can be set in advance according to actual requirements, for example, n = 8.
A first encoder can encode the input image to obtain the encoding tensor set and the attention map set of the input image. The first encoder is used to extract the image features of the input image and encode those features to obtain the above two sets. In the embodiments of this application, the first encoder is an encoder with "imagination": by embedding an imagination module in the first encoder, the first encoder can generate multiple encoding tensors of the input image. These encoding tensors encode multiple possible expressions of the input image, which is clearly superior to the related art in terms of the diversity of pixel units. Meanwhile, in the imagination module, the embedded attention mechanism allows the expression encoded by each subunit to be intuitively understood through visual observation.
The attention mechanism in deep learning, analogous to the attention of the human eye, is a pixel-based information enhancement mechanism for specific targets in a feature map. It is a mechanism that can enhance the target information in the feature map: after the feature map is processed based on the attention mechanism, the target information in it is enhanced. The attention-enhanced feature map realizes target-based voxel-level information enhancement.
Suppose the input image is a 256×256×3 image, where 256×256 is the resolution of the input image and 3 is the number of RGB channels. Then, with n = 8, the encoding tensor set obtained after the input image is encoded by the first encoder includes eight 256×256×3 encoding tensors, and the attention map set includes eight 256×256×1 attention maps, with the eight encoding tensors and the eight attention maps in one-to-one correspondence.
Optionally, the first encoder can adopt a U-Net network structure. The U-Net network is an image segmentation model based on CNN (Convolutional Neural Networks); it mainly includes convolutional layers, max pooling layers (downsampling), deconvolution layers (upsampling), and ReLU (Rectified Linear Unit) layers. Of course, in some other embodiments, the first encoder may also adopt other network architectures, which is not limited in the embodiments of this application.
Step 102: Obtain the encoding result of the input image according to the encoding tensor set and the attention map set; the encoding result of the input image records the identity features of the face in the input image.
The identity features are feature information used to distinguish the faces of different people. In the embodiments of this application, in addition to the identity features of the face in the input image, the encoding result of the input image also records the appearance features of that face, so that the final output image has the same identity and appearance features as the input image. The appearance features are feature information reflecting the outward appearance attributes of the face.
Optionally, for each corresponding pair of encoding tensor and attention map in the encoding tensor set and the attention map set, the encoding tensor and the attention map are multiplied to obtain n processed encoding tensors; the encoding result of the input image includes these n processed encoding tensors.
Suppose the input image is x, its encoding tensor set is E, and its attention map set is A; E and A both contain n elements, where n is an integer greater than 1. The encoding result E_S(x) of the input image x is expressed as:
[formula image PCTCN2020117455-appb-000001] (which, per the multiplication described above, corresponds to E_S(x) = {e_i · a_i, 1 ≤ i ≤ n})
where e_i is the i-th encoding tensor in the encoding tensor set E, and a_i is the i-th attention map in the attention map set.
Suppose the encoding tensor set includes eight 256×256×3 encoding tensors and the attention map set includes eight 256×256×1 attention maps; the encoding result of the input image obtained by the above operation then includes eight 256×256×3 processed encoding tensors.
Step 103: Encode the expression image to obtain the encoding result of the expression image; the encoding result of the expression image records the expression features of the face in the expression image.
An expression image is a face image used to provide expression features. In the embodiments of this application, the expression image is encoded to extract the expression features of the face in it, so that the final output image has the same identity and appearance features as the input image and the same expression features as the expression image. That is, the expression of the face in the expression image is transferred to the input image while the identity and appearance features of the face in the input image are maintained, yielding the final output image.
The encoding result of the expression image includes a displacement map set containing n displacement maps, where the i-th displacement map is used to spatially transform the i-th processed encoding tensor. Suppose the expression image is y; the encoding result E_T(y) of the expression image y is expressed as:
E_T(y) = {O_i, 1 ≤ i ≤ n};
where O_i is the i-th displacement map in the displacement map set O.
Exemplarily, the encoding result of the expression image may include eight 256×256×2 displacement maps. Taking the i-th 256×256×2 displacement map as an example, it consists of two 256×256 maps; if the element values of the pixel at position (x, y) in these two maps are x′ and y′, this indicates that the pixel at position (x, y) in the i-th processed encoding tensor is moved to (x′, y′).
A second encoder can encode the expression image to obtain the encoding result of the expression image. The second encoder is used to extract the image features of the expression image and encode those features to obtain a displacement map set as the encoding result of the expression image.
In addition, the network structures of the second encoder and the first encoder may be the same or different, which is not limited in the embodiments of this application.
Step 104: Generate an output image according to the encoding result of the input image and the encoding result of the expression image, where the output image has the identity features of the input image and the expression features of the expression image.
The identity features carried in the encoding result of the input image are mixed with the expression features carried in the encoding result of the expression image, and the output image is then reconstructed, so that the output image has the identity features of the input image and the expression features of the expression image.
Optionally, for each corresponding pair of processed encoding tensor and displacement map, the displacement map is used to spatially transform the processed encoding tensor, yielding n transformed encoding tensors; the n transformed encoding tensors are then decoded to generate the output image. Considering that directly concatenating the expression image with the processed encoding tensors might leak the identity features of the expression image into the final output image, in the embodiments of this application the final training goal of the second encoder is set to learning a suitable displacement map: the processed encoding tensors are spatially transformed through this displacement map to obtain the transformed encoding tensors, which are then decoded to generate the output image, so that the output image records only the identity features of the input image and not the identity features of the expression image.
Optionally, the transformed encoding tensor set F is expressed as:
F = ST(E_S(x), O);
where ST denotes the spatial transformation. After the spatial transformation, F includes n transformed encoding tensors; finally, a decoder decodes the n transformed encoding tensors to generate the output image R:
R = D_R({F_i, 1 ≤ i ≤ n});
where F_i is the i-th transformed encoding tensor and D_R denotes the decoding process.
In an exemplary embodiment, steps 101 to 104 above can be completed by a pre-trained facial expression editing model: the facial expression editing model is called, and it generates the output image from the input image and the expression image. As shown in FIG. 2, which exemplarily shows a schematic diagram of a facial expression editing model, the model includes a first encoder 21, a second encoder 22, and a decoder 23. The first encoder 21 is used to encode the input image x based on the attention mechanism to obtain the encoding tensor set and the attention map set of the input image x, and to obtain the encoding result of the input image x from those two sets. The second encoder 22 is used to encode the expression image y to obtain the encoding result of the expression image y. The decoder 23 is used to generate the output image R from the encoding result of the input image x and the encoding result of the expression image y. For example, as shown in FIG. 2, the input image x is a girl with a smiling expression and the expression image y is a boy with a sad expression; after the expression transformation described above, the final output image R has the identity features of the input image x and the expression features of the expression image y, that is, the output image R is a girl with a sad expression.
Optionally, after the expression transformation generates the output image, a video or dynamic picture including the output image may be generated. For example, the expression transformation described above can be performed on multiple input images to generate multiple corresponding output images, which are then combined into a video or a dynamic picture.
In summary, in the technical solution provided by the embodiments of this application, the input image is encoded based on the attention mechanism to obtain its encoding tensor set and attention map set, from which the encoding result of the input image is obtained; the expression image is encoded to obtain its own encoding result. Since the encoding result of the input image records the identity features of the face in the input image, and the encoding result of the expression image records the expression features of the face in the expression image, generating the output image from the two encoding results gives the output image the identity features of the input image and the expression features of the expression image. The expression features of the output image are thus determined by the expression image rather than by the input image, so an output image with a large expression difference from the input image can be generated, improving the expression transformation capability.
In addition, by performing multi-channel encoding on the input image, multiple encoding tensors of the input image are generated; these encoding tensors encode multiple possible expressions of the input image, which is clearly superior to the related art in terms of the diversity of pixel units. Meanwhile, the embedded attention mechanism allows the expression encoded by each subunit to be intuitively understood through visual observation.
In an exemplary embodiment, the facial expression editing model introduced above is a model constructed based on a generative adversarial network. As shown in FIG. 3, the facial expression editing model 30 includes a generator 31 and a discriminator 32, and the generator 31 includes the first encoder 21, the second encoder 22, and the decoder 23.
The training process of the facial expression editing model 30 is as follows:
1. Obtain at least one training sample; each training sample is an image pair including an original image and a target image, where the original image and the target image are two images of the same face with different expressions.
Here, the generator is used to generate the output image corresponding to the original image from the original image and the target image, and the discriminator is used to discriminate whether the target image and the output image corresponding to the original image are images generated by the generator.
2. Train the facial expression editing model with the training samples.
In the embodiments of this application, the facial expression editing model is constructed on a generative adversarial network, and each image pair in the training samples consists of two images of the same face with different expressions. The generator performs the facial expression transformation described above to produce the output image corresponding to the original image, and the discriminator uses adversarial learning to adjust and optimize the parameters of the generator, so that the output image generated by the generator becomes as close as possible to the target image. Optionally, the above generative adversarial network may be an LSGAN (Least Squares Generative Adversarial Network).
Optionally, the loss function L_total of the facial expression editing model is:
[formula image PCTCN2020117455-appb-000002]
where [formula image PCTCN2020117455-appb-000003] denotes the first-order distance loss, that is, the Manhattan distance at the pixel level; L_LSGAN, L_P, and L_O denote the least-squares GAN loss, the perceptual loss, and the overlap penalty loss; and λ_LSGAN, λ_P, and λ_O denote the weights corresponding to the three losses, which can be preset according to actual conditions and are not limited in the embodiments of this application. The overlap penalty loss is
[formula image PCTCN2020117455-appb-000004]
where A denotes the attention map set and σ(a) denotes the sigmoid function applied to a. In the embodiments of this application, to make full use of the channel width, an overlap penalty loss L_O is introduced to encourage different encoding tensors to encode different parts of the image.
In summary, in the technical solution provided by the embodiments of this application, the generator is trained with a generative adversarial network in a self-supervised mode: no additional annotation is needed, and the rules of facial expression transformation are learned from unlabeled data, which helps reduce the training cost of the model and improve its training efficiency.
The following are apparatus embodiments of this application, which can be used to execute the method embodiments of this application; for details not disclosed in the apparatus embodiments, please refer to the method embodiments of this application.
Referring to FIG. 4, which shows a block diagram of an image processing apparatus provided by an embodiment of this application: the apparatus has the function of realizing the above method examples, and the function can be realized by hardware or by hardware executing corresponding software. The apparatus can be a computer device or can be set in a computer device. The apparatus 400 may include a first encoding module 410, a second encoding module 420, and an image generation module 430.
The first encoding module 410 is configured to encode an input image based on the attention mechanism to obtain an encoding tensor set and an attention map set of the input image, where the encoding tensor set includes n encoding tensors, the attention map set includes n attention maps, and n is an integer greater than 1.
The first encoding module 410 is further configured to obtain an encoding result of the input image according to the encoding tensor set and the attention map set, the encoding result of the input image recording the identity features of the face in the input image.
The second encoding module 420 is configured to encode an expression image to obtain an encoding result of the expression image, the encoding result of the expression image recording the expression features of the face in the expression image.
The image generation module 430 is configured to generate an output image according to the encoding result of the input image and the encoding result of the expression image, where the output image has the identity features of the input image and the expression features of the expression image.
In an exemplary embodiment, the first encoding module 410 is configured to, for each corresponding pair of encoding tensor and attention map in the encoding tensor set and the attention map set, multiply the encoding tensor and the attention map to obtain n processed encoding tensors, where the encoding result of the input image includes the n processed encoding tensors.
In an exemplary embodiment, the encoding result of the expression image includes n displacement maps, and the image generation module 430 is configured to: for each corresponding pair of processed encoding tensor and displacement map, spatially transform the processed encoding tensor with the displacement map to obtain n transformed encoding tensors; and decode the n transformed encoding tensors to generate the output image.
In an exemplary embodiment, as shown in FIG. 5, the apparatus 400 further includes:
a model calling module, configured to call a facial expression editing model, the facial expression editing model including a first encoder, a second encoder, and a decoder, where
the first encoder is configured to encode the input image based on the attention mechanism to obtain the encoding tensor set and the attention map set of the input image, and to obtain the encoding result of the input image according to the encoding tensor set and the attention map set;
the second encoder is configured to encode the expression image to obtain the encoding result of the expression image;
the decoder is configured to generate the output image according to the encoding result of the input image and the encoding result of the expression image.
In an exemplary embodiment, the facial expression editing model is a model constructed based on a generative adversarial network; it includes a generator and a discriminator, and the generator includes the first encoder, the second encoder, and the decoder.
The training process of the facial expression editing model is as follows:
obtain at least one training sample, each training sample being an image pair including an original image and a target image, where the original image and the target image are two images of the same face with different expressions; the generator is used to generate an output image corresponding to the original image according to the original image and the target image, and the discriminator is used to discriminate whether the target image and the output image corresponding to the original image are images generated by the generator;
train the facial expression editing model with the training samples.
In an exemplary embodiment, the loss function L_total of the facial expression editing model is:
[formula image PCTCN2020117455-appb-000005]
where [formula image PCTCN2020117455-appb-000006] denotes the first-order distance loss; L_LSGAN, L_P, and L_O denote the least-squares GAN loss, the perceptual loss, and the overlap penalty loss; λ_LSGAN, λ_P, and λ_O denote the weights corresponding to the three losses; the overlap penalty loss is
[formula image PCTCN2020117455-appb-000007]
where A denotes the attention map set and σ(a) denotes the sigmoid function applied to a.
In an exemplary embodiment, as shown in FIG. 5, the apparatus 400 further includes:
an image processing module, configured to generate a video or dynamic picture including the output image.
In summary, in the technical solution provided by the embodiments of this application, the input image is encoded based on the attention mechanism to obtain its encoding tensor set and attention map set, from which the encoding result of the input image is obtained; the expression image is encoded to obtain its own encoding result. Since the encoding result of the input image records the identity features of the face in the input image, and the encoding result of the expression image records the expression features of the face in the expression image, generating the output image from the two encoding results gives the output image the identity features of the input image and the expression features of the expression image. The expression features of the output image are thus determined by the expression image rather than by the input image, so an output image with a large expression difference from the input image can be generated, improving the expression transformation capability.
It should be noted that, when the apparatus provided by the above embodiments realizes its functions, the division into the above functional modules is only an example; in practical applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus provided by the above embodiments and the method embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which will not be repeated here.
Referring to FIG. 6, which shows a schematic structural diagram of a computer device provided by an embodiment of this application. Specifically:
the computer device 600 includes a CPU (Central Processing Unit) 601, a system memory 604 including a RAM (Random Access Memory) 602 and a ROM (Read-Only Memory) 603, and a system bus 605 connecting the system memory 604 and the central processing unit 601. The computer device 600 also includes a basic I/O (Input/Output) system 606 that helps transfer information between the devices in the computer, and a mass storage device 607 for storing an operating system 613, application programs 614, and other program modules 615.
The basic input/output system 606 includes a display 608 for displaying information and an input device 609, such as a mouse and a keyboard, for the user to input information, both of which are connected to the central processing unit 601 through an input-output controller 610 connected to the system bus 605. The basic input/output system 606 may also include the input-output controller 610 for receiving and processing input from multiple other devices such as a keyboard, mouse, or electronic stylus; similarly, the input-output controller 610 also provides output to a display screen, printer, or other types of output devices.
The mass storage device 607 is connected to the central processing unit 601 through a mass storage controller (not shown) connected to the system bus 605. The mass storage device 607 and its associated computer-readable media provide non-volatile storage for the computer device 600; that is, the mass storage device 607 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM (Compact Disc Read-Only Memory) drive.
Without loss of generality, the computer-readable media may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, EPROM (Erasable Programmable Read-Only Memory), flash memory or other solid-state storage technologies, CD-ROM or other optical storage, tape cartridges, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will know that computer storage media are not limited to the above. The above system memory 604 and mass storage device 607 may be collectively referred to as memory.
According to various embodiments of this application, the computer device 600 may also be connected, through a network such as the Internet, to a remote computer on the network and run there. That is, the computer device 600 can be connected to the network 612 through the network interface unit 611 connected to the system bus 605, or the network interface unit 611 can be used to connect to other types of networks or remote computer systems (not shown).
The memory also stores at least one instruction, at least one program, a code set, or an instruction set, configured to be executed by one or more processors to realize the above image processing method.
In an exemplary embodiment, a computer-readable storage medium is also provided, which stores at least one instruction, at least one program, a code set, or an instruction set that, when executed by the processor of a terminal, realizes the above image processing method.
Optionally, the computer-readable storage medium may include ROM, RAM, SSD (Solid State Drive), an optical disc, etc. The random access memory may include ReRAM (Resistance Random Access Memory) and DRAM (Dynamic Random Access Memory).
In an exemplary embodiment, a computer program product is also provided, which, when executed by the processor of a terminal, implements the above image processing method.
It should be understood that "plurality" herein refers to two or more. "And/or" describes an association relationship of associated objects, indicating that three relationships may exist; for example, "A and/or B" can mean: A exists alone, A and B exist at the same time, or B exists alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it. In addition, the step numbers described herein only exemplarily show one possible execution order among the steps; in some other embodiments, the steps may also be executed out of the numbered order, for example, two differently numbered steps may be executed simultaneously, or in an order opposite to that shown in the figure, which is not limited in the embodiments of this application.
The above are only exemplary embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of this application shall be included within the protection scope of this application.

Claims (15)

  1. 一种图像处理方法,应用于计算机设备中,所述方法包括:
    基于注意力机制对输入图像进行编码处理,得到所述输入图像的编码张量集合和注意力图集合;其中,所述编码张量集合包括n个编码张量,所述注意力图集合包括n个注意力图,所述n为大于1的整数;
    根据所述编码张量集合和所述注意力图集合,得到所述输入图像的编码结果,所述输入图像的编码结果中记录有所述输入图像中人脸的身份特征;
    对表情图像进行编码处理,得到所述表情图像的编码结果,所述表情图像的编码结果中记录有所述表情图像中人脸的表情特征;
    根据所述输入图像的编码结果和所述表情图像的编码结果,生成输出图像;其中,所述输出图像具有所述输入图像的身份特征以及所述表情图像的表情特征。
  2. 根据权利要求1所述的方法,其中,所述根据所述编码张量集合和所述注意力图集合,得到所述输入图像的编码结果,包括:
    对于所述编码张量集合和所述注意力图集合中每一组对应的编码张量和注意力图,将所述编码张量和所述注意力图相乘,得到n个处理后编码张量;
    其中,所述输入图像的编码结果包括所述n个处理后编码张量。
  3. 根据权利要求2所述的方法,其中,所述表情图像的编码结果包括n个位移图;
    所述根据所述输入图像的编码结果和所述表情图像的编码结果,生成输出图像,包括:
    对于每一组对应的处理后编码张量和位移图,采用所述位移图对所述处理后编码张量做空间变换处理,得到n个变换后编码张量;
    对所述n个变换后编码张量进行解码处理,生成所述输出图像。
  4. 根据权利要求1所述的方法,其中,所述基于注意力机制对输入图像进行编码处理,得到所述输入图像的编码张量集合和注意力图集合之前,还包括:
    调用人脸表情编辑模型,所述人脸表情编辑模型包括:第一编码器、第二编码器和解码器;其中,
    所述第一编码器用于基于注意力机制对所述输入图像进行编码处理,得到所述输入图像的所述编码张量集合和所述注意力图集合;根据所述编码张量集合和所述注意力图集合,得到所述输入图像的编码结果;
    所述第二编码器用于对表情图像进行编码处理,得到所述表情图像的编码结果;
    所述解码器用于根据所述输入图像的编码结果和所述表情图像的编码结果,生成所述输出图像。
  5. 根据权利要求4所述的方法,其中,所述人脸表情编辑模型是基于生成对抗网络构建的模型,所述人脸表情编辑模型包括生成器和判别器,所述生成器包括所述第一编码器、所述第二编码器和所述解码器;
    所述人脸表情编辑模型的训练过程如下:
    获取至少一个训练样本,每个所述训练样本是一个包括原始图像和目标图像的图像对,所述原始图像和所述目标图像是同一人脸的两张图像,且所述原始图像和所述目标图像具有不同的表情;其中,所述生成器用于根据所述原始图像和所述目标图像生成所述原始图像对应的输出图像,所述判别器用于判别所述目标图像和所述原始图像对应的输出图像是否为所述生成器生成的图像;
    采用所述训练样本对所述人脸表情编辑模型进行训练。
  6. The method according to claim 4, wherein a loss function L_total of the facial expression editing model is:
    L_total = L_1 + λ_LSGAN·L_LSGAN + λ_P·L_P + λ_O·L_O,
    where L_1 denotes a first-order distance loss; L_LSGAN, L_P, and L_O respectively denote a least-squares generative adversarial network loss, a perceptual loss, and an overlap penalty loss; λ_LSGAN, λ_P, and λ_O respectively denote the weights corresponding to the three losses; and the overlap penalty loss L_O is defined over the attention map set A in terms of σ(a), the sigmoid function of a (the exact formula of L_O appears only as an image in the published text).
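Read as a weighted sum, the objective of claim 6 might be computed as below. The weight values are placeholders, and the overlap penalty in particular is only a plausible stand-in, since the published text gives its exact formula only as an image; the version here simply discourages the sigmoid-activated attention maps in A from claiming the same pixels:

```python
import torch

def total_loss(l1, l_lsgan, l_perceptual, attention_logits,
               lam_lsgan=1.0, lam_p=10.0, lam_o=0.1):
    """L_total = L_1 + λ_LSGAN·L_LSGAN + λ_P·L_P + λ_O·L_O.
    attention_logits: list of n pre-sigmoid attention maps, each of
    shape (B, 1, H, W). The overlap penalty is a guessed form of L_O."""
    maps = torch.sigmoid(torch.cat(attention_logits, dim=1))  # (B, n, H, W)
    # Penalize pixels covered by more than one attention map in total.
    l_overlap = (maps.sum(dim=1) - 1.0).clamp(min=0.0).mean()
    return l1 + lam_lsgan * l_lsgan + lam_p * l_perceptual + lam_o * l_overlap
```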
  7. The method according to any one of claims 1 to 6, wherein after the generating an output image according to the encoding result of the input image and the encoding result of the expression image, the method further comprises:
    generating a video or a dynamic picture comprising the output image.
  8. An image processing apparatus, comprising:
    a first encoding module, configured to encode an input image based on an attention mechanism to obtain an encoding tensor set and an attention map set of the input image, the encoding tensor set comprising n encoding tensors, the attention map set comprising n attention maps, and n being an integer greater than 1;
    the first encoding module being further configured to obtain an encoding result of the input image according to the encoding tensor set and the attention map set, the encoding result of the input image recording identity features of a face in the input image;
    a second encoding module, configured to encode an expression image to obtain an encoding result of the expression image, the encoding result of the expression image recording expression features of a face in the expression image; and
    an image generation module, configured to generate an output image according to the encoding result of the input image and the encoding result of the expression image, the output image having the identity features of the input image and the expression features of the expression image.
  9. The apparatus according to claim 8, wherein the first encoding module is configured to: for each corresponding pair of an encoding tensor and an attention map in the encoding tensor set and the attention map set, multiply the encoding tensor by the attention map to obtain n processed encoding tensors,
    wherein the encoding result of the input image comprises the n processed encoding tensors.
  10. The apparatus according to claim 9, wherein the encoding result of the expression image comprises n displacement maps; and
    the image generation module is configured to:
    for each corresponding pair of a processed encoding tensor and a displacement map, spatially transform the processed encoding tensor by using the displacement map to obtain n transformed encoding tensors; and
    decode the n transformed encoding tensors to generate the output image.
  11. The apparatus according to claim 8, further comprising:
    a model invoking module, configured to invoke a facial expression editing model, the facial expression editing model comprising a first encoder, a second encoder, and a decoder, wherein
    the first encoder is configured to encode the input image based on the attention mechanism to obtain the encoding tensor set and the attention map set of the input image, and obtain the encoding result of the input image according to the encoding tensor set and the attention map set;
    the second encoder is configured to encode the expression image to obtain the encoding result of the expression image; and
    the decoder is configured to generate the output image according to the encoding result of the input image and the encoding result of the expression image.
  12. The apparatus according to claim 11, wherein the facial expression editing model is a model constructed based on a generative adversarial network, the facial expression editing model comprises a generator and a discriminator, and the generator comprises the first encoder, the second encoder, and the decoder; and
    a training process of the facial expression editing model is as follows:
    obtaining at least one training sample, each training sample being an image pair comprising an original image and a target image, the original image and the target image being two images of the same face with different expressions, wherein the generator is configured to generate, according to the original image and the target image, an output image corresponding to the original image, and the discriminator is configured to discriminate whether the target image and the output image corresponding to the original image are images generated by the generator; and
    training the facial expression editing model by using the training sample.
  13. The apparatus according to any one of claims 8 to 12, further comprising:
    an image processing module, configured to generate a video or a dynamic picture comprising the output image.
  14. A computer device, comprising a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, the at least one instruction, the at least one program, the code set, or the instruction set being loaded and executed by the processor to implement the method according to any one of claims 1 to 7.
  15. A computer-readable storage medium, storing at least one instruction, at least one program, a code set, or an instruction set, the at least one instruction, the at least one program, the code set, or the instruction set being loaded and executed by a processor to implement the method according to any one of claims 1 to 7.
PCT/CN2020/117455 2019-11-05 2020-09-24 Image processing method and apparatus, device, and storage medium WO2021088556A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/497,883 US20220028031A1 (en) 2019-11-05 2021-10-08 Image processing method and apparatus, device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911072470.8A 2019-11-05 2019-11-05 Image processing method and apparatus, device, and storage medium
CN201911072470.8 2019-11-05

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/497,883 Continuation US20220028031A1 (en) 2019-11-05 2021-10-08 Image processing method and apparatus, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021088556A1 (zh)

Family

ID=69442779

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/117455 WO2021088556A1 (zh) 2019-11-05 2020-09-24 图像处理方法、装置、设备及存储介质

Country Status (3)

Country Link
US (1) US20220028031A1 (zh)
CN (1) CN110796111B (zh)
WO (1) WO2021088556A1 (zh)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110796111B (zh) * 2019-11-05 2020-11-10 腾讯科技(深圳)有限公司 Image processing method and apparatus, device, and storage medium
CN111401216B (zh) * 2020-03-12 2023-04-18 腾讯科技(深圳)有限公司 Image processing and model training methods and apparatus, computer device, and storage medium
CN113538604B (zh) * 2020-04-21 2024-03-19 中移(成都)信息通信科技有限公司 Image generation method, apparatus, device, and medium
CN111553267B (zh) * 2020-04-27 2023-12-01 腾讯科技(深圳)有限公司 Image processing method, image processing model training method, and device
CN111833238B (zh) * 2020-06-01 2023-07-25 北京百度网讯科技有限公司 Image translation method and apparatus, and image translation model training method and apparatus
CN111783603A (zh) * 2020-06-24 2020-10-16 有半岛(北京)信息科技有限公司 Generative adversarial network training method, and image face swapping and video face swapping methods and apparatus
CN113507608A (zh) * 2021-06-09 2021-10-15 北京三快在线科技有限公司 Image encoding method and apparatus, and electronic device
CN113706388B (zh) * 2021-09-24 2023-06-27 上海壁仞智能科技有限公司 Image super-resolution reconstruction method and apparatus
CN114866345B (zh) * 2022-07-05 2022-12-09 支付宝(杭州)信息技术有限公司 Biometric recognition processing method, apparatus, and device
CN117974853B (zh) * 2024-03-29 2024-06-11 成都工业学院 Adaptive switching generation method, system, terminal, and medium for homologous micro-expression images

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10825219B2 (en) * 2018-03-22 2020-11-03 Northeastern University Segmentation guided image generation with adversarial networks
US11074711B1 (en) * 2018-06-15 2021-07-27 Bertec Corporation System for estimating a pose of one or more persons in a scene
CN109190472B (zh) * 2018-07-28 2021-09-14 天津大学 Pedestrian attribute recognition method based on joint image and attribute guidance
CN110008846B (zh) * 2019-03-13 2022-08-30 南京邮电大学 Image processing method
US11074733B2 (en) * 2019-03-15 2021-07-27 Neocortext, Inc. Face-swapping apparatus and method
CN110222588B (zh) * 2019-05-15 2020-03-27 合肥进毅智能技术有限公司 Face sketch image aging synthesis method, apparatus, and storage medium
CN112233212A (zh) * 2019-06-28 2021-01-15 微软技术许可有限责任公司 Portrait editing and synthesis
US10949715B1 (en) * 2019-08-19 2021-03-16 Neon Evolution Inc. Methods and systems for image and voice processing

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170157521A1 (en) * 2015-07-21 2017-06-08 Disney Enterprises, Inc. Ride with automated trackless vehicles controlled based on sensed occupant state
CN108921061A (zh) * 2018-06-20 2018-11-30 腾讯科技(深圳)有限公司 Expression recognition method, apparatus, and device
CN109325422A (zh) * 2018-08-28 2019-02-12 深圳壹账通智能科技有限公司 Expression recognition method, apparatus, terminal, and computer-readable storage medium
CN109508689A (zh) * 2018-11-28 2019-03-22 中山大学 Adversarially enhanced expression recognition method
CN109934116A (zh) * 2019-02-19 2019-06-25 华南理工大学 Standard face generation method based on a generative adversarial mechanism and an attention mechanism
CN109934767A (zh) * 2019-03-06 2019-06-25 中南大学 Facial expression conversion method based on identity and expression feature conversion
CN110796111A (zh) * 2019-11-05 2020-02-14 腾讯科技(深圳)有限公司 Image processing method and apparatus, device, and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723480A (zh) * 2021-08-18 2021-11-30 北京达佳互联信息技术有限公司 Image processing method and apparatus, electronic device, and storage medium
CN113723480B (zh) * 2021-08-18 2024-03-05 北京达佳互联信息技术有限公司 Image processing method and apparatus, electronic device, and storage medium
CN114565941A (zh) * 2021-08-24 2022-05-31 商汤国际私人有限公司 Texture generation method, apparatus, device, and computer-readable storage medium

Also Published As

Publication number Publication date
US20220028031A1 (en) 2022-01-27
CN110796111A (zh) 2020-02-14
CN110796111B (zh) 2020-11-10

Similar Documents

Publication Publication Date Title
WO2021088556A1 (zh) Image processing method and apparatus, device, and storage medium
CN112215927B (zh) Face video synthesis method, apparatus, device, and medium
US11967151B2 (en) Video classification method and apparatus, model training method and apparatus, device, and storage medium
US11748934B2 (en) Three-dimensional expression base generation method and apparatus, speech interaction method and apparatus, and medium
US20210224601A1 (en) Video sequence selection method, computer device, and storage medium
EP4198814A1 (en) Gaze correction method and apparatus for image, electronic device, computer-readable storage medium, and computer program product
US20230072627A1 (en) Gaze correction method and apparatus for face image, device, computer-readable storage medium, and computer program product
US20230082605A1 (en) Visual dialog method and apparatus, method and apparatus for training visual dialog model, electronic device, and computer-readable storage medium
WO2022188697A1 (zh) Biometric feature extraction method, apparatus, device, medium, and program product
CN111292262B (zh) Image processing method and apparatus, electronic device, and storage medium
WO2024051480A1 (zh) Image processing method and apparatus, computer device, and storage medium
CN111833360B (zh) Image processing method, apparatus, device, and computer-readable storage medium
CN115565238A (zh) Face swap model training method and apparatus, device, storage medium, and program product
CN113095206A (zh) Virtual anchor generation method, apparatus, and terminal device
CN116958324A (zh) Image generation model training method, apparatus, device, and storage medium
CN113393544A (zh) Image processing method, apparatus, device, and medium
CN113657272B (zh) Micro-video classification method and system based on missing data completion
WO2021120578A1 (zh) Forward computation method and apparatus for neural network, and computer-readable storage medium
WO2023160157A1 (zh) Three-dimensional medical image recognition method, apparatus, device, storage medium, and product
CN113538254A (zh) Image restoration method and apparatus, electronic device, and computer-readable storage medium
CN117011449A (zh) Three-dimensional face model reconstruction method and apparatus, storage medium, and electronic device
CN110047118B (zh) Video generation method, apparatus, computer device, and storage medium
CN114283460A (zh) Feature extraction method, apparatus, computer device, and storage medium
CN115731101A (zh) Super-resolution image processing method, apparatus, device, and storage medium
CN116152876A (zh) Expression transfer method, apparatus, device, storage medium, and program product

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20883752

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20883752

Country of ref document: EP

Kind code of ref document: A1