CN112116684A - Image processing method, device, equipment and computer readable storage medium

Image processing method, device, equipment and computer readable storage medium

Info

Publication number
CN112116684A
Authority
CN
China
Prior art keywords
image
sample
identity
feature vector
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010777489.9A
Other languages
Chinese (zh)
Inventor
严彦阳
黄浩智
沈力
王璇
操晓春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Institute of Information Engineering of CAS
Original Assignee
Tencent Technology Shenzhen Co Ltd
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd, Institute of Information Engineering of CAS filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010777489.9A priority Critical patent/CN112116684A/en
Publication of CN112116684A publication Critical patent/CN112116684A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application provide an image processing method, apparatus, device and computer-readable storage medium. The method includes: acquiring a first image and a second image; extracting identity features of a first target object in the first image to obtain an identity feature vector of the first image; extracting pose features of a second target object in the second image to obtain a pose feature vector of the second image; obtaining an image synthesis vector corresponding to the first image and the second image based on the identity feature vector and the pose feature vector; and generating a composite image of the first image and the second image according to the image synthesis vector, where the first target object in the composite image has the pose of the second target object. With the embodiments of the present application, the adaptability of image reproduction during image synthesis can be improved, and the first image and the second image can be accurately synthesized.

Description

Image processing method, device, equipment and computer readable storage medium
Technical Field
The embodiments of the present application relate to the field of internet technologies, and relate to, but are not limited to, an image processing method, apparatus, device and computer-readable storage medium.
Background
Face reproduction technology in the related art mainly uses big data to drive model training to obtain an image processing network: during training, the identity information of the original image and that of the target image are set to be consistent, a generator network in the image processing network is trained with a large amount of picture data, and face reproduction is then realized through the trained image processing network.
However, in the course of implementing the embodiments of the present application, the applicant found that the related-art method is limited to the reproduction of a specific target face: the trained image processing network is only suitable for reproducing a single target face and does not generalize to other target faces, and training for a single target face requires a large amount of data, so retraining a new image processing network is impractical. Meanwhile, since most networks are trained with the identity information of the original image and the target image set to be consistent, the network adapts poorly to reproducing faces of different identities, and its synthesis accuracy is low when performing face reproduction across different images.
Disclosure of Invention
Embodiments of the present application provide an image processing method, an image processing apparatus, an image processing device, and a computer-readable storage medium. The identity information of the first image and that of the second image do not need to be set to be consistent, and the identity feature vector and the pose feature vector are processed to obtain an image synthesis vector, so that the adaptability of image reproduction during image synthesis can be improved and the first image and the second image can be accurately synthesized.
The technical scheme of the embodiment of the application is realized as follows:
an embodiment of the present application provides an image processing method, including:
acquiring a first image and a second image;
extracting identity features of a first target object in the first image to obtain an identity feature vector of the first image;
extracting pose features of a second target object in the second image to obtain a pose feature vector of the second image;
obtaining an image synthesis vector corresponding to the first image and the second image based on the identity feature vector and the pose feature vector;
generating a composite image of the first image and the second image according to the image composite vector; a first target object in the composite image has a pose of the second target object.
An embodiment of the present application provides an image processing apparatus, including:
the acquisition module is used for acquiring a first image and a second image;
the identity feature extraction module is used for extracting identity features of a first target object in the first image to obtain an identity feature vector of the first image;
the pose feature extraction module is used for extracting pose features of a second target object in the second image to obtain a pose feature vector of the second image;
a processing module configured to obtain an image synthesis vector corresponding to the first image and the second image based on the identity feature vector and the pose feature vector;
a generating module, configured to generate a composite image of the first image and the second image according to the image composite vector; a first target object in the composite image has a pose of the second target object.
An embodiment of the present application provides an image processing apparatus, including:
a memory for storing executable instructions; and a processor configured to implement the method described above when executing the executable instructions stored in the memory.
Embodiments of the present application provide a computer-readable storage medium storing executable instructions which, when executed by a processor, cause the processor to implement the method described above.
The embodiment of the application has the following beneficial effects:
the identity characteristic vector and the attitude characteristic vector are correspondingly obtained by respectively extracting the identity characteristic of a first target object in the first image and the attitude characteristic of a second target object in the second image, so that the identity characteristic vector and the attitude characteristic vector are processed to obtain a composite image of the first image and the second image. In this way, since it is not necessary to set the identity information of the first image and the second image to be identical and the identity feature vector and the posture feature vector are processed, the image reproduction adaptability in image synthesis can be improved and the first image and the second image can be accurately subjected to image synthesis processing.
Drawings
FIG. 1 is a schematic diagram of an alternative architecture of an image processing system provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a server provided in an embodiment of the present application;
FIG. 3 is a schematic flow chart of an alternative image processing method provided by the embodiment of the present application;
FIG. 4 is a schematic flow chart of an alternative image processing method provided by the embodiment of the present application;
FIG. 5 is a schematic flow chart of an alternative image processing method provided by the embodiment of the present application;
FIG. 6 is a schematic flow chart of an alternative image processing method provided by the embodiment of the present application;
FIG. 7 is a schematic flow chart of an alternative image processing method provided by the embodiment of the present application;
FIG. 8 is a schematic flow chart diagram illustrating an alternative method for training an image processing network according to an embodiment of the present disclosure;
FIG. 9 is a flowchart of an overall framework of an image processing method provided in an embodiment of the present application;
FIG. 10 is a schematic diagram of an implementation process of face landmark preprocessing according to an embodiment of the present application;
fig. 11 is a schematic diagram of a gradient update process of a meta-learning strategy provided in an embodiment of the present application.
Detailed Description
In order to make the objectives, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the embodiments of the present application belong. The terminology used in the embodiments of the present application is for the purpose of describing the embodiments of the present application only and is not intended to be limiting of the present application.
Before explaining the embodiments of the present application, terms referred to in the present application are first explained:
1) Face reproduction (also known as face reenactment): given an original face and a target face, a computer synthesizes a virtual simulated face whose identity information is consistent with that of the target face and whose expression information is consistent with that of the original face.
2) Meta-learning: also called "learning to learn", meta-learning gives the computer the ability to learn and to discover learning rules, so that when facing a new task it can master the rules with only a small amount of training.
3) Few samples (few-shot): only a few samples of the target face are provided when face reproduction is performed; for example, typically only a few frames of images are given.
In order to better understand the image processing method provided in the embodiment of the present application, first, an image processing method in the related art is explained:
Face reproduction technology in the related art mainly uses a big-data-driven method: the identity information of the original image and that of the target image are set to be consistent during training, and a generator network is trained with a large amount of picture data. For target images that do not appear during training, some related-art techniques adjust the model by means of fine-tuning (finetune).
However, the related-art method is limited to the reproduction of a specific target face: the trained network is only suitable for reproducing a single target face and does not work for other target faces. Moreover, training for a single target face requires a large amount of data, making it impractical to retrain a new face model. Reproduction of a new face can also be realized by fine-tuning a network trained on a large amount of data, but although this approach is feasible, the new face samples are limited, so the effect is mediocre under a given fine-tuning computation budget. In addition, most network training sets the identity information of the original image to be consistent with that of the target image, so the network adapts poorly to face reproduction across different identities and its synthesis accuracy is low when performing face reproduction on different images.
Based on at least one of the above problems in the related art, an embodiment of the present application provides an image processing method. First, identity feature extraction is performed on a first target object in a first image to be processed to obtain an identity feature vector of the first image, and pose feature extraction is performed on a second target object in a second image to be processed to obtain a pose feature vector of the second image. Then, the identity feature vector and the pose feature vector are processed to obtain an image synthesis vector corresponding to the first image and the second image. Finally, a composite image of the first image and the second image is determined based on the image synthesis vector. In this way, since the identity information of the first image and the second image does not need to be set to be consistent and the identity feature vector and the pose feature vector are processed to obtain the image synthesis vector, the adaptability of image reproduction during image synthesis can be improved and the first image and the second image can be accurately synthesized.
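For orientation only, the flow just summarized can be sketched in PyTorch-style pseudocode. The module names (landmark processor, target encoder, generator) follow the components described later in this application; every interface, shape and call below is an illustrative assumption rather than the claimed implementation.

```python
import torch

def reenact(identity_image, pose_image, landmark_processor, target_encoder, generator):
    """Hypothetical sketch: synthesize an image that keeps the identity of
    `identity_image` (the first image) and the pose of `pose_image` (the second image)."""
    # Identity branch: landmarks plus image are encoded into an identity feature vector.
    identity_landmarks = landmark_processor(identity_image)               # e.g. 1 x 68 x 2
    identity_vector = target_encoder(identity_image, identity_landmarks)  # e.g. 1 x 512

    # Pose branch: the driving image only contributes its landmark (pose) features.
    pose_vector = landmark_processor(pose_image).flatten(1)

    # The generator combines both vectors (e.g. via adaptive instance normalization)
    # into an image synthesis vector and decodes it into the composite image.
    composite_image = generator(pose_vector, identity_vector)
    return composite_image
```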
An exemplary application of the image processing apparatus provided in the embodiment of the present application is described below, and the image processing apparatus provided in the embodiment of the present application may be implemented as any terminal having an on-screen display function, such as a notebook computer, a tablet computer, a desktop computer, a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, a portable game device), an intelligent robot, or may be implemented as a server. Next, an exemplary application when the image processing apparatus is implemented as a server will be described.
Referring to fig. 1, fig. 1 is a schematic diagram of an alternative architecture of an image processing system 10 according to an embodiment of the present application. In order to synthesize the first image and the second image to be processed into a composite image that has the identity information of the first image and the pose information of the second image, the image processing system 10 of the embodiment of the present application includes the terminal 100, the network 200 and the server 300. When the image processing method of the embodiment of the present application is implemented, the terminal 100 displays the first image and the second image to be processed on a current presentation page 100-1, where the first image has a first target object and the second image has a second target object. The terminal 100 sends the first image and the second image displayed on the presentation page 100-1 to the server 300 through the network 200. The server 300 performs identity feature extraction on the first target object in the first image to obtain an identity feature vector of the first image; performs pose feature extraction on the second target object in the second image to obtain a pose feature vector of the second image; performs instance normalization processing on the identity feature vector and the pose feature vector to obtain an image synthesis vector corresponding to the first image and the second image; and determines a composite image of the first image and the second image based on the image synthesis vector. After the composite image is formed, it is fed back to the terminal 100 through the network 200. After the terminal 100 acquires the composite image, the first image, the second image and the composite image are displayed together on the presentation page 100-1 for the user to view, or only the composite image may be displayed.
The image processing method of the embodiments of the present application may be implemented based on Artificial Intelligence (AI) technology. Artificial intelligence is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer Vision (CV) technology is a science that studies how to make machines "see". It uses cameras and computers instead of human eyes to identify, track and measure targets, and performs further graphics processing so that the result is an image more suitable for human eyes to observe or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can obtain information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, and also include common biometric technologies such as face recognition and fingerprint recognition.
Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how computers simulate or implement human learning behavior in order to acquire new knowledge or skills, and how they reorganize existing knowledge structures to continuously improve performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and demonstration learning.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.
The solution provided in the embodiments of the present application relates to a computer vision technology of artificial intelligence, a machine learning technology, and the like, and will be described in the following embodiments.
Fig. 2 is a schematic structural diagram of a server 300 according to an embodiment of the present application. The server 300 shown in fig. 2 includes: at least one processor 310, a memory 350, at least one network interface 320 and a user interface 330. The components in the server 300 are coupled together by a bus system 340. It can be understood that the bus system 340 is used to enable connection and communication among these components. In addition to a data bus, the bus system 340 includes a power bus, a control bus and a status signal bus. However, for clarity of illustration, the various buses are all labeled as the bus system 340 in fig. 2.
The processor 310 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a Digital Signal Processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, where the general-purpose processor may be a microprocessor or any conventional processor.
The user interface 330 includes one or more output devices 331, including one or more speakers and/or one or more visual display screens, that enable presentation of media content. The user interface 330 also includes one or more input devices 332, including user interface components to facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 350 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 350 optionally includes one or more storage devices physically located remote from processor 310. The memory 350 may include either volatile memory or nonvolatile memory, and may also include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 350 described in embodiments herein is intended to comprise any suitable type of memory. In some embodiments, memory 350 is capable of storing data, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below, to support various operations.
An operating system 351 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a network communication module 352 for reaching other computing devices via one or more (wired or wireless) network interfaces 320, exemplary network interfaces 320 including: Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), and the like;
an input processing module 353 for detecting one or more user inputs or interactions from one of the one or more input devices 332 and translating the detected inputs or interactions.
In some embodiments, the apparatus provided by the embodiments of the present application may be implemented in software, and fig. 2 illustrates an image processing apparatus 354 stored in the memory 350, where the image processing apparatus 354 may be an image processing apparatus in the server 300, and may be software in the form of programs and plug-ins, and the like, and includes the following software modules: the obtaining module 3541, the identity feature extraction module 3542, the pose feature extraction module 3543, the processing module 3544, and the generating module 3545 are logical and thus may be arbitrarily combined or further separated according to the functions implemented. The functions of the respective modules will be explained below.
In other embodiments, the apparatus provided in the embodiments of the present Application may be implemented in hardware, and for example, the apparatus provided in the embodiments of the present Application may be a processor in the form of a hardware decoding processor, which is programmed to execute the image processing method provided in the embodiments of the present Application, for example, the processor in the form of the hardware decoding processor may be one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
The image processing method provided by the embodiment of the present application will be described below in conjunction with an exemplary application and implementation of the server 300 provided by the embodiment of the present application. Referring to fig. 3, fig. 3 is an alternative flowchart of an image processing method provided in an embodiment of the present application, and will be described with reference to the steps shown in fig. 3.
In step S301, a first image and a second image are acquired.
Here, an image processing request may be received, the image processing request including a first image and a second image, the image processing request requesting image synthesis of the first image and the second image, resulting in a synthesized image having identity information of the first image and having pose information of the second image.
Step S302, identity feature extraction is carried out on the first target object in the first image, and an identity feature vector of the first image is obtained.
Here, the first image includes at least one first object. Before image synthesis is performed, the first target object in the first image that provides the identity information needs to be determined, and identity feature extraction is then performed on that first target object. In some embodiments, the first target object may be determined according to the focus position of the first image; or the first object closest to the shooting position may be determined as the first target object according to the position of each first object in the first image; or the first object with the highest sharpness may be determined as the first target object according to the sharpness of each first object in the first image; or the user may calibrate in advance the first target object that is to provide the identity information, in which case the server receives the user's calibration information together with the first image and determines the first target object according to the calibration information.
In other embodiments, if only one first object in the first image is detected, the one first object may be directly determined as the first target object.
In some embodiments, the first target object may be determined using artificial intelligence techniques: the first image is recognized to detect the first objects in it, and the first target object is then selected from the detected first objects according to their attribute information (for example, focus position, sharpness, and so on).
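As a hedged illustration of the selection rules above (focus position, distance to the shooting position, or sharpness), a target object might be chosen from detected candidates as follows; the attribute names and the scoring strategy are assumptions for illustration only.

```python
def select_target_object(detected_objects, strategy="sharpness"):
    """Pick the object that will provide the identity information.
    `detected_objects` is assumed to be a list of dicts of detection attributes."""
    if len(detected_objects) == 1:
        return detected_objects[0]                                  # single object: use it directly
    if strategy == "sharpness":
        return max(detected_objects, key=lambda o: o["sharpness"])  # clearest object
    if strategy == "closest":
        return min(detected_objects, key=lambda o: o["distance"])   # nearest to the camera
    # default: the object closest to the focus position of the image
    return min(detected_objects, key=lambda o: o["focus_offset"])
```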
Identity feature extraction means extracting features from regions that can represent the identity information of the first target object to obtain feature vectors of those regions, and then combining the features of the multiple regions that represent the identity information of the first target object to obtain the identity feature vector of the first image.
For example, if the first target object is a person, the regions that can represent identity information may be facial regions, such as the eyes, nose, mouth, eyebrows and face shape, or may be features such as height, weight and body shape.
Step S303, performing pose feature extraction on the second target object in the second image to obtain a pose feature vector of the second image.
Here, the second image includes at least one second object, and before image synthesis is performed, the second target object in the second image that provides the pose information also needs to be determined. In the embodiment of the present application, the second target object may be determined by the same method as that used to determine the first target object described above.
Pose feature extraction means extracting features from a region that can represent the pose information of the second target object, obtaining a feature vector of that region, and determining the obtained feature vector as the pose information of the second target object.
In the embodiment of the present application, the pose information includes, but is not limited to, facial pose information, gesture information, body pose information, expression information and the like, where the facial pose information includes, but is not limited to, facial expression information. For example, when the identity information of the first image needs to be combined with the expression information of the second image, the extracted pose feature vector of the second image may be the feature vector corresponding to the expression information of the second target object. In the embodiment of the present application, the type of pose information of the second target object may be preset; during image synthesis, the target region of the second target object is determined according to the type of the pose information, and the feature vector of that target region is then obtained as the pose feature vector.
Step S304, based on the identity characteristic vector and the posture characteristic vector, an image synthesis vector corresponding to the first image and the second image is obtained.
Here, the identity feature vector and the pose feature vector may be subjected to instance normalization processing to obtain the image synthesis vector of the first image and the second image. Of course, in other embodiments, other processing manners may also be used to obtain the image synthesis vector of the first image and the second image.
In this embodiment of the application, the instance normalization processing may be adaptive instance normalization processing, which combines fully connected processing with normalization processing. In addition to computing the mean and variance of the identity feature vector and the pose feature vector during normalization, a mean and variance are also obtained through the fully connected processing; that is, the mean and variance obtained by normalization are associated with the mean and variance obtained by the fully connected processing, so that after the instance normalization processing, the image synthesis vector carries both the identity information of the first image and the pose information of the second image.
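A minimal sketch of adaptive instance normalization in the sense described above, assuming that the identity feature vector is mapped by a fully connected layer to per-channel statistics that re-modulate the normalized pose features; the dimensions, layer layout and scaling convention are assumptions, not the claimed implementation.

```python
import torch
import torch.nn as nn

class AdaptiveInstanceNorm(nn.Module):
    """Hedged sketch: normalize the pose features, then re-scale and shift them
    with a mean/variance predicted from the identity feature vector."""
    def __init__(self, num_channels, identity_dim=512):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)
        self.fc = nn.Linear(identity_dim, num_channels * 2)   # fully connected processing

    def forward(self, pose_features, identity_vector):
        normalized = self.norm(pose_features)                  # zero-mean, unit-variance pose features
        stats = self.fc(identity_vector)                       # predicted per-channel statistics
        scale, shift = stats.chunk(2, dim=1)
        scale = scale.unsqueeze(-1).unsqueeze(-1)
        shift = shift.unsqueeze(-1).unsqueeze(-1)
        return (1 + scale) * normalized + shift                # features carrying both identity and pose
```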
In step S305, a composite image of the first image and the second image is generated based on the image composite vector.
Here, according to the information carried by the image synthesis vector, the image synthesis vector is converted into a composite image, which is an image that aggregates the identity information of the first image and the pose information of the second image; that is, the first target object in the composite image has the pose of the second target object. For example, if the first target object in the first image has identity A and the second target object in the second image has expression B, the generated composite image has both identity A and expression B.
According to the image processing method provided by the embodiment of the present application, the identity feature vector and the pose feature vector are obtained by respectively extracting the identity features of the first target object in the first image and the pose features of the second target object in the second image, and are then processed to obtain the composite image of the first image and the second image. In this way, since the identity information of the first image and the second image does not need to be set to be consistent and instance normalization processing is performed on the identity feature vector and the pose feature vector, the adaptability of image reproduction during image synthesis can be improved and the first image and the second image can be accurately synthesized.
In some embodiments, the image processing system includes at least a terminal and a server, where the terminal is configured to display a first image and a second image to be processed, and the server is configured to synthesize the first image and the second image to obtain a synthesized image. Fig. 4 is an alternative flowchart of an image processing method provided in an embodiment of the present application, and as shown in fig. 4, the method includes the following steps:
in step S401, the terminal acquires a first image and a second image.
Here, the first image and the second image may be images captured by a terminal, images downloaded from a network, or images transmitted from another device. In the embodiment of the application, after the first image and the second image are acquired, the first image and the second image are packaged in an image processing request, and the image processing request is used for requesting reproduction of the first image and the second image so as to synthesize the first image and the second image.
In some embodiments, the user may also mark the first image and the second image through the terminal to mark a first target object to be combined from the first image and mark a second target object to be combined from the second image.
In step S402, the terminal transmits an image processing request to the server.
In step S403, the server parses the image processing request to obtain the first image and the second image.
Step S404, the server extracts the identity feature of the first target object in the first image to obtain the identity feature vector of the first image.
Step S405, the server performs pose feature extraction on the second target object in the second image to obtain the pose feature vector of the second image.
It should be noted that the identity feature extraction process in step S404 and the pose feature extraction process in step S405 are respectively similar to the processes in step S302 and step S303, and are not described again in this embodiment of the present application.
Step S406, the server normalizes the pose feature vector to obtain a normalized feature vector.
Here, the normalization processing means that the mean and variance of all the pose feature vectors are calculated, and the obtained mean and variance are determined as the normalized feature vector.
Step S407, the server performs full connection processing on the normalized feature vector and the identity feature vector to obtain an image synthesis vector corresponding to the first image and the second image.
After the normalized feature vector is obtained, fully connected processing is performed on the normalized feature vector and the identity feature vector through a fully connected network, the mean and variance of the normalized feature vector and the identity feature vector are calculated, and the vector corresponding to the mean and variance is determined as the image synthesis vector.
In step S408, the server generates a composite image of the first image and the second image based on the image composite vector.
In step S409, the server transmits the composite image to the terminal.
In step S410, the terminal displays the composite image on the current interface.
The image processing method provided by the embodiment of the present application can be provided as an image processing application that runs on a terminal, with the server acting as the backend server corresponding to the client of the image processing application. The user operates the client of the image processing application running on the terminal to trigger the image processing procedure. In the embodiment of the present application, when the user wants to synthesize a first image and a second image, the user operates the client on the terminal so that the terminal interacts with the server to carry out the image synthesis process. During image synthesis, the identity information of the first image and that of the second image do not need to be set to be consistent, and instance normalization processing is performed on the identity feature vector and the pose feature vector; therefore, the adaptability of image reproduction during image synthesis can be improved, the first image and the second image can be accurately synthesized, and the user experience is improved.
Based on fig. 3, fig. 5 is an optional flowchart of the image processing method provided in the embodiment of the present application, and as shown in fig. 5, in some embodiments, step S302 may be implemented by:
step S501, image data corresponding to at least two first markers in the first target object is acquired.
Here, a first marker refers to any point in the first target object in the first image and is used to identify the position of that point; the first target object is formed by a large number of contiguous point markers in the first image.
The image data corresponding to the first marker may be a pixel or a coordinate of a point corresponding to the first marker.
Step S502, normalization processing is carried out on the image data corresponding to the at least two first marks, and first normalization data are obtained.
Here, the normalization processing of the image data corresponding to the at least two first markers may be performed by taking a mean and a variance of the image data corresponding to the at least two first markers and determining the mean and the variance as the first normalized data.
In step S503, an image data mean of at least two image data is obtained.
Here, the average value of all the acquired image data is obtained to obtain the image data average value.
Step S504, identity characteristic vectors of the first image are determined according to the first normalization data and the image data mean value.
Based on fig. 5, fig. 6 is an optional flowchart of the image processing method provided in the embodiment of the present application, and as shown in fig. 6, in some embodiments, step S501 may be implemented by:
step S601, at least two first regions in the first target object are determined.
Here, the first region may be any region in the first target object, or may be a region in the first target object capable of representing identity information, for example, when the first target object is a face image, the first region may be an eye region, a nose region, an eyebrow region, or the like.
Step S602, image flag preprocessing is performed on at least two first regions in the first target object, so as to obtain a first flag of each first region.
Here, image marker preprocessing refers to extracting points in the first region to obtain point markers with continuous image data, and determining those point markers as the first markers of the first region. "Continuous image data" means that if the difference between the image data (e.g., the pixel value) of an arbitrary point and that of its neighboring points is small, the image in that area is a continuous image, its image data is valid image data, and the point markers of that area may therefore be determined as first markers.
In step S603, image data corresponding to each first mark is obtained.
Referring still to fig. 6, in some embodiments, step S504 can be implemented by:
in step S604, the difference between the first normalized data and the image data mean is determined as the pose information of the first image.
Step S605, determining an identity feature vector of the first image according to the image data mean and the pose information of the first image.
It should be noted that, in some embodiments, when determining the pose feature vector for the second image, a method similar to the process of determining the identity feature vector in the embodiment of the present application may also be used. The difference is that after the pose information of the second image is determined, the pose feature vector of the second image is directly obtained according to the pose information, and the identity feature vector of the second image does not need to be determined according to the image data mean value and the pose information of the second image.
Based on fig. 3, fig. 7 is an optional flowchart of the image processing method provided in the embodiment of the present application, and as shown in fig. 7, in some embodiments, step S303 may be implemented by:
step S701 determines image data corresponding to at least two second markers in the second target object.
Here, the determining process of the second mark may be similar to the method in step S602, and image mark preprocessing is performed on at least two second areas in the second target object, so as to obtain a second mark of each second area. After the second marker is determined, corresponding image data is determined according to the position of the second marker in the second target object.
In some embodiments, the pose information includes expression information, and correspondingly, the step S701 may be implemented by:
in step S7011, a face image of the second target object is determined.
In step S7012, in the face image, at least two second regions related to the expression information are determined.
Here, the at least two second areas related to the expression information include, but are not limited to: the eye area and the eyebrow area.
Step S7013, image flag preprocessing is performed on at least two second areas to obtain a second flag of each second area.
Step S7014, image data corresponding to each second marker is acquired.
Step S702, performing normalization processing on the image data corresponding to the at least two second markers to obtain second normalized data.
Step S703 is to obtain an image data mean value of the image data corresponding to the at least two second marks.
Here, the obtaining of the image data mean value is calculating the mean value of the image data corresponding to all of the at least two second markers.
Step S704, determining a difference between the second normalized data and the image data mean as an attitude feature vector of the second image.
Here, when determining the pose feature vector of the second image, the difference between the second normalized data and the image data mean may be calculated, and that difference may be directly determined as the pose feature vector of the second image.
According to the image processing method provided by the embodiment of the present application, instance normalization processing is performed on the extracted identity feature vector and pose feature vector. Before the instance normalization processing, the normalized data and the image data mean are calculated, and the identity information of the first image or the pose information of the second image is computed from the image data mean and the normalized data. Accurate identity information and pose information can therefore be obtained, which ensures that the subsequent instance normalization processing is more accurate, yielding a more accurate image synthesis vector and a more accurate composite image, and thus realizing accurate synthesis of the first image and the second image.
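A hedged numeric sketch of the decomposition just described, treating the marker image data as landmark coordinates. How exactly the identity feature vector combines the image data mean with the pose information is not fully specified above, so the final combination line is only an assumption.

```python
import numpy as np

def split_identity_and_pose(landmarks):
    """`landmarks` is assumed to be an array of marker coordinates, e.g. shape (68, 2).
    Returns an illustrative (identity, pose) decomposition following steps S501-S504 / S701-S704."""
    mean = landmarks.mean(axis=0)                       # image data mean
    std = landmarks.std(axis=0) + 1e-8
    normalized = (landmarks - mean) / std               # normalized data (mean- and variance-based)
    pose = (normalized - mean).flatten()                # difference taken as the pose information
    identity = np.concatenate([mean, pose])             # assumption: identity vector built from the
                                                        # image data mean and the pose information
    return identity, pose
```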
In some embodiments, the image processing method provided in the embodiments of the present application may also be implemented by using an image processing network trained based on an artificial intelligence technique, that is, the image processing network is used to determine a composite image of the first image and the second image. Alternatively, the first image and the second image may be identified and processed by using an artificial intelligence technique to obtain a final composite image.
Fig. 8 is an alternative flowchart of a training method for an image processing network according to an embodiment of the present application, where as shown in fig. 8, the training method includes the following steps:
step S801, inputting the first sample image and the second sample image into an image processing network, and determining a first sample mark of the first sample image and a second sample mark of the second sample image through a mark preprocessing model in the image processing network.
Here, a first sample image for providing sample identity information and a second sample image for providing sample pose information are input into the image processing network as sample data.
In this embodiment, the first sample marker is a point marker obtained by performing image marker preprocessing on the first sample image, and the second sample marker is a point marker obtained by performing image marker preprocessing on the second sample image. In the implementation process, image marker preprocessing can be performed on the first sample image and the second sample image respectively through a marker preprocessing model in an image processing network, so as to obtain a first sample marker and a second sample marker.
Step S802, extracting the identity characteristic of the first sample image through a target identity encoder in the image processing network to obtain a sample identity characteristic vector of the first sample image.
In some embodiments, step S802 may be implemented by:
step S8021, input the first sample mark into the target identity encoder. Step S8022, sequentially performing at least one downsampling process and at least one convolution process on the feature vector corresponding to the first sample identifier through at least one downsampling layer and at least one convolution layer in the target identity encoder, so as to obtain a sample identity feature vector of the first sample image.
And step S803, performing feature extraction on the second sample mark through a feature extraction layer in the image processing network to obtain a sample posture feature vector of the second sample image.
Step S804, the generator in the image processing network performs instance normalization processing on the sample identity feature vector and the sample pose feature vector to obtain a sample synthesis vector corresponding to the first sample image and the second sample image.
In some embodiments, step S804 may be implemented by:
step S8041, the sample identity feature vector and the sample posture feature vector are input into the generator. Step S8042, the sample identity feature vector and the sample posture feature vector are respectively encoded by an encoder in the generator, and an identity encoding vector and a posture encoding vector are correspondingly obtained. Step S8043, performing adaptive instance normalization processing on the identity code vector and the pose code vector through an adaptive instance normalization layer in the generator, to obtain a sample synthesized vector corresponding to the first sample image and the second sample image.
Step S805, the sample synthesis vector is input into a preset loss model to obtain a loss result.
Here, the preset loss model is configured to compare the sample synthesis vector with a preset synthesis vector to obtain the loss result, where the preset synthesis vector may be a synthesis vector corresponding to the first sample image and the second sample image that is preset by a user.
In the embodiment of the present application, a sample synthetic image can be determined from the sample synthesis vector, and a preset synthetic image can be determined from the preset synthesis vector. The preset loss model includes a loss function through which the similarity between the sample synthetic image and the preset synthetic image can be calculated; in this calculation, the similarity can be obtained from the distance between the sample synthesis vector and the preset synthesis vector, and the loss result is determined according to the similarity. The larger the distance between the sample synthesis vector and the preset synthesis vector, the smaller the similarity between the sample synthetic image and the preset synthetic image, which indicates that the training result of the model deviates more from the true value and further training is needed; the smaller the distance, the larger the similarity, which indicates that the training result of the model is closer to the true value.
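A hedged sketch of such a distance-based loss, assuming an L2 distance between the two vectors and one possible mapping from distance to similarity; neither choice is specified above.

```python
import torch

def synthesis_loss(sample_vector, preset_vector):
    """Hedged sketch: similarity shrinks as the distance between the sample synthesis
    vector and the preset (reference) synthesis vector grows; the distance is the loss."""
    distance = torch.norm(sample_vector - preset_vector, p=2, dim=-1)   # L2 distance (assumed)
    similarity = 1.0 / (1.0 + distance)        # assumed mapping from distance to similarity
    loss = distance.mean()                     # training pulls the two vectors together
    return loss, similarity.mean()
```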
Step S806, according to the loss result, the network parameters in the mark preprocessing model, the target identity encoder, and the generator are modified to obtain a modified image processing network.
Here, when the similarity is less than a preset similarity threshold, the loss result indicates that the marker preprocessing model in the current image processing network cannot accurately determine the first sample marker of the first sample image and the second sample marker of the second sample image, and/or that the target identity encoder cannot accurately extract the identity features of the first sample image, and/or that the generator cannot accurately extract the features of the second sample marker to obtain an accurate sample pose feature vector of the second sample image. Therefore, the current image processing network needs to be modified: at least one of the marker preprocessing model, the target identity encoder and the generator is modified according to the similarity, and when the similarity between the sample synthetic image output by the image processing network and the preset synthetic image satisfies a preset condition, the corresponding image processing network is determined as the trained image processing network.
According to the training method for the image processing network provided by the embodiment of the present application, the first sample image and the second sample image are input into the image processing network, are processed in sequence by the marker preprocessing model, the target identity encoder and the generator in the image processing network to obtain the sample synthesis vector, and the sample synthesis vector is input into the preset loss model to obtain the loss result. In this way, at least one of the marker preprocessing model, the target identity encoder and the generator can be corrected according to the loss result, so that the resulting image processing network can accurately synthesize the first image and the second image into a composite image that meets the user's requirements, improving the user experience.
In some embodiments, step S806 may be implemented by:
step S8061, when the network is updated for the nth time, a first update gradient of the network parameter is obtained.
Step S8062, when the network is updated for the (N + 1) th time, a second update gradient of the network parameter is obtained. Wherein the second update gradient is an update gradient obtained on the basis of the first update gradient.
Step S8063, determining a sum of the first update gradient and the second update gradient as a target update gradient of the network parameter during the N +2 th network update.
And step S8064, correcting the network parameters in the mark preprocessing model, the target identity encoder and the generator by adopting the target updating gradient to obtain a corrected image processing network.
In the embodiment of the present application, the previous update gradient of a network parameter is used as the basis for the current update, and the sum of the two update gradients is determined as the target update gradient for the next parameter update. In this way, the network parameters are updated iteratively in an orderly and directed manner, the updated parameters gradually approach their true values, and the training accuracy and training efficiency of the network are improved.
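A hedged sketch of the gradient accumulation in steps S8061 to S8064: the update gradients from the Nth and (N+1)th updates are summed and applied as the target update gradient at the (N+2)th update. The plain SGD-style application and the learning rate are assumptions; the optimizer used is not specified above.

```python
def meta_update(params, grads_n, grads_n_plus_1, lr=1e-4):
    """Apply the target update gradient at the (N+2)th network update.
    `params`, `grads_n` and `grads_n_plus_1` are assumed to be dicts keyed by parameter name."""
    updated = {}
    for name, value in params.items():
        target_grad = grads_n[name] + grads_n_plus_1[name]   # sum of the two update gradients
        updated[name] = value - lr * target_grad             # correct the network parameter
    return updated
```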
In some embodiments, before the identity feature extraction of the first sample image by the target identity encoder, the method further comprises:
step S81, performing dimension transformation processing on the feature vector corresponding to the first sample mark; and the dimensionality of the feature vector after the dimensionality transformation processing is the same as the vector dimensionality of the adaptive instance normalization layer.
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
The embodiment of the present application provides an image processing method, which is a generation method for face pictures. It implements a fast and effective face reproduction algorithm given an original face picture and a target face picture for which only a small number of samples are available.
For example, given an identity picture A and a pose picture B (or an expression picture), the embodiment of the present application can generate, in a controllable way, the face of A with the expression of B. In addition, the embodiment of the present application also supports generating a face picture with a directly specified expression and pose: for example, the user can specify the pose of the face picture by drawing the face landmarks, and the embodiment of the present application generates a face with the expression and pose specified by the user.
The embodiment of the application mainly comprises a face landmark preprocessing network (Landmark Processor), a target face encoder (Target Encoder) and a generator network (Generator). A small number of target face pictures A to be edited and a pose picture B of the original face are taken as input. These pictures are first sent to the face landmark preprocessing network, and each face picture yields a 1 x 68 x 2 feature vector containing information on the eyes, nose, mouth, eyebrows and face shape. The concatenation of the target face image and its feature vector is sent to the target face encoding network, whose output is an encoded 1 x 512 identity feature vector. The landmark feature vector of the original face (corresponding to the pose feature vector) serves as the input of the generator network, and the identity information of the finally generated picture is reinforced by the identity feature vector through the adaptive instance normalization layer. The virtual face picture finally generated in the embodiment of the application modifies the expression information of the target face while preserving, to a large extent, the original identity information of the target face.
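The following is a minimal runnable sketch of how the three components above fit together. The stand-in functions simply return zero tensors with the shapes stated in the text; the output resolution, and the averaging of identity vectors when several target pictures are given, are illustrative assumptions rather than the patented network structures.

```python
import torch

def landmark_processor(image):
    # Face landmark preprocessing network: one 1 x 68 x 2 landmark vector per face picture.
    return torch.zeros(1, 68, 2)

def target_encoder(image, landmarks):
    # Target face encoder: encoded 1 x 512 identity feature vector.
    return torch.zeros(1, 512)

def generator(pose_landmarks, identity_vector):
    # Generator: decodes the pose landmarks, with identity injected via AdaIN.
    return torch.zeros(1, 3, 256, 256)

def reenact(target_images, source_image):
    """Face picture with the identity of `target_images` (A) and the pose of `source_image` (B)."""
    pose_landmarks = landmark_processor(source_image)
    identity_vectors = [target_encoder(img, landmark_processor(img)) for img in target_images]
    identity_vector = torch.stack(identity_vectors).mean(dim=0)  # assumption: average over the few samples
    return generator(pose_landmarks, identity_vector)
```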
The embodiment of the application uses a meta-learning method to purposefully simulate the face reenactment task under limited-sample learning; by enabling the network to learn to adapt to limited-sample learning, the reenactment effect is improved under the constraints of a small number of given samples and a limited computation budget. Meanwhile, the embodiment of the application uses a face landmark conversion strategy that accounts for the inconsistency of face landmarks during reenactment, so that the influence of identity differences is alleviated and the generated result is more realistic.
With the method and device of the embodiments of the application, a face image with a specified facial expression can be edited while the identity is protected. The user uploads an identity picture A and an expression picture B to a background server; the background server uses a specific algorithm to quickly generate a composite image that has the identity of A and the expression of B, and returns the face image with the identity of A and the expression of B to the user. In addition, the user can also upload the identity picture A and a facial expression landmark picture edited by the user to the background server, and the background server uses a specific algorithm to generate a face image with the identity of A and the expression pose specified by the edited landmark picture, and returns it to the user.
The method of the embodiment of the application comprises face landmark preprocessing of the input pictures, face generation based on a conditional Generative Adversarial Network (conditional GAN), and a meta-learning strategy. FIG. 9 is a flowchart of the overall framework of the image processing method provided in an embodiment of the present application. As shown in FIG. 9, the target picture X_t (corresponding to the first image described above) and the original picture X_s (corresponding to the second image) are taken as input; the original picture X_s may be a plurality of pictures, in which case the i-th of the K pictures is denoted X_s^i in FIG. 9. Each input picture is passed through a face landmark preprocessing network (Landmark Processor) 901 to obtain its face landmarks (landmarks), namely L_t for the target picture and L_s for the original picture. The target identity encoder (Target Encoder) 902 extracts the features representing identity information from the target picture X_t and its face landmark L_t to obtain the identity information feature. The generator (Generator) 903 takes the face landmark L_s of the original picture X_s as input, and the identity information feature obtained by the target identity encoder 902 is injected through the adaptive instance normalization layer (AdaIN) 9031 to adjust the identity information of the generated face, finally producing the generated face picture X̂ (i.e., the composite image). After the face picture X̂ is generated, the discriminator (Discriminator) 904 discriminates the generated face picture X̂ against the calibrated real picture (ground truth), so that the model parameters in the face landmark preprocessing network 901, the target identity encoder 902 and the generator 903 can be corrected according to the discrimination result.
When the face landmarks are preprocessed, the embodiment of the application uses a face landmark preprocessing scheme based on face landmark decomposition. Fig. 10 is a schematic diagram of an implementation process of face landmark preprocessing provided in an embodiment of the present application. As shown in fig. 10, given an input image 1001, a three-dimensional face landmark 1002 is obtained by three-dimensional dense face alignment (3DDFA, 3D Dense Face Alignment) and is then normalized to obtain a normalized face landmark L. The normalized face landmark can be decomposed by the following formula (1-1):

L = L_m + L_id + L_exp    (1-1);

where L_m represents the mean part of the face landmarks over all face data; L_id represents the identity information part of the face landmark; and L_exp represents the expression information part of the face landmark. The goal of face reenactment is therefore to obtain a face landmark L(t, e_s) having the target identity t and the original expression e_s, where the face landmark L(t, e_s) is expressed by the following formula (1-2):

L(t, e_s) = L_m + L_id(t) + L_exp(e_s)    (1-2);

where L_id(t) represents the identity information part of a face landmark having the target identity t, and L_exp(e_s) represents the expression information part of a face landmark having the original expression e_s.

The mean part L_m of all face data can be obtained by averaging all the data, and the identity information part L_id and the expression information part L_exp are predicted by the face landmark preprocessing implementation shown in fig. 10.
Referring to fig. 10, the input of the multi-layer perceptron (MLP) 1003 is obtained by concatenating (concatenate) the feature vector of the RGB image encoded by the VGGFace network 1004 with the difference between the normalized face landmark L and the mean part L_m of the face data. The output of the MLP 1003 is the expression information part L_exp, and the identity information part L_id is obtained from the difference between (L - L_m) and the expression information part L_exp.
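The following is a minimal NumPy sketch of the landmark decomposition in formulas (1-1) and (1-2). The callable `mlp` stands in for the MLP 1003 (its real input also includes the VGGFace feature of the image); all names, the dummy predictor and the random data are illustrative assumptions.

```python
import numpy as np

def decompose(normalized_landmark, mean_part, image_feature, mlp):
    """Split a normalized 68x2 landmark into identity and expression parts (formula (1-1))."""
    residual = normalized_landmark - mean_part          # L - L_m
    expression_part = mlp(image_feature, residual)      # L_exp, predicted by the MLP
    identity_part = residual - expression_part          # L_id = (L - L_m) - L_exp
    return identity_part, expression_part

def compose_reenacted_landmark(mean_part, identity_part_t, expression_part_s):
    """Formula (1-2): landmark with the target identity t and the original expression e_s."""
    return mean_part + identity_part_t + expression_part_s

# Toy usage with random data and a dummy MLP stand-in.
rng = np.random.default_rng(0)
dummy_mlp = lambda feature, residual: 0.5 * residual
L, L_m = rng.normal(size=(68, 2)), rng.normal(size=(68, 2))
L_id, L_exp = decompose(L, L_m, image_feature=None, mlp=dummy_mlp)
reenacted = compose_reenacted_landmark(L_m, L_id, L_exp)
```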
The face generation process based on the conditional GAN is composed of a target identity encoder E, a generator G and a discriminator D, such as the target identity encoder 902, the generator 903 and the discriminator 904 shown in fig. 9.
Target identity encoder E: used for receiving a target image and its face landmark as input, encoding the identity information through a series of downsampling convolutional layers, and adopting the feature vector of the last encoding layer as the identity feature vector.
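The following is a minimal PyTorch sketch of such a downsampling encoder. Channel counts, depth, the rasterized 2-channel landmark map and the spatial pooling of the last layer's features are illustrative assumptions, not the patented structure.

```python
import torch
import torch.nn as nn

class TargetIdentityEncoder(nn.Module):
    def __init__(self, in_channels=5, identity_dim=512):
        super().__init__()
        # Series of downsampling convolutional layers over image + landmark channels.
        self.down = nn.Sequential(
            nn.Conv2d(in_channels, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, identity_dim, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, image, landmark_map):
        # Concatenate the RGB image (3 channels) with a 2-channel landmark map.
        x = torch.cat([image, landmark_map], dim=1)
        feat = self.down(x)                   # features of the last encoding layer
        return feat.mean(dim=(2, 3))          # (B, identity_dim) identity feature vector
```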
Generator G: used for receiving the face landmark of the original image and the identity feature vector obtained by the target identity encoder E as input; the network structure of the generator G is an encoder-decoder. It encodes the received face landmark of the original image and, in the decoding stage, reinforces the identity information of the generated image through an adaptive instance normalization layer. In some embodiments, before the identity feature vector obtained by the target identity encoder E is used, an MLP is first used to transform the feature vector dimensions to meet the dimensional requirements of the adaptive instance normalization layer.
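The following is a minimal PyTorch sketch of adaptive instance normalization (AdaIN) as used to inject the identity vector into the decoding stage; the linear mapping plays the role of the dimension transformation mentioned above. Layer sizes and the epsilon value are assumptions, not the patented implementation.

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    def __init__(self, num_channels, identity_dim=512):
        super().__init__()
        # Maps the identity feature vector to per-channel (scale, shift) parameters.
        self.mlp = nn.Linear(identity_dim, num_channels * 2)

    def forward(self, content, identity_vector):
        # Instance-normalize the content features (the pose/landmark stream).
        mean = content.mean(dim=(2, 3), keepdim=True)
        std = content.std(dim=(2, 3), keepdim=True) + 1e-6
        normalized = (content - mean) / std
        # Modulate with statistics predicted from the identity vector.
        scale, shift = self.mlp(identity_vector).chunk(2, dim=1)
        scale = scale.unsqueeze(-1).unsqueeze(-1)
        shift = shift.unsqueeze(-1).unsqueeze(-1)
        return scale * normalized + shift
```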
Discriminator D: adopts the idea of the conditional GAN and takes the image, the face landmark and the identity id as input, so as to judge whether the generated image maintains the identity information of the target image and whether the generated image is close to the face landmark of the original image.
In the embodiment of the present application, the optimization objective function of the whole network is as the following formula (1-3):
L = L_GAN + λ_CNT · L_CNT + λ_FM · L_FM + λ_MCH · L_MCH    (1-3);

where L represents the objective function; L_GAN represents the adversarial loss; L_CNT represents the content reconstruction loss; λ_CNT represents the weight of the content reconstruction loss; L_FM represents the feature matching loss; λ_FM represents the weight of the feature matching loss; L_MCH represents the code matching loss; λ_MCH represents the weight of the code matching loss.
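As a small sketch, the overall objective in formula (1-3) could be assembled as below; the individual loss terms and the weight values are placeholders, not the settings of the patent.

```python
def total_loss(l_gan, l_cnt, l_fm, l_mch, lambda_cnt=10.0, lambda_fm=10.0, lambda_mch=1.0):
    """L = L_GAN + lambda_CNT * L_CNT + lambda_FM * L_FM + lambda_MCH * L_MCH (formula (1-3))."""
    return l_gan + lambda_cnt * l_cnt + lambda_fm * l_fm + lambda_mch * l_mch
```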
The discrimination output result r̂ of the discriminator D is expressed by formula (1-4) in terms of a first discrimination result D(X_s, L_s, t; θ_d, W) and a second discrimination result D(X̂, L_s, t; θ_d, W), where D(X_s, L_s, t; θ_d, W) represents the first discrimination result obtained by discriminating the calibrated real picture (ground truth) X_s together with the face landmark L_s of X_s; D(X̂, L_s, t; θ_d, W) represents the second discrimination result obtained by discriminating the generated face picture X̂ together with the face landmark L_s of X_s; X_s represents the input original picture; X_t represents the input target picture; L_s represents the face landmark of X_s; t represents the target identity; θ_d represents the parameters of the entire network; and W represents the total set of identity feature vectors.
The meta learning strategy of the embodiment of the present application is explained below.
The embodiment of the application provides a meta-learning strategy so that the model can adapt to new persons more quickly. Fig. 11 is a schematic diagram of the gradient update process of the meta-learning strategy provided in an embodiment of the present application. Given the pre-trained parameters of a face reenactment model (where the pre-trained parameter may be θ_d in the above formula (1-4)), the embodiment of the application simulates the few-sample learning problem: a few samples of different new persons are taken, and the pre-trained model is updated with a specific update computation budget. In the embodiment of the application, the parameters of the model can be updated through the Reptile scheme, so that the updated model can adapt more quickly to the reenactment task of a new person given a small number of samples and a specific update computation budget. Thus, in practical applications, the user only needs to provide a small number of pictures of the target person, and the network can achieve good face reenactment within a specific update computation budget (time).
As shown in fig. 11, the pre-trained parameters are θ = (θ_m, θ_g, θ_d), where the pre-trained parameter θ represents the weights; the solid lines in fig. 11 represent the meta-learning process, and the dashed lines in fig. 11 represent the fine-tuning process.
As shown in the left diagram of fig. 11, for three different target images 1101, 1102 and 1103, the pre-trained parameter θ has a corresponding update gradient for each image, and the corresponding target update directions are θ_1*, θ_2* and θ_3*, respectively. Taking the target image 1101 as an example, the pre-trained parameter θ allows the network to be trained well when a small number of target images 1101 are used as the training target. Each time several target images 1101 are given, the gradient of θ is computed during training to obtain an update gradient, and this update gradient is then used to update the pre-trained parameter θ; during this update, the pre-trained parameter θ follows the update direction θ_1*, so that θ adapts more quickly toward the direction of θ_1*.
In the embodiment of the present application, the parameters of the model may be updated by the Reptile scheme. As shown in the right diagram of fig. 11, if the update gradient is calculated by using the MAML algorithm in the related art, the obtained update gradient is g_2; if the update gradient is calculated using a pre-training (Pretrain) algorithm in the related art, the resulting update gradient is g_1; if the update gradient is calculated by using the Reptile scheme provided by the embodiment of the application, the update gradient obtained for the target image 1101 in the left diagram of fig. 11 is the sum g_1 + g_2 of the two update gradients.
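The following is a minimal sketch of a Reptile-style meta-update under the assumptions of a generic parameter vector and task-specific sample batches; the inner toy loss, step counts and learning rates are illustrative, not the patent's training recipe.

```python
import numpy as np

def inner_update(theta, task_batches, lr=0.1):
    """Run a few gradient steps on one new person's samples and return the adapted parameters."""
    adapted = theta.copy()
    for x, y in task_batches:
        grad = 2.0 * (adapted * x - y) * x       # stand-in gradient of a toy loss
        adapted -= lr * grad
    return adapted

def reptile_meta_update(theta, tasks, meta_lr=0.5):
    """Move theta toward the task-adapted parameters (the theta_i* directions in fig. 11)."""
    for task_batches in tasks:
        adapted = inner_update(theta, task_batches)
        theta += meta_lr * (adapted - theta)     # Reptile: step toward the adapted weights
    return theta
```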
The embodiment of the application realizes a flexible face reenactment algorithm. The method can quickly generate a synthesized face from the identity picture and the expression picture provided by the user, and the expression information can also be provided directly by the user. The method has potential commercial value.
It should be noted that, for each model and device described in the embodiments of the present application, the specific network structure described herein is not limited, and it is within the scope of the embodiments of the present application to implement components in the framework with deep neural networks of other structures. In addition, the embodiments of the present application also do not limit the meta-learning scheme mentioned herein, and other meta-learning schemes are also within the scope of the embodiments of the present application.
Continuing with the exemplary structure, provided in the embodiments of the present application, of the image processing apparatus 354 implemented as software modules, in some embodiments, as shown in fig. 2, the software modules stored in the image processing apparatus 354 of the memory 350 may form an image processing apparatus in the server 300, including:
an obtaining module 3541 configured to obtain a first image and a second image;
an identity feature extraction module 3542, configured to perform identity feature extraction on a first target object in the first image, to obtain an identity feature vector of the first image;
an attitude feature extraction module 3543, configured to perform attitude feature extraction on a second target object in the second image, to obtain an attitude feature vector of the second image;
a processing module 3544 configured to derive an image synthesis vector corresponding to the first image and the second image based on the identity feature vector and the pose feature vector;
a generating module 3545 configured to generate a composite image of the first image and the second image according to the image composite vector; a first target object in the composite image has a pose of the second target object.
In some embodiments, the identity feature extraction module is further configured to: acquiring image data corresponding to at least two first marks in the first target object; normalizing the image data corresponding to the at least two first marks to obtain first normalized data; acquiring an image data mean value of at least two image data; and determining the identity characteristic vector of the first image according to the first normalized data and the image data mean value.
In some embodiments, the identity feature extraction module is further configured to: determining at least two first regions in the first target object; performing image marker preprocessing on the at least two first areas in the first target object to obtain a first marker of each first area; and acquiring the image data corresponding to each first mark.
In some embodiments, the identity feature extraction module is further configured to: determining a difference between the first normalized data and the image data mean as pose information of the first image; and determining the identity characteristic vector of the first image according to the image data mean value and the posture information of the first image.
In some embodiments, the pose feature extraction module is further to: determining image data corresponding to at least two second markers in the second target object; normalizing the image data corresponding to the at least two second marks to obtain second normalized data; acquiring an image data mean value of image data corresponding to at least two second marks; and determining the difference value between the second normalized data and the image data mean value as the attitude feature vector of the second image.
In some embodiments, the pose information comprises expression information;
the attitude feature extraction module is further configured to: determining a facial image of the second target object; determining at least two second regions related to the expression information in the facial image; performing image sign preprocessing on the at least two second areas to obtain a second sign of each second area; and acquiring the image data corresponding to each second mark.
In some embodiments, the normalization processing module is further configured to: normalizing the attitude feature vector to obtain a normalized feature vector; and carrying out full connection processing on the normalized feature vector and the identity feature vector to obtain the image synthesis vector corresponding to the first image and the second image.
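The following is a minimal PyTorch sketch of the combination step just described: the pose feature vector is normalized and then passed, together with the identity feature vector, through a fully connected layer to produce the image synthesis vector. The dimensions and the choice of L2 normalization are illustrative assumptions.

```python
import torch
import torch.nn as nn

pose_dim, identity_dim, synth_dim = 68 * 2, 512, 512
fc = nn.Linear(pose_dim + identity_dim, synth_dim)   # full connection processing

def image_synthesis_vector(pose_vec, identity_vec):
    # Normalize the pose feature vector, then fully connect it with the identity vector.
    normalized_pose = nn.functional.normalize(pose_vec.flatten(1), dim=1)
    return fc(torch.cat([normalized_pose, identity_vec], dim=1))   # (B, synth_dim)
```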
In some embodiments, the apparatus further comprises: a processing module to determine a composite image of the first image and the second image using an image processing network;
wherein the image processing network is trained by: inputting a first sample image and a second sample image into the image processing network, and determining a first sample mark of the first sample image and a second sample mark of the second sample image through a mark preprocessing model in the image processing network; extracting the identity feature of the first sample image through a target identity encoder in the image processing network to obtain a sample identity feature vector of the first sample image; performing feature extraction on the second sample mark through a feature extraction layer in the image processing network to obtain a sample attitude feature vector of the second sample image; performing instance normalization processing on the sample identity feature vector and the sample pose feature vector through a generator in the image processing network to obtain sample composite vectors corresponding to the first sample image and the second sample image; inputting the sample synthetic vector into a preset loss model to obtain a loss result; and according to the loss result, correcting the network parameters in the mark preprocessing model, the target identity encoder and the generator to obtain a corrected image processing network.
In some embodiments, the image processing network is trained by: inputting the first sample token into the target identity encoder; and sequentially performing at least one downsampling process and at least one convolution process on the feature vector corresponding to the first sample identifier through at least one downsampling layer and at least one convolution layer in the target identity encoder to obtain a sample identity feature vector of the first sample image.
In some embodiments, the image processing network is trained by: inputting the sample identity feature vector and the sample pose feature vector into the generator; respectively encoding the sample identity characteristic vector and the sample attitude characteristic vector through an encoder in the generator to correspondingly obtain an identity encoding vector and an attitude encoding vector; and carrying out adaptive instance normalization processing on the identity coding vector and the posture coding vector through an adaptive instance normalization layer in the generator to obtain sample composite vectors corresponding to the first sample image and the second sample image.
In some embodiments, the image processing network is trained by: before the identity feature extraction is carried out on the first sample image through the target identity encoder, carrying out dimension transformation processing on a feature vector corresponding to the first sample mark; and the dimension of the feature vector after the dimension transformation processing is the same as the vector dimension of the adaptive instance normalization layer.
In some embodiments, the image processing network is trained by: acquiring a first updating gradient of the network parameter when the network is updated for the Nth time; acquiring a second updating gradient of the network parameter when the (N + 1) th network is updated; wherein the second update gradient is an update gradient obtained on the basis of the first update gradient; determining the sum of the first updating gradient and the second updating gradient as a target updating gradient of the network parameter when the network is updated for the (N + 2) th time; and correcting the network parameters in the mark preprocessing model, the target identity encoder and the generator by adopting the target updating gradient to obtain the corrected image processing network.
It should be noted that the description of the apparatus in the embodiment of the present application is similar to the description of the method embodiment, and has similar beneficial effects to the method embodiment, and therefore, the description is not repeated. For technical details not disclosed in the embodiments of the apparatus, reference is made to the description of the embodiments of the method of the present application for understanding.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the method of the embodiment of the present application.
Embodiments of the present application provide a storage medium having stored therein executable instructions, which when executed by a processor, will cause the processor to perform a method provided by embodiments of the present application, for example, the method as illustrated in fig. 3.
In some embodiments, the storage medium may be a computer-readable storage medium, such as a Ferroelectric Random Access Memory (FRAM), a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a flash Memory, a magnetic surface Memory, an optical disc, or a Compact Disc Read Only Memory (CD-ROM), and the like; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a Hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (15)

1. An image processing method, comprising:
acquiring a first image and a second image;
extracting identity features of a first target object in the first image to obtain an identity feature vector of the first image;
extracting attitude features of a second target object in the second image to obtain an attitude feature vector of the second image;
obtaining an image synthesis vector corresponding to the first image and the second image based on the identity feature vector and the pose feature vector;
generating a composite image of the first image and the second image according to the image composite vector; a first target object in the composite image has a pose of the second target object.
2. The method of claim 1, wherein the extracting the identity feature of the first target object in the first image to obtain the identity feature vector of the first image comprises:
acquiring image data corresponding to at least two first marks in the first target object;
normalizing the image data corresponding to the at least two first marks to obtain first normalized data;
acquiring an image data mean value of at least two image data;
and determining the identity characteristic vector of the first image according to the first normalized data and the image data mean value.
3. The method of claim 2, wherein the obtaining image data corresponding to at least two first markers in the first target object comprises:
determining at least two first regions in the first target object;
performing image marker preprocessing on the at least two first areas in the first target object to obtain a first marker of each first area;
and acquiring the image data corresponding to each first mark.
4. The method of claim 2, wherein determining the identity feature vector of the first image according to the first normalized data and the image data mean comprises:
determining a difference between the first normalized data and the image data mean as pose information of the first image;
and determining the identity characteristic vector of the first image according to the image data mean value and the posture information of the first image.
5. The method of claim 1, wherein the extracting the pose feature of the second target object in the second image to obtain the pose feature vector of the second image comprises:
determining image data corresponding to at least two second markers in the second target object;
normalizing the image data corresponding to the at least two second marks to obtain second normalized data;
acquiring an image data mean value of image data corresponding to at least two second marks;
and determining the difference value between the second normalized data and the image data mean value as the attitude feature vector of the second image.
6. The method of claim 5, wherein determining image data corresponding to at least two second markers in the second target object comprises:
determining a facial image of the second target object;
determining at least two second regions related to expression information in the face image;
performing image sign preprocessing on the at least two second areas to obtain a second sign of each second area;
and acquiring the image data corresponding to each second mark.
7. The method of any one of claims 1 to 6, wherein the deriving an image composite vector corresponding to the first image and the second image based on the identity feature vector and the pose feature vector comprises:
normalizing the attitude feature vector to obtain a normalized feature vector;
and carrying out full connection processing on the normalized feature vector and the identity feature vector to obtain the image synthesis vector corresponding to the first image and the second image.
8. The method according to any one of claims 1 to 6, further comprising: determining a composite image of the first image and the second image using an image processing network;
wherein the image processing network is trained by:
inputting a first sample image and a second sample image into the image processing network, and determining a first sample mark of the first sample image and a second sample mark of the second sample image through a mark preprocessing model in the image processing network;
extracting the identity feature of the first sample image through a target identity encoder in the image processing network to obtain a sample identity feature vector of the first sample image;
performing feature extraction on the second sample mark through a feature extraction layer in the image processing network to obtain a sample attitude feature vector of the second sample image;
performing instance normalization processing on the sample identity feature vector and the sample pose feature vector through a generator in the image processing network to obtain sample composite vectors corresponding to the first sample image and the second sample image;
inputting the sample synthetic vector into a preset loss model to obtain a loss result;
and according to the loss result, correcting the network parameters in the mark preprocessing model, the target identity encoder and the generator to obtain a corrected image processing network.
9. The method according to claim 8, wherein the performing, by a target identity encoder in the image processing network, identity feature extraction on the first sample image to obtain a sample identity feature vector of the first sample image comprises:
inputting the first sample token into the target identity encoder;
and sequentially performing at least one downsampling process and at least one convolution process on the feature vector corresponding to the first sample identifier through at least one downsampling layer and at least one convolution layer in the target identity encoder to obtain the sample identity feature vector of the first sample image.
10. The method of claim 8, wherein performing, by a generator in the image processing network, an instance normalization on the sample identity feature vector and the sample pose feature vector to obtain a sample composite vector corresponding to the first sample image and the second sample image comprises:
inputting the sample identity feature vector and the sample pose feature vector into the generator;
respectively encoding the sample identity characteristic vector and the sample attitude characteristic vector through an encoder in the generator to correspondingly obtain an identity encoding vector and an attitude encoding vector;
and carrying out adaptive instance normalization processing on the identity coding vector and the posture coding vector through an adaptive instance normalization layer in the generator to obtain sample composite vectors corresponding to the first sample image and the second sample image.
11. The method of claim 10, further comprising:
prior to identity feature extraction of the first sample image by the target identity encoder,
performing dimension transformation processing on the feature vector corresponding to the first sample mark; and the dimension of the feature vector after the dimension transformation processing is the same as the vector dimension of the adaptive instance normalization layer.
12. The method of claim 8, wherein modifying network parameters in the token pre-processing model, the object identity encoder, and the generator according to the loss result to obtain a modified image processing network comprises:
acquiring a first updating gradient of the network parameter when the network is updated for the Nth time;
acquiring a second updating gradient of the network parameter when the (N + 1) th network is updated; wherein the second update gradient is an update gradient obtained on the basis of the first update gradient;
determining the sum of the first updating gradient and the second updating gradient as a target updating gradient of the network parameter when the network is updated for the (N + 2) th time;
and correcting the network parameters in the mark preprocessing model, the target identity encoder and the generator by adopting the target updating gradient to obtain the corrected image processing network.
13. An image processing apparatus characterized by comprising:
the acquisition module is used for acquiring a first image and a second image;
the identity feature extraction module is used for extracting identity features of a first target object in the first image to obtain an identity feature vector of the first image;
the attitude feature extraction module is used for extracting the attitude feature of a second target object in the second image to obtain an attitude feature vector of the second image;
a processing module configured to obtain an image synthesis vector corresponding to the first image and the second image based on the identity feature vector and the pose feature vector;
a generating module, configured to generate a composite image of the first image and the second image according to the image composite vector; a first target object in the composite image has a pose of the second target object.
14. An image processing apparatus characterized by comprising:
a memory for storing executable instructions; a processor for implementing the method of any one of claims 1 to 12 when executing executable instructions stored in the memory.
15. A computer-readable storage medium having stored thereon executable instructions for causing a processor to perform the method of any one of claims 1 to 12 when the executable instructions are executed.
CN202010777489.9A 2020-08-05 2020-08-05 Image processing method, device, equipment and computer readable storage medium Pending CN112116684A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010777489.9A CN112116684A (en) 2020-08-05 2020-08-05 Image processing method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010777489.9A CN112116684A (en) 2020-08-05 2020-08-05 Image processing method, device, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN112116684A true CN112116684A (en) 2020-12-22

Family

ID=73799190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010777489.9A Pending CN112116684A (en) 2020-08-05 2020-08-05 Image processing method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112116684A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112764649A (en) * 2021-01-29 2021-05-07 北京字节跳动网络技术有限公司 Method, device and equipment for generating virtual image and storage medium
CN112819689A (en) * 2021-02-02 2021-05-18 百果园技术(新加坡)有限公司 Training method of face attribute editing model, face attribute editing method and equipment
CN112991152A (en) * 2021-03-04 2021-06-18 网易(杭州)网络有限公司 Image processing method and device, electronic equipment and storage medium
CN114120412A (en) * 2021-11-29 2022-03-01 北京百度网讯科技有限公司 Image processing method and device
CN114120412B (en) * 2021-11-29 2022-12-09 北京百度网讯科技有限公司 Image processing method and device
WO2023241427A1 (en) * 2022-06-17 2023-12-21 北京字跳网络技术有限公司 Image processing method and apparatus, device, and storage medium
WO2024023902A1 (en) * 2022-07-25 2024-02-01 日本電信電話株式会社 Information processing device, motion transfer method, and program

Similar Documents

Publication Publication Date Title
CN111709409B (en) Face living body detection method, device, equipment and medium
CN112766244B (en) Target object detection method and device, computer equipment and storage medium
CN112116684A (en) Image processing method, device, equipment and computer readable storage medium
CN110909651B (en) Method, device and equipment for identifying video main body characters and readable storage medium
CN110785767B (en) Compact linguistics-free facial expression embedding and novel triple training scheme
CN111401216B (en) Image processing method, model training method, image processing device, model training device, computer equipment and storage medium
CN109657533A (en) Pedestrian recognition methods and Related product again
CN108780519A (en) Structure learning in convolutional neural networks
CN113449700B (en) Training of video classification model, video classification method, device, equipment and medium
CN115588224B (en) Virtual digital person generation method and device based on face key point prediction
CN111553267A (en) Image processing method, image processing model training method and device
CN110796593A (en) Image processing method, device, medium and electronic equipment based on artificial intelligence
CN113705290A (en) Image processing method, image processing device, computer equipment and storage medium
US20230095182A1 (en) Method and apparatus for extracting biological features, device, medium, and program product
CN114283351A (en) Video scene segmentation method, device, equipment and computer readable storage medium
CN115661246A (en) Attitude estimation method based on self-supervision learning
Yu et al. A video-based facial motion tracking and expression recognition system
Skubic et al. Qualitative analysis of sketched route maps: translating a sketch into linguistic descriptions
CN115565238A (en) Face-changing model training method, face-changing model training device, face-changing model training apparatus, storage medium, and program product
CN114707589A (en) Method, device, storage medium, equipment and program product for generating countermeasure sample
CN114037046A (en) Distillation method and device of neural network model and electronic system
CN111008622B (en) Image object detection method and device and computer readable storage medium
CN113516142A (en) Text image matching method, device, equipment and storage medium
CN110956599A (en) Picture processing method and device, storage medium and electronic device
CN117011449A (en) Reconstruction method and device of three-dimensional face model, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination