CN112668517A - Picture processing method and device and electronic equipment - Google Patents

Picture processing method and device and electronic equipment

Info

Publication number
CN112668517A
CN112668517A (application CN202011639117.6A)
Authority
CN
China
Prior art keywords
feature
action
appearance
characteristic
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011639117.6A
Other languages
Chinese (zh)
Inventor
葛璞
李玉乐
项伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bigo Technology Pte Ltd
Original Assignee
Bigo Technology Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bigo Technology Pte Ltd filed Critical Bigo Technology Pte Ltd
Priority to CN202011639117.6A priority Critical patent/CN112668517A/en
Publication of CN112668517A publication Critical patent/CN112668517A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The embodiments of the invention provide a picture processing method and device and an electronic device, relating to the technical field of motion migration. The method comprises the following steps: acquiring an appearance feature and a first action feature of a first picture, and acquiring a second action feature of a second picture; obtaining, according to the appearance feature, the first action feature and the second action feature, displacement information from the first action feature to the second action feature and guidance information for guiding feature fusion of the appearance feature and the second action feature; and performing feature fusion on the appearance feature and the second action feature according to the displacement information and the guidance information to generate a target picture. With this scheme, the action features lost during feature fusion can be supplemented, avoiding the problem of feature loss.

Description

Picture processing method and device and electronic equipment
Technical Field
The present invention relates to the field of motion migration technologies, and in particular, to a method and an apparatus for processing an image, and an electronic device.
Background
Motion migration is an important technology in the field of computer vision and is widely applied in fields such as movie making, virtual fitting and picture editing. Given a source picture, its corresponding human body action and a target human body action, the motion migration task is to transfer the source picture to the target action while maintaining the identity characteristics of the source picture. In existing motion migration methods, it is difficult to generate natural and realistic pictures in some complex action scenes (such as leg raising, body self-occlusion and hand raising), and limbs may even be missing from the result.
Disclosure of Invention
The invention provides a picture processing method, a picture processing device and electronic equipment, which are used for solving the problem that partial features are easy to lose in the existing action migration technology to a certain extent.
In a first aspect of the present invention, there is provided a picture processing method, including:
acquiring appearance characteristics and first action characteristics of a first picture, and acquiring second action characteristics of a second picture;
according to the appearance feature, the first action feature and the second action feature, obtaining displacement information from the first action feature to the second action feature and guiding information for guiding feature fusion of the appearance feature and the second action feature;
and performing feature fusion on the appearance feature and the second action feature according to the displacement information and the guide information to generate a target picture.
In a second aspect of the present invention, there is provided a picture processing apparatus, comprising:
the first acquisition module is used for acquiring the appearance characteristic and the first action characteristic of the first picture and acquiring the second action characteristic of the second picture;
a second obtaining module, configured to obtain, according to the appearance feature, the first action feature, and the second action feature, displacement information from the first action feature to the second action feature, and guidance information for guiding feature fusion between the appearance feature and the second action feature;
and the fusion module is used for performing feature fusion on the appearance features and the second action features according to the displacement information and the guidance information to generate a target picture.
In a third aspect of the present invention, there is also provided an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing the steps in the image processing method when the program stored in the memory is executed.
In a fourth aspect implemented by the present invention, there is also provided a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the picture processing method as described above.
In a fifth aspect of the embodiments of the present invention, there is also provided a computer program product containing instructions, which when run on a computer, causes the computer to execute the picture processing method as described above.
Aiming at the prior art, the invention has the following advantages:
in the embodiment of the invention, from the acquired appearance feature and first action feature of the first picture and the second action feature of the second picture, displacement information from the first action feature to the second action feature and guidance information for guiding feature fusion of the appearance feature and the second action feature can be obtained. Through the displacement information, the first action feature can be converted to the corresponding position of the second action feature, and through the guidance information, the action features lost in the feature fusion process can be supplemented to generate a target picture with complete limbs, thereby avoiding the problem of limb loss.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments will be briefly described below.
Fig. 1 is a flowchart of a picture processing method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an action migration model according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a limb attention network according to an embodiment of the present invention;
FIG. 4 is a block diagram of a picture processing apparatus according to an embodiment of the present invention;
fig. 5 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms first, second and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that embodiments of the application may be practiced in sequences other than those illustrated or described herein, and that the terms "first," "second," and the like are generally used herein in a generic sense and do not limit the number of terms, e.g., the first term can be one or more than one. In addition, "and/or" in the specification and claims means at least one of connected objects, a character "/" generally means that a preceding and succeeding related objects are in an "or" relationship.
Currently, the existing action migration technology may have the following methods:
the method based on appearance and action information decomposition comprises the following steps: the method respectively encodes appearance information and action information through an encoder, then connects the two information, and generates a target picture through a decoder, and the method does not align source action and target action, and appearance information is difficult to encode into a feature vector, so that the generated effect is not real enough.
The method based on a deformation network structure: this method establishes the correspondence between the source action and the target action using action structure information, and deforms the source picture through the deformation network structure to generate the target picture, but it has difficulty handling some complex action scenes.
Optical flow-based methods: such a method learns the optical flow between the source action and the target action through an optical flow network, deforms the source picture to the corresponding position of the target action along the optical flow, and combines the target action features to generate a picture of the target action.
Therefore, the embodiment of the invention provides an image processing method, an image processing device and electronic equipment.
Specifically, as shown in fig. 1, an embodiment of the present invention provides a picture processing method, where the method specifically includes:
step 101, obtaining an appearance feature and a first action feature of a first picture, and obtaining a second action feature of a second picture.
In particular, the method can be applied to a motion migration model. As shown in fig. 2, the model includes an appearance feature coding network 21, which is a down-sampling convolutional coding network. The first picture I_s is input to the appearance feature coding network 21 as the source picture; through a down-sampling convolution process (i.e., a process that reduces the height and width of the picture while increasing the number of channels), the first picture I_s is encoded and an appearance feature map F_r is output. The appearance feature of the first picture I_s can be obtained from the appearance feature map F_r. The appearance feature map F_r is a feature map with three dimensions of height, width and channel number, and it contains the appearance information (e.g., clothes, skin tone, hair, etc.) of the first picture I_s.
Also, as shown in FIG. 2, the motion migration model may further include a motion feature coding network 22, which is a down-sampling convolutional coding network. The human body key points (such as head, shoulder, neck, etc.) of the first picture I_s and of the second picture P_t can be detected with a skeleton key point detection network. The key points of the first picture I_s are input into the motion feature coding network 22; through a down-sampling convolution process, the action in the first picture I_s is encoded and a first motion feature map is output. The first motion feature can be obtained from the first motion feature map, which contains the basic skeleton information of the first action. In the same way, the key points of the second picture P_t are input into the motion feature coding network 22; through a down-sampling convolution process, the action in the second picture P_t is encoded and a second motion feature map F_p is output. The second motion feature can be obtained from the second motion feature map F_p, which contains the basic skeleton information of the second action.
It should be noted that the second picture may be a single picture or a frame picture in a video, and is not limited herein.
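As a concrete illustration of the down-sampling convolutional coding networks described above, the following PyTorch-style sketch shows how an appearance encoder (network 21) and a motion encoder (network 22) of this kind could be built. The layer counts, channel widths and the 18-keypoint heatmap input are illustrative assumptions, not values specified by the patent.

```python
# Minimal sketch of the two down-sampling convolutional encoders (networks 21 and 22).
# Channel sizes, number of layers and the keypoint-heatmap input are illustrative assumptions.
import torch
import torch.nn as nn

class DownsampleEncoder(nn.Module):
    """Reduces height/width while increasing channels, as described for networks 21 and 22."""
    def __init__(self, in_channels, base_channels=64, num_downs=3):
        super().__init__()
        layers, ch = [], in_channels
        for i in range(num_downs):
            out_ch = base_channels * (2 ** i)
            layers += [nn.Conv2d(ch, out_ch, kernel_size=4, stride=2, padding=1),
                       nn.InstanceNorm2d(out_ch),
                       nn.ReLU(inplace=True)]
            ch = out_ch
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

appearance_encoder = DownsampleEncoder(in_channels=3)   # takes the RGB source picture I_s
motion_encoder = DownsampleEncoder(in_channels=18)      # takes keypoint heatmaps (assumed 18 joints)

I_s = torch.randn(1, 3, 256, 256)      # first picture
P_s = torch.randn(1, 18, 256, 256)     # keypoint heatmaps of the first picture
P_t = torch.randn(1, 18, 256, 256)     # keypoint heatmaps of the second picture

F_r = appearance_encoder(I_s)          # appearance feature map F_r (here H/8 x W/8, 256 channels)
F_s = motion_encoder(P_s)              # first motion feature map
F_p = motion_encoder(P_t)              # second motion feature map F_p
```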
102, obtaining displacement information from the first action characteristic to the second action characteristic and guiding information for guiding feature fusion of the appearance characteristic and the second action characteristic according to the appearance characteristic, the first action characteristic and the second action characteristic.
Specifically, as shown in fig. 2, the motion migration model may further include an optical flow network 23, where the optical flow network 23 is a coding-decoding neural network, and the displacement information f from the first motion feature to the second motion feature may be obtained through a coding-decoding process by using the appearance feature, the human body key point of the first motion feature, and the human body key point of the second motion feature as inputs.
Further, the displacement information may be an optical flow vector including three dimensions of height, width, and channel number. Such as: the number of channels is 2, i.e. two coordinates x and y.
Moreover, the motion migration model may further include a limb attention network 24, which is a coding-decoding neural network, and through the coding-decoding process, a piece of guidance information M may be obtained, where the guidance information M is used to guide feature fusion between the appearance feature of the first picture and the second motion feature of the second picture, so as to avoid the problem of motion feature loss in the fusion process.
Further, the guidance information may be a limb attention weight mask that includes the two dimensions of height and width.
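To make the two outputs of step 102 concrete, the sketch below shows the expected tensor shapes: the displacement information f is a 2-channel optical flow field (an x and a y offset per position), and the guidance information M is a single-channel mask over height and width. The tiny encoder-decoder used here is only a stand-in for the optical flow network 23 and the limb attention network 24, and the sigmoid squashing the mask into [0, 1] is an assumption consistent with the later statement that mask values lie between 0 and 1.

```python
# Shape-level sketch of the step-102 outputs; the networks themselves are placeholders.
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    """Stand-in for an encode-decode network: down-sample, then up-sample back."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.down = nn.Sequential(nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True))
        self.up = nn.ConvTranspose2d(64, out_ch, 4, stride=2, padding=1)

    def forward(self, x):
        return self.up(self.down(x))

B, H, W = 1, 32, 32                      # feature-map resolution (illustrative)
F_r = torch.randn(B, 256, H, W)          # appearance feature
P_s = torch.randn(B, 18, H, W)           # first-action keypoint representation
P_t = torch.randn(B, 18, H, W)           # second-action keypoint representation

flow_net = TinyEncoderDecoder(in_ch=256 + 18 + 18, out_ch=2)   # stand-in for optical flow network 23
f = flow_net(torch.cat([F_r, P_s, P_t], dim=1))                # displacement information: (B, 2, H, W)

attn_net = TinyEncoderDecoder(in_ch=18 + 18, out_ch=1)         # stand-in for limb attention network 24
M = torch.sigmoid(attn_net(torch.cat([P_s, P_t], dim=1)))      # guidance mask in [0, 1]: (B, 1, H, W)

print(f.shape, M.shape)   # e.g. torch.Size([1, 2, 32, 32]) torch.Size([1, 1, 32, 32])
```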
And 103, performing feature fusion on the appearance features and the second action features according to the displacement information and the guidance information to generate a target picture.
Specifically, as shown in fig. 2, the motion migration model may further include a decoding network 25, which is an up-sampling convolution decoding network structure, and the network output is a fused target picture Igen. The first motion characteristic can be converted into the corresponding position of the second motion characteristic through the displacement information f, and the motion characteristic lost in the characteristic fusion process can be supplemented through the guide information M to generate the target picture I with the complete limbsgenThereby avoiding the problems of limb loss and the like.
It should be noted that the application scenarios of the above method are as follows: the method can be used in various applications or products needing action migration, such as virtual fitting, movie making, dance video generation and the like; specifically, the following functions can be provided:
Function one: given a source picture (i.e., the first picture) and a target action picture (i.e., the second picture), the source picture is migrated according to the action of the target action picture (i.e., the second action feature) to generate the target picture. The generated target picture carries the identity information of the first picture (features such as the face and clothes) and maintains the action information of the target action (the second action feature).
Function two: given an action sequence from a first video (i.e., each second picture is a frame picture, and the frame pictures together form the first video) and a source picture (i.e., the first picture), the source picture is transferred to each frame picture (i.e., each second picture) of the target video in the manner of function one, thereby generating a target video that keeps the identity information of the source picture (i.e., each target picture is a target frame picture, and the target frame pictures together form the target video).
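A hedged sketch of function two follows. The names motion_transfer_model and extract_keypoints are hypothetical placeholders standing in for the trained model and a keypoint detector; the point is only that the single-picture function is applied to each frame of the driving video in turn.

```python
# Hypothetical frame-by-frame use of the motion migration model (function two).
# motion_transfer_model and extract_keypoints are illustrative placeholders, not real APIs.
def generate_target_video(source_picture, driving_frames, motion_transfer_model, extract_keypoints):
    source_keypoints = extract_keypoints(source_picture)        # key points of the first picture
    target_frames = []
    for frame in driving_frames:                                # each frame is a "second picture"
        frame_keypoints = extract_keypoints(frame)              # key points for this frame
        # Function one applied per frame: keep the source identity, take the frame's action.
        target_frames.append(motion_transfer_model(source_picture, source_keypoints, frame_keypoints))
    return target_frames                                        # assembled into the target video
```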
In the above embodiment of the present invention, through the obtained appearance feature and the first motion feature of the first picture and the second motion feature of the second picture, displacement information from the first motion feature to the second motion feature and guidance information for guiding the appearance feature and the second motion feature to perform feature fusion can be obtained, the first motion feature can be converted into a corresponding position of the second motion feature through the displacement information, and the motion feature lost in the feature fusion process can be supplemented through the guidance information to generate the target picture with complete limbs, so as to avoid the problem of limb loss.
Optionally, the step 102, obtaining, according to the appearance feature, the first action feature, and the second action feature, displacement information from the first action feature to the second action feature, may specifically include:
obtaining optical flow characteristics by coding the appearance characteristics, the first action characteristics and the second action characteristics;
and obtaining displacement information from the first motion characteristic to the second motion characteristic by decoding the optical flow characteristic.
Specifically, as shown in fig. 2, the appearance feature, the human body key points of the first motion feature and the human body key points of the second motion feature are input into the optical flow network 23; through an encoding process, the optical flow feature F_f of the optical flow network 23 can be obtained. By decoding the optical flow feature F_f, the displacement information f from the first action to the second action, i.e., the optical flow vector from the first motion feature to the second motion feature, can be obtained.
Optionally, the step 102, obtaining guidance information for guiding the feature fusion of the appearance feature and the second action feature according to the appearance feature, the first action feature and the second action feature, may specifically include:
step A1, obtaining limb characteristics by coding the first action characteristics and the second action characteristics;
step A2, obtaining guiding information for guiding feature fusion of the appearance feature and the second action feature according to the optical flow feature and the limb feature.
Specifically, as shown in fig. 2, the human body key points of the first motion feature and the human body key points of the second motion feature are input into the limb attention network 24, and the limb feature F_J is obtained through an encoding process; the limb feature F_J contains the limb structure information. In the decoding process, guidance information M for guiding feature fusion of the appearance feature and the second motion feature is obtained from the optical flow feature F_f and the limb feature F_J, i.e., a limb attention weight mask for guiding the feature fusion is obtained. The limb attention weight mask indicates whether optical flow vectors are missing and locates the missing relation at each specific position of the second motion feature, so that feature information can be supplemented and a target picture I_gen with complete limbs can be generated.
Optionally, the step a2, according to the optical flow features and the limb features, acquires guidance information for guiding feature fusion between the appearance features and the second motion features, which may specifically include:
and step B1, connecting the first channel number of the optical flow characteristics and the second channel number of the limb characteristics to obtain the limb weight.
Specifically, as shown in FIG. 3, a limb attention network is added between the optical flow feature F_f of the optical flow network and the limb feature F_J during the decoding process. Through the limb attention network, the first channel number of the optical flow feature F_f and the second channel number of the limb feature F_J can be connected to obtain the limb weight a_J, which is a limb weight with the two dimensions of height and width.
For example, the process of obtaining the weight of the limb through the limb attention network is as follows:
characterizing the optical flow FfTo the limbs FJAnd carrying out vector convolution operation conv, then carrying out Linear rectification function (RecU) operation, then carrying out conv operation, and finally carrying out softmax function operation, thereby obtaining the limb weight.
And step B2, acquiring guidance information for guiding the feature fusion of the appearance feature and the second action feature according to the limb weight and the optical flow feature.
Further, in the step B2, the acquiring, according to the limb weight and the optical flow feature, guidance information for guiding feature fusion between the appearance feature and the second motion feature may specifically include:
performing point multiplication on the limb weight and the optical flow characteristic to obtain a limb attention characteristic;
and connecting the third channel number of the limb attention feature with the second channel number of the limb feature, and obtaining guide information for guiding feature fusion of the appearance feature and the second action feature through an up-sampling convolution process.
Specifically, as shown in fig. 2 and fig. 3, the limb attention feature F̂_f can be obtained by dot-multiplying the limb weight a_J and the optical flow feature F_f:

F̂_f = a_J ⊙ F_f

where F̂_f denotes the limb attention feature, F_f denotes the optical flow feature, a_J denotes the limb weight, and ⊙ denotes the dot product.

Then, the third channel number of the limb attention feature F̂_f and the second channel number of the limb feature F_J are connected, and an up-sampling convolution process is performed to obtain the guidance information M for guiding feature fusion of the appearance feature and the second action feature, i.e., the limb attention weight mask for guiding the feature fusion. The value at each position of the mask can be a value between 0 and 1 and is used to guide the feature fusion, so that the action features lost in the feature fusion process can be supplemented and a target picture I_gen with complete limbs can be generated, thereby avoiding problems such as limb loss.
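Following the equations above, a minimal sketch of how the guidance mask M could be produced: the limb attention feature is the element-wise product a_J ⊙ F_f, it is connected with F_J along channels, and an up-sampling convolution maps the result to a single-channel mask. The layer sizes and the final sigmoid are assumptions; the sigmoid is merely consistent with the statement that each coordinate value lies between 0 and 1.

```python
# Sketch of producing the limb attention weight mask M; layer sizes and the sigmoid are assumptions.
import torch
import torch.nn as nn

class GuidanceMask(nn.Module):
    def __init__(self, flow_ch, limb_ch):
        super().__init__()
        # Up-sampling convolution from the concatenated features to a single-channel mask.
        self.upconv = nn.Sequential(
            nn.ConvTranspose2d(flow_ch + limb_ch, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=3, padding=1),
        )

    def forward(self, F_f, F_J, a_J):
        attn = a_J * F_f                              # limb attention feature: dot (element-wise) product
        x = torch.cat([attn, F_J], dim=1)             # connect the third and second channel numbers
        M = torch.sigmoid(self.upconv(x))             # mask with each value between 0 and 1 (assumed sigmoid)
        return M

F_f = torch.randn(1, 256, 32, 32)
F_J = torch.randn(1, 128, 32, 32)
a_J = torch.rand(1, 1, 32, 32)
M = GuidanceMask(256, 128)(F_f, F_J, a_J)             # shape (1, 1, 64, 64)
```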
Optionally, the step 103 performs feature fusion on the appearance feature and the second action feature according to the displacement information and the guidance information to generate a target picture, which may specifically include:
and step C1, performing coordinate deformation on the appearance characteristics through the displacement information to obtain deformation characteristics.
Specifically, as shown in fig. 2, in the decoding network 25, the appearance feature is first deformed coordinate-wise through the displacement information f (i.e., the optical flow vector) to obtain the deformation feature F_warp.
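One common way to realize the coordinate deformation of step C1 is backward warping with a sampling grid, as in the sketch below. The use of grid_sample and the normalization of coordinates to [-1, 1] are implementation assumptions, not details given by the patent.

```python
# Sketch of warping the appearance feature along the displacement f (backward warping).
# grid_sample-based warping and the coordinate normalization are implementation assumptions.
import torch
import torch.nn.functional as F

def warp_by_flow(feature, flow):
    """feature: (B, C, H, W); flow: (B, 2, H, W) holding per-pixel (x, y) displacements."""
    b, _, h, w = feature.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=feature.dtype),
                            torch.arange(w, dtype=feature.dtype), indexing="ij")
    base = torch.stack((xs, ys), dim=0).unsqueeze(0).to(feature.device)   # (1, 2, H, W)
    coords = base + flow                                                  # displaced sampling positions
    # Normalize to [-1, 1] as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)                      # (B, H, W, 2)
    return F.grid_sample(feature, grid, align_corners=True)

F_r = torch.randn(1, 256, 32, 32)   # appearance feature
f = torch.zeros(1, 2, 32, 32)       # zero displacement leaves the feature unchanged
F_warp = warp_by_flow(F_r, f)       # deformation feature F_warp
```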
And step C2, performing feature fusion on the deformation feature and the second action feature through the guide information to obtain a fusion feature.
Specifically, as shown in FIG. 2, the deformation feature F_warp and the second action feature are fused under the guidance information M (i.e., the limb attention weight mask) to obtain the fusion feature F_fuse.
Further, in the step C2, the performing feature fusion on the deformation feature and the second action feature through the guidance information to obtain a fusion feature specifically may include:
performing point multiplication processing on the guide information and the deformation characteristic to obtain a first target characteristic;
performing dot product processing on a second value obtained by subtracting the guidance information from the first value and the second action characteristic to obtain a second target characteristic;
and calculating the sum of the first target characteristic and the second target characteristic to obtain a fusion characteristic.
The fusion feature can be obtained through the above fusion process. Because the limb attention weight mask is added to the fusion process, the deformation feature and the second action feature are fused under the guidance of the limb attention weight mask, which avoids the loss of features during fusion. The order of the step of acquiring the first target feature and the step of acquiring the second target feature is not limited. For example, in the case that the first value is 1, the fusion feature can be obtained by the following formula:

F_fuse = M ⊙ F_warp + (1 - M) ⊙ F_p

where F_fuse denotes the fusion feature, M denotes the guidance information, i.e., the limb attention weight mask, F_warp denotes the deformation feature, F_p denotes the second motion feature, and ⊙ denotes the dot product.
And step C3, generating a target picture by the fused features through an up-sampling convolution process.
Specifically, the fusion features are subjected to an upsampling convolution process through an upsampling convolution neural network, and are restored to a target picture, and the target picture has the appearance features of the first picture and the second action features of the second picture.
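Putting steps C2 and C3 together, the sketch below fuses the deformation feature and the second motion feature under the guidance mask using F_fuse = M ⊙ F_warp + (1 - M) ⊙ F_p, then decodes the result back to image resolution with up-sampling convolutions. The specific decoder layers and the final Tanh output range are illustrative assumptions; only the fusion formula follows the description above.

```python
# Sketch of guided feature fusion (step C2) followed by up-sampling decoding (step C3).
# Decoder depth, channel widths and the Tanh output range are illustrative assumptions.
import torch
import torch.nn as nn

def fuse(M, F_warp, F_p):
    # First target feature: M * F_warp; second target feature: (1 - M) * F_p; fusion is their sum.
    return M * F_warp + (1.0 - M) * F_p

decoder = nn.Sequential(   # stand-in for the up-sampling convolutional decoding network 25
    nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
    nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),   # RGB target picture in [-1, 1]
)

M = torch.rand(1, 1, 32, 32)             # guidance mask (broadcast over channels)
F_warp = torch.randn(1, 256, 32, 32)     # deformation feature
F_p = torch.randn(1, 256, 32, 32)        # second motion feature
I_gen = decoder(fuse(M, F_warp, F_p))    # target picture, here 256 x 256
```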
During the training of the motion migration model, two pictures (I_s, I_t) of different actions of the same person and the corresponding action features (P_s, P_t) can be taken, and the picture I_gen after motion migration is generated by the model. The training process is as follows:
the first step is as follows: training discriminator
Firstly, I istAnd IgenCalculating the countermeasure loss through a discriminator; then solving the gradient and updating the weight of the discriminator.
The second step: train the generator.
First, the reconstruction loss and the adversarial loss are calculated from I_t and I_gen. Then vgg(I_t) and vgg(I_gen) are obtained by passing I_t and I_gen through a convolutional neural network (VGG), and the perceptual loss is calculated. Next, the target face position is obtained through the face key points, and the face loss (face perceptual loss and face reconstruction loss) is calculated. Then vgg(I_s) and vgg(I_t) are obtained by passing I_s and I_t through the VGG network, and the optical flow loss is calculated using (vgg(I_s), vgg(I_t), f), where f is the optical flow vector. Finally, the gradient is computed and the weights of the generator are updated.
The third step: the first step and the second step are repeated alternately until the motion migration model converges.
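The alternating schedule of the first and second steps can be sketched as a standard GAN-style training loop. The optimizers, the equal weighting and the loss helper names below are hypothetical placeholders for the losses defined in the following paragraphs, and generator.last_flow is an assumed handle on the predicted optical flow.

```python
# Hypothetical alternating training loop for the motion migration model (steps 1-3).
# The loss helpers, optimizers and loss weighting are placeholders for the losses defined below.
def train(generator, discriminator, g_opt, d_opt, data_loader, num_epochs, losses):
    for epoch in range(num_epochs):
        for I_s, I_t, P_s, P_t in data_loader:           # two pictures of one person + action features
            I_gen = generator(I_s, P_s, P_t)

            # Step 1: train the discriminator on I_t (real) vs I_gen (fake).
            d_loss = losses.adversarial_d(discriminator, I_t, I_gen.detach())
            d_opt.zero_grad(); d_loss.backward(); d_opt.step()

            # Step 2: train the generator with the joint loss.
            g_loss = (losses.reconstruction(I_gen, I_t)
                      + losses.perceptual(I_gen, I_t)
                      + losses.adversarial_g(discriminator, I_gen)
                      + losses.optical_flow(I_s, I_t, generator.last_flow)
                      + losses.face(I_gen, I_t))
            g_opt.zero_grad(); g_loss.backward(); g_opt.step()
        # Step 3: steps 1 and 2 repeat alternately until the model converges.
```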
Specifically, the loss functions in the three steps above provide guidance for the model training process; the parameters of the model are continuously optimized during training so that the loss functions decrease, and the model thereby learns the ability to perform motion migration. The data required for each training iteration consists of two pictures (I_s, I_t) of different actions of the same person and the corresponding action features (P_s, P_t); the input of the model is (I_s, P_s, P_t), the output is I_gen, and I_t is the ground-truth value corresponding to I_gen. The loss function is a joint loss function whose main components are as follows:
Reconstruction loss L_rec: makes the generated picture I_gen close to the ground-truth picture I_t at the pixel level, expressed as:

L_rec = ||I_gen - I_t||_1

i.e., the reconstruction loss L_rec is the sum of the absolute values of the differences between I_gen and I_t.
Perceptual loss L_perc: makes the generated picture I_gen close to the ground-truth picture I_t at the feature level. A trained VGG neural network is used to extract features from I_gen and I_t respectively, and then the distance between the two sets of features is calculated, expressed as:

L_perc = ||vgg(I_gen) - vgg(I_t)||_1

where vgg(X) denotes the features of X extracted with the VGG neural network, and X is I_gen or I_t.
Adversarial loss L_GAN: the adversarial loss makes the generated picture more realistic and natural.
Optical flow loss L_flow: features are extracted from the source picture I_s and the ground-truth picture I_t with the trained VGG neural network, yielding the feature maps vgg(I_s) and vgg(I_t); the feature map obtained by deforming vgg(I_s) pixel by pixel along the optical flow vector f should be similar to vgg(I_t). The loss function measures the similarity by the cosine distance and can be expressed as

L_flow = (1/N) Σ_l cos(φ(vgg(I_s), f)_l, vgg(I_t)_l)

where φ(*) is the deformation of a feature map along the optical flow vector, cos(*) is the cosine distance, vgg(I_t)_l is the value of the feature map vgg(I_t) at the l-th coordinate position, and N is the total number of coordinate positions of the feature map.
Face loss L_face: the face region is determined through the face key points in the target action information, a separate reconstruction loss and perceptual loss are added for the face region, and an independent discriminator for the face region is added as well.

L_face = ||face(I_gen) - face(I_t)||_1 + ||vgg(face(I_gen)) - vgg(face(I_t))||_1

where face(*) denotes the face region.
In summary, the joint loss function can be expressed as

L_total = L_rec + L_perc + L_GAN + L_flow + L_face
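A minimal sketch of the joint loss described above, assuming a VGG feature extractor vgg_features, a face-cropping helper face_crop derived from the face key points, and the warp_by_flow function from the earlier sketch; all three are assumed helpers, and the non-saturating form of the adversarial term is an assumption. Equal weighting of the terms follows the formula L_total = L_rec + L_perc + L_GAN + L_flow + L_face, though real implementations often re-weight them.

```python
# Sketch of the joint loss; vgg_features, face_crop and warp_by_flow are assumed helpers.
import torch
import torch.nn.functional as F

def joint_loss(I_gen, I_t, I_s, flow, d_fake_logits, vgg_features, face_crop, warp_by_flow):
    # L_rec: pixel-level L1 distance between the generated and ground-truth pictures.
    l_rec = F.l1_loss(I_gen, I_t)

    # L_perc: L1 distance between VGG features of the generated and ground-truth pictures.
    l_perc = F.l1_loss(vgg_features(I_gen), vgg_features(I_t))

    # L_GAN: adversarial term for the generator (non-saturating form, an assumption).
    l_gan = F.binary_cross_entropy_with_logits(d_fake_logits, torch.ones_like(d_fake_logits))

    # L_flow: average cosine distance between the source VGG features warped along the flow
    # and the target VGG features, over all feature-map positions.
    warped = warp_by_flow(vgg_features(I_s), flow)
    cos_sim = F.cosine_similarity(warped, vgg_features(I_t), dim=1)   # per-position similarity
    l_flow = (1.0 - cos_sim).mean()                                   # cosine distance, averaged over N

    # L_face: reconstruction + perceptual losses restricted to the face region.
    l_face = (F.l1_loss(face_crop(I_gen), face_crop(I_t))
              + F.l1_loss(vgg_features(face_crop(I_gen)), vgg_features(face_crop(I_t))))

    return l_rec + l_perc + l_gan + l_flow + l_face
```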
The testing process: once the training in the third step has converged, i.e., the training is completed, the motion migration model can be used to perform the motion migration operation on an input first picture and second picture to generate the target picture.
In summary, in the embodiment of the invention, the appearance feature of the first picture, the first motion feature of the first picture and the second motion feature of the second picture are extracted; the optical flow vector from the first motion feature to the second motion feature is obtained through the optical flow network, and the limb attention weight mask is obtained through the limb attention network. The appearance feature is deformed along the optical flow vector to obtain the deformation feature, the deformation feature and the second motion feature are fused under the guidance of the limb attention weight mask, and the fusion feature is decoded by the decoding network to generate the target picture after motion migration. In this way, the partial features lost in the fusion process can be supplemented, which ensures that a target picture with complete limbs is generated and makes the target picture more realistic and natural. Moreover, because the face loss is added during model training, the motion migration of the model can produce a realistic face even for a low-definition input picture.
As shown in fig. 4, an image processing apparatus 400 according to an embodiment of the present invention includes:
a first obtaining module 401, configured to obtain an appearance feature and a first action feature of a first picture, and obtain a second action feature of a second picture;
a second obtaining module 402, configured to obtain, according to the appearance feature, the first action feature, and the second action feature, displacement information from the first action feature to the second action feature, and guidance information for guiding feature fusion between the appearance feature and the second action feature;
and a fusion module 403, configured to perform feature fusion on the appearance feature and the second action feature according to the displacement information and the guidance information, so as to generate a target picture.
Optionally, the second obtaining module 402 includes:
a first encoding unit, configured to obtain an optical flow feature by encoding the appearance feature, the first motion feature, and the second motion feature;
and the first decoding unit is used for obtaining the displacement information from the first motion characteristic to the second motion characteristic by decoding the optical flow characteristic.
Optionally, the second obtaining module 402 further includes:
the second coding unit is used for obtaining the limb characteristics by coding the first action characteristics and the second action characteristics;
and the acquisition unit is used for acquiring guide information for guiding the appearance feature and the second action feature to perform feature fusion according to the optical flow feature and the limb feature.
Optionally, the obtaining unit includes:
the connection subunit is configured to connect the first channel number of the optical flow feature and the second channel number of the limb feature to obtain a limb weight;
and the acquisition subunit is used for acquiring guidance information for guiding the feature fusion of the appearance feature and the second action feature according to the limb weight and the optical flow feature.
Optionally, the obtaining subunit includes:
performing point multiplication on the limb weight and the optical flow characteristic to obtain a limb attention characteristic;
and connecting the third channel number of the limb attention feature with the second channel number of the limb feature, and obtaining guide information for guiding feature fusion of the appearance feature and the second action feature through an up-sampling convolution process.
Optionally, the fusion module 403 includes:
the deformation unit is used for carrying out coordinate deformation on the appearance characteristics through the displacement information to obtain deformation characteristics;
the fusion unit is used for performing feature fusion on the deformation feature and the second action feature through the guide information to obtain a fusion feature;
and the generating unit is used for generating the target picture by the fusion characteristics through an up-sampling convolution process.
Optionally, the fusion unit includes:
the first processing subunit is used for performing dot product processing on the guidance information and the deformation characteristic to obtain a first target characteristic;
the second processing subunit is configured to perform dot product processing on a second numerical value obtained by subtracting the guidance information from the first numerical value and the second action characteristic to obtain a second target characteristic;
and the calculating subunit is used for calculating the sum of the first target feature and the second target feature to obtain a fusion feature.
Optionally, the displacement information is an optical flow vector including three dimensions of height, width, and channel number.
Optionally, the guidance information is a limb attention weight mask comprising two dimensions of height and width.
It should be noted that the embodiment of the image processing apparatus is an apparatus corresponding to the above-mentioned image processing method, and all implementation manners of the embodiment of the method are applicable to the embodiment of the apparatus, and can achieve the same technical effect, which is not described herein again.
In summary, in the embodiment of the invention, the appearance feature of the first picture, the first motion feature of the first picture and the second motion feature of the second picture are extracted; the optical flow vector from the first motion feature to the second motion feature is obtained through the optical flow network, and the limb attention weight mask is obtained through the limb attention network. The appearance feature is deformed along the optical flow vector to obtain the deformation feature, the deformation feature and the second motion feature are fused under the guidance of the limb attention weight mask, and the fusion feature is decoded by the decoding network to generate the target picture after motion migration. In this way, the partial features lost in the fusion process can be supplemented, which ensures that a target picture with complete limbs is generated and makes the target picture more realistic and natural. Moreover, because the face loss is added during model training, the motion migration of the model can produce a realistic face even for a low-definition input picture.
The embodiment of the invention further provides an electronic device. As shown in fig. 5, the electronic device comprises a processor 501, a communication interface 502, a memory 503 and a communication bus 504, wherein the processor 501, the communication interface 502 and the memory 503 communicate with each other through the communication bus 504.
The memory 503 stores a computer program.
The processor 501 is configured to implement part or all of the steps of the image processing method provided by the embodiment of the present invention when executing the program stored in the memory 503.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the terminal and other equipment.
The Memory may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In another embodiment provided by the present invention, a computer-readable storage medium is further provided, in which instructions are stored, and when the instructions are executed on a computer, the computer is caused to execute the picture processing method described in the above embodiment.
In yet another embodiment of the present invention, a computer program product containing instructions is also provided, which when run on a computer, causes the computer to execute the picture processing method described in the above embodiments.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (12)

1. A picture processing method, characterized in that the method comprises:
acquiring appearance characteristics and first action characteristics of a first picture, and acquiring second action characteristics of a second picture;
according to the appearance feature, the first action feature and the second action feature, obtaining displacement information from the first action feature to the second action feature and guiding information for guiding feature fusion of the appearance feature and the second action feature;
and performing feature fusion on the appearance feature and the second action feature according to the displacement information and the guide information to generate a target picture.
2. The method according to claim 1, wherein obtaining displacement information of the first action feature to the second action feature according to the appearance feature, the first action feature and the second action feature comprises:
obtaining optical flow characteristics by coding the appearance characteristics, the first action characteristics and the second action characteristics;
and obtaining displacement information from the first motion characteristic to the second motion characteristic by decoding the optical flow characteristic.
3. The method according to claim 2, wherein obtaining guidance information for guiding feature fusion of the appearance feature and the second action feature according to the appearance feature, the first action feature and the second action feature comprises:
obtaining the limb characteristics by coding the first action characteristics and the second action characteristics;
and acquiring guidance information for guiding the appearance feature and the second action feature to perform feature fusion according to the optical flow feature and the limb feature.
4. The method according to claim 3, wherein the obtaining guidance information for guiding feature fusion of the appearance feature and the second motion feature according to the optical flow feature and the limb feature comprises:
connecting the first channel number of the optical flow features and the second channel number of the limb features to obtain limb weight;
and acquiring guidance information for guiding the feature fusion of the appearance feature and the second action feature according to the limb weight and the optical flow feature.
5. The method according to claim 4, wherein the obtaining guidance information for guiding feature fusion of the appearance feature and the second motion feature according to the limb weight and the optical flow feature comprises:
performing point multiplication on the limb weight and the optical flow characteristic to obtain a limb attention characteristic;
and connecting the third channel number of the limb attention feature with the second channel number of the limb feature, and obtaining guide information for guiding feature fusion of the appearance feature and the second action feature through an up-sampling convolution process.
6. The method according to claim 1, wherein the performing feature fusion on the appearance feature and the second motion feature according to the displacement information and the guidance information to generate a target picture comprises:
performing coordinate deformation on the appearance characteristic through the displacement information to obtain a deformation characteristic;
performing feature fusion on the deformation feature and the second action feature through the guide information to obtain a fusion feature;
and generating a target picture by the fusion characteristics through an up-sampling convolution process.
7. The method according to claim 6, wherein the feature fusing the deformation feature and the second action feature through the guidance information to obtain a fused feature comprises:
performing point multiplication processing on the guide information and the deformation characteristic to obtain a first target characteristic;
performing dot product processing on a second value obtained by subtracting the guidance information from the first value and the second action characteristic to obtain a second target characteristic;
and calculating the sum of the first target characteristic and the second target characteristic to obtain a fusion characteristic.
8. The method of claim 1, wherein the displacement information is an optical flow vector comprising three dimensions of height, width, and number of channels.
9. The method of claim 1, wherein the guidance information is a limb attention weight mask comprising two dimensions, height and width.
10. A picture processing apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for acquiring the appearance characteristic and the first action characteristic of the first picture and acquiring the second action characteristic of the second picture;
a second obtaining module, configured to obtain, according to the appearance feature, the first action feature, and the second action feature, displacement information from the first action feature to the second action feature, and guidance information for guiding feature fusion between the appearance feature and the second action feature;
and the fusion module is used for performing feature fusion on the appearance features and the second action features according to the displacement information and the guidance information to generate a target picture.
11. An electronic device, comprising: a processor, a communication interface, a memory, and a communication bus; the processor, the communication interface and the memory complete mutual communication through a communication bus;
a memory for storing a computer program;
a processor for implementing the steps of the picture processing method according to any one of claims 1 to 9 when executing the program stored in the memory.
12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a picture processing method according to any one of claims 1 to 9.
CN202011639117.6A 2020-12-31 2020-12-31 Picture processing method and device and electronic equipment Pending CN112668517A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011639117.6A CN112668517A (en) 2020-12-31 2020-12-31 Picture processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011639117.6A CN112668517A (en) 2020-12-31 2020-12-31 Picture processing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN112668517A true CN112668517A (en) 2021-04-16

Family

ID=75413833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011639117.6A Pending CN112668517A (en) 2020-12-31 2020-12-31 Picture processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112668517A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114862666A (en) * 2022-06-22 2022-08-05 阿里巴巴达摩院(杭州)科技有限公司 Image conversion system, method, storage medium and electronic device
CN114862666B (en) * 2022-06-22 2022-10-04 阿里巴巴达摩院(杭州)科技有限公司 Image conversion system, method, storage medium and electronic device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination