CN112668517A - Picture processing method and device and electronic equipment - Google Patents

Picture processing method and device and electronic equipment

Info

Publication number
CN112668517A
CN112668517A (application CN202011639117.6A)
Authority
CN
China
Prior art keywords
feature
action
appearance
characteristic
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011639117.6A
Other languages
Chinese (zh)
Inventor
葛璞
李玉乐
项伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bigo Technology Pte Ltd
Original Assignee
Bigo Technology Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bigo Technology Pte Ltd filed Critical Bigo Technology Pte Ltd
Priority to CN202011639117.6A priority Critical patent/CN112668517A/en
Publication of CN112668517A publication Critical patent/CN112668517A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The embodiments of the invention provide a picture processing method and device and an electronic device, relating to the technical field of motion migration. The method comprises the following steps: acquiring an appearance feature and a first action feature of a first picture, and acquiring a second action feature of a second picture; obtaining, according to the appearance feature, the first action feature and the second action feature, displacement information from the first action feature to the second action feature and guidance information for guiding feature fusion of the appearance feature and the second action feature; and performing feature fusion on the appearance feature and the second action feature according to the displacement information and the guidance information to generate a target picture. With this scheme, the action features lost during feature fusion can be supplemented, avoiding the problem of feature loss.

Description

Picture processing method and device and electronic equipment
Technical Field
The present invention relates to the field of motion migration technologies, and in particular, to a method and an apparatus for processing an image, and an electronic device.
Background
Motion migration is an important technology in the field of computer vision and is widely applied in fields such as movie making, virtual fitting and picture editing. Given a source picture, its corresponding human body action and a target human body action, the motion migration task is to transfer the source picture to the target action while maintaining the identity characteristics of the source picture. In existing motion migration methods, it is difficult to generate natural and realistic pictures in some complex action scenes (such as leg raising, body self-occlusion and hand raising), and limbs may even be missing from the result.
Disclosure of Invention
The invention provides a picture processing method, a picture processing device and electronic equipment, which are used for solving the problem that partial features are easy to lose in the existing action migration technology to a certain extent.
In a first aspect of the present invention, there is provided a picture processing method, including:
acquiring appearance characteristics and first action characteristics of a first picture, and acquiring second action characteristics of a second picture;
according to the appearance feature, the first action feature and the second action feature, obtaining displacement information from the first action feature to the second action feature and guiding information for guiding feature fusion of the appearance feature and the second action feature;
and performing feature fusion on the appearance feature and the second action feature according to the displacement information and the guide information to generate a target picture.
In a second aspect of the present invention, there is provided a picture processing apparatus, comprising:
the first acquisition module is used for acquiring the appearance characteristic and the first action characteristic of the first picture and acquiring the second action characteristic of the second picture;
a second obtaining module, configured to obtain, according to the appearance feature, the first action feature, and the second action feature, displacement information from the first action feature to the second action feature, and guidance information for guiding feature fusion between the appearance feature and the second action feature;
and the fusion module is used for performing feature fusion on the appearance features and the second action features according to the displacement information and the guidance information to generate a target picture.
In a third aspect of the present invention, there is also provided an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing the steps in the image processing method when the program stored in the memory is executed.
In a fourth aspect implemented by the present invention, there is also provided a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the picture processing method as described above.
In a fifth aspect of the embodiments of the present invention, there is also provided a computer program product containing instructions, which when run on a computer, causes the computer to execute the picture processing method as described above.
Aiming at the prior art, the invention has the following advantages:
in the embodiment of the invention, from the acquired appearance feature and first action feature of the first picture and the second action feature of the second picture, displacement information from the first action feature to the second action feature and guidance information for guiding feature fusion of the appearance feature and the second action feature can be obtained. Through the displacement information, the first action feature can be converted to the corresponding position of the second action feature, and through the guidance information, the action features lost in the feature fusion process can be supplemented to generate a target picture with complete limbs, thereby avoiding the problem of limb loss.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments will be briefly described below.
Fig. 1 is a flowchart of a picture processing method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an action migration model according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a limb attention network according to an embodiment of the present invention;
FIG. 4 is a block diagram of a picture processing apparatus according to an embodiment of the present invention;
fig. 5 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms first, second and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that embodiments of the application may be practiced in sequences other than those illustrated or described herein, and that the terms "first," "second," and the like are generally used herein in a generic sense and do not limit the number of terms, e.g., the first term can be one or more than one. In addition, "and/or" in the specification and claims means at least one of connected objects, a character "/" generally means that a preceding and succeeding related objects are in an "or" relationship.
Currently, the existing action migration technology may have the following methods:
the method based on appearance and action information decomposition comprises the following steps: the method respectively encodes appearance information and action information through an encoder, then connects the two information, and generates a target picture through a decoder, and the method does not align source action and target action, and appearance information is difficult to encode into a feature vector, so that the generated effect is not real enough.
The method based on a deformation network structure: this method establishes the correspondence between the source action and the target action using action structure information, and deforms the source picture through the deformation network structure to generate the target picture, but it has difficulty handling some complex action scenes.
Optical flow-based methods: such a method learns the optical flow between the source action and the target action through an optical flow network, deforms the source picture to the corresponding position of the target action along the optical flow, and combines the target action features to generate a picture of the target action.
Therefore, the embodiment of the invention provides an image processing method, an image processing device and electronic equipment.
Specifically, as shown in fig. 1, an embodiment of the present invention provides a picture processing method, where the method specifically includes:
step 101, obtaining an appearance feature and a first action feature of a first picture, and obtaining a second action feature of a second picture.
In particular, the method can be applied to a motion migration model. As shown in fig. 2, the model includes an appearance feature coding network 21, which is a down-sampling convolutional coding network. The first picture I_s is input to the appearance feature coding network 21 as the source picture; through a down-sampling convolution process (i.e., a process that reduces the height and width of the picture while increasing the number of channels), the first picture I_s is encoded and an appearance feature map F_r is output. The appearance feature of the first picture I_s can be obtained from the appearance feature map F_r. The appearance feature map F_r is a feature map with three dimensions of height, width and channel number, and it contains the appearance information (e.g., clothes, skin tone, hair, etc.) of the first picture I_s.
Also, as shown in FIG. 2, the motion migration model may further include a motion feature coding network 22, which is a down-sampling convolutional coding network. The human body key points (such as head, shoulder, neck, etc.) of the first picture I_s and of the second picture P_t can be detected with a skeleton key point detection network. The key points of the first picture I_s are input into the motion feature coding network 22; through a down-sampling convolution process, the action in the first picture I_s is encoded and a first motion feature map is output. The first motion feature can be obtained from the first motion feature map, which contains the basic skeleton information of the first action. In the same way, the key points of the second picture P_t are input into the motion feature coding network 22; through a down-sampling convolution process, the action in the second picture P_t is encoded and a second motion feature map F_p is output. The second motion feature can be obtained from the second motion feature map F_p, which contains the basic skeleton information of the second action.
It should be noted that the second picture may be a single picture or a frame picture in a video, and is not limited herein.
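As a concrete illustration of the down-sampling convolutional coding networks described above, the following PyTorch-style sketch shows how an appearance encoder (network 21) and a motion encoder (network 22) of this kind could be built. The layer counts, channel widths and the 18-keypoint heatmap input are illustrative assumptions, not values specified by the patent.

```python
# Minimal sketch of the two down-sampling convolutional encoders (networks 21 and 22).
# Channel sizes, number of layers and the keypoint-heatmap input are illustrative assumptions.
import torch
import torch.nn as nn

class DownsampleEncoder(nn.Module):
    """Reduces height/width while increasing channels, as described for networks 21 and 22."""
    def __init__(self, in_channels, base_channels=64, num_downs=3):
        super().__init__()
        layers, ch = [], in_channels
        for i in range(num_downs):
            out_ch = base_channels * (2 ** i)
            layers += [nn.Conv2d(ch, out_ch, kernel_size=4, stride=2, padding=1),
                       nn.InstanceNorm2d(out_ch),
                       nn.ReLU(inplace=True)]
            ch = out_ch
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

appearance_encoder = DownsampleEncoder(in_channels=3)   # takes the RGB source picture I_s
motion_encoder = DownsampleEncoder(in_channels=18)      # takes keypoint heatmaps (assumed 18 joints)

I_s = torch.randn(1, 3, 256, 256)      # first picture
P_s = torch.randn(1, 18, 256, 256)     # keypoint heatmaps of the first picture
P_t = torch.randn(1, 18, 256, 256)     # keypoint heatmaps of the second picture

F_r = appearance_encoder(I_s)          # appearance feature map F_r (here H/8 x W/8, 256 channels)
F_s = motion_encoder(P_s)              # first motion feature map
F_p = motion_encoder(P_t)              # second motion feature map F_p
```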
102, obtaining displacement information from the first action characteristic to the second action characteristic and guiding information for guiding feature fusion of the appearance characteristic and the second action characteristic according to the appearance characteristic, the first action characteristic and the second action characteristic.
Specifically, as shown in fig. 2, the motion migration model may further include an optical flow network 23, where the optical flow network 23 is a coding-decoding neural network, and the displacement information f from the first motion feature to the second motion feature may be obtained through a coding-decoding process by using the appearance feature, the human body key point of the first motion feature, and the human body key point of the second motion feature as inputs.
Further, the displacement information may be an optical flow vector including three dimensions of height, width, and channel number. Such as: the number of channels is 2, i.e. two coordinates x and y.
Moreover, the motion migration model may further include a limb attention network 24, which is a coding-decoding neural network, and through the coding-decoding process, a piece of guidance information M may be obtained, where the guidance information M is used to guide feature fusion between the appearance feature of the first picture and the second motion feature of the second picture, so as to avoid the problem of motion feature loss in the fusion process.
Further, the guidance information may be a limb attention weight mask that includes the two dimensions of height and width.
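To make the two outputs of step 102 concrete, the sketch below shows the expected tensor shapes: the displacement information f is a 2-channel optical flow field (an x and a y offset per position), and the guidance information M is a single-channel mask over height and width. The tiny encoder-decoder used here is only a stand-in for the optical flow network 23 and the limb attention network 24, and the sigmoid squashing the mask into [0, 1] is an assumption consistent with the later statement that mask values lie between 0 and 1.

```python
# Shape-level sketch of the step-102 outputs; the networks themselves are placeholders.
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    """Stand-in for an encode-decode network: down-sample, then up-sample back."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.down = nn.Sequential(nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True))
        self.up = nn.ConvTranspose2d(64, out_ch, 4, stride=2, padding=1)

    def forward(self, x):
        return self.up(self.down(x))

B, H, W = 1, 32, 32                      # feature-map resolution (illustrative)
F_r = torch.randn(B, 256, H, W)          # appearance feature
P_s = torch.randn(B, 18, H, W)           # first-action keypoint representation
P_t = torch.randn(B, 18, H, W)           # second-action keypoint representation

flow_net = TinyEncoderDecoder(in_ch=256 + 18 + 18, out_ch=2)   # stand-in for optical flow network 23
f = flow_net(torch.cat([F_r, P_s, P_t], dim=1))                # displacement information: (B, 2, H, W)

attn_net = TinyEncoderDecoder(in_ch=18 + 18, out_ch=1)         # stand-in for limb attention network 24
M = torch.sigmoid(attn_net(torch.cat([P_s, P_t], dim=1)))      # guidance mask in [0, 1]: (B, 1, H, W)

print(f.shape, M.shape)   # e.g. torch.Size([1, 2, 32, 32]) torch.Size([1, 1, 32, 32])
```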
And 103, performing feature fusion on the appearance features and the second action features according to the displacement information and the guidance information to generate a target picture.
Specifically, as shown in fig. 2, the motion migration model may further include a decoding network 25, which is an up-sampling convolution decoding network structure, and the network output is a fused target picture Igen. The first motion characteristic can be converted into the corresponding position of the second motion characteristic through the displacement information f, and the motion characteristic lost in the characteristic fusion process can be supplemented through the guide information M to generate the target picture I with the complete limbsgenThereby avoiding the problems of limb loss and the like.
It should be noted that the application scenarios of the above method are as follows: the method can be used in various applications or products needing action migration, such as virtual fitting, movie making, dance video generation and the like; specifically, the following functions can be provided:
Function one: given a source picture (i.e., the first picture) and a target action picture (i.e., the second picture), the source picture is migrated according to the action of the target action picture (i.e., the second action feature) to generate the target picture. The generated target picture carries the identity information of the first picture (features such as the face and clothes) and maintains the action information of the target action (the second action feature).
Function two: given an action sequence from a first video (i.e., each second picture is a frame picture, and the frame pictures together form the first video) and a source picture (i.e., the first picture), the source picture is transferred to each frame picture (i.e., each second picture) of the target video in the manner of function one, thereby generating a target video that keeps the identity information of the source picture (i.e., each target picture is a target frame picture, and the target frame pictures together form the target video).
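A hedged sketch of function two follows. The names motion_transfer_model and extract_keypoints are hypothetical placeholders standing in for the trained model and a keypoint detector; the point is only that the single-picture function is applied to each frame of the driving video in turn.

```python
# Hypothetical frame-by-frame use of the motion migration model (function two).
# motion_transfer_model and extract_keypoints are illustrative placeholders, not real APIs.
def generate_target_video(source_picture, driving_frames, motion_transfer_model, extract_keypoints):
    source_keypoints = extract_keypoints(source_picture)        # key points of the first picture
    target_frames = []
    for frame in driving_frames:                                # each frame is a "second picture"
        frame_keypoints = extract_keypoints(frame)              # key points for this frame
        # Function one applied per frame: keep the source identity, take the frame's action.
        target_frames.append(motion_transfer_model(source_picture, source_keypoints, frame_keypoints))
    return target_frames                                        # assembled into the target video
```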
In the above embodiment of the present invention, through the obtained appearance feature and the first motion feature of the first picture and the second motion feature of the second picture, displacement information from the first motion feature to the second motion feature and guidance information for guiding the appearance feature and the second motion feature to perform feature fusion can be obtained, the first motion feature can be converted into a corresponding position of the second motion feature through the displacement information, and the motion feature lost in the feature fusion process can be supplemented through the guidance information to generate the target picture with complete limbs, so as to avoid the problem of limb loss.
Optionally, the step 102, obtaining, according to the appearance feature, the first action feature, and the second action feature, displacement information from the first action feature to the second action feature, may specifically include:
obtaining optical flow characteristics by coding the appearance characteristics, the first action characteristics and the second action characteristics;
and obtaining displacement information from the first motion characteristic to the second motion characteristic by decoding the optical flow characteristic.
Specifically, as shown in fig. 2, the appearance feature, the human body key points of the first motion feature and the human body key points of the second motion feature are input into the optical flow network 23; through an encoding process, the optical flow feature F_f of the optical flow network 23 can be obtained. By decoding the optical flow feature F_f, the displacement information f from the first action to the second action, i.e., the optical flow vector from the first motion feature to the second motion feature, can be obtained.
Optionally, the step 102, obtaining guidance information for guiding the feature fusion of the appearance feature and the second action feature according to the appearance feature, the first action feature and the second action feature, may specifically include:
step A1, obtaining limb characteristics by coding the first action characteristics and the second action characteristics;
step A2, obtaining guiding information for guiding feature fusion of the appearance feature and the second action feature according to the optical flow feature and the limb feature.
Specifically, as shown in fig. 2, the human body key points of the first motion feature and the human body key points of the second motion feature are input into the limb attention network 24, and the limb feature F_J is obtained through an encoding process; the limb feature F_J contains the limb structure information. In the decoding process, guidance information M for guiding feature fusion of the appearance feature and the second motion feature is obtained from the optical flow feature F_f and the limb feature F_J, i.e., a limb attention weight mask for guiding the feature fusion is obtained. The limb attention weight mask indicates whether optical flow vectors are missing and locates the missing relation at each specific position of the second motion feature, so that feature information can be supplemented and a target picture I_gen with complete limbs can be generated.
Optionally, the step a2, according to the optical flow features and the limb features, acquires guidance information for guiding feature fusion between the appearance features and the second motion features, which may specifically include:
and step B1, connecting the first channel number of the optical flow characteristics and the second channel number of the limb characteristics to obtain the limb weight.
Specifically, as shown in FIG. 3, a limb attention network is added between the optical flow feature F_f of the optical flow network and the limb feature F_J during the decoding process. Through the limb attention network, the first channel number of the optical flow feature F_f and the second channel number of the limb feature F_J can be connected to obtain the limb weight a_J, which is a limb weight with the two dimensions of height and width.
For example, the process of obtaining the weight of the limb through the limb attention network is as follows:
characterizing the optical flow FfTo the limbs FJAnd carrying out vector convolution operation conv, then carrying out Linear rectification function (RecU) operation, then carrying out conv operation, and finally carrying out softmax function operation, thereby obtaining the limb weight.
And step B2, acquiring guidance information for guiding the feature fusion of the appearance feature and the second action feature according to the limb weight and the optical flow feature.
Further, in the step B2, the acquiring, according to the limb weight and the optical flow feature, guidance information for guiding feature fusion between the appearance feature and the second motion feature may specifically include:
performing point multiplication on the limb weight and the optical flow characteristic to obtain a limb attention characteristic;
and connecting the third channel number of the limb attention feature with the second channel number of the limb feature, and obtaining guide information for guiding feature fusion of the appearance feature and the second action feature through an up-sampling convolution process.
Specifically, as shown in fig. 2 and fig. 3, the limb attention feature F̂_f can be obtained by dot-multiplying the limb weight a_J and the optical flow feature F_f:

F̂_f = a_J ⊙ F_f

where F̂_f denotes the limb attention feature, F_f denotes the optical flow feature, a_J denotes the limb weight, and ⊙ denotes the dot product.

Then, the third channel number of the limb attention feature F̂_f and the second channel number of the limb feature F_J are connected, and an up-sampling convolution process is performed to obtain the guidance information M for guiding feature fusion of the appearance feature and the second action feature, i.e., the limb attention weight mask for guiding the feature fusion. The value at each position of the mask can be a value between 0 and 1 and is used to guide the feature fusion, so that the action features lost in the feature fusion process can be supplemented and a target picture I_gen with complete limbs can be generated, thereby avoiding problems such as limb loss.
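Following the equations above, a minimal sketch of how the guidance mask M could be produced: the limb attention feature is the element-wise product a_J ⊙ F_f, it is connected with F_J along channels, and an up-sampling convolution maps the result to a single-channel mask. The layer sizes and the final sigmoid are assumptions; the sigmoid is merely consistent with the statement that each coordinate value lies between 0 and 1.

```python
# Sketch of producing the limb attention weight mask M; layer sizes and the sigmoid are assumptions.
import torch
import torch.nn as nn

class GuidanceMask(nn.Module):
    def __init__(self, flow_ch, limb_ch):
        super().__init__()
        # Up-sampling convolution from the concatenated features to a single-channel mask.
        self.upconv = nn.Sequential(
            nn.ConvTranspose2d(flow_ch + limb_ch, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=3, padding=1),
        )

    def forward(self, F_f, F_J, a_J):
        attn = a_J * F_f                              # limb attention feature: dot (element-wise) product
        x = torch.cat([attn, F_J], dim=1)             # connect the third and second channel numbers
        M = torch.sigmoid(self.upconv(x))             # mask with each value between 0 and 1 (assumed sigmoid)
        return M

F_f = torch.randn(1, 256, 32, 32)
F_J = torch.randn(1, 128, 32, 32)
a_J = torch.rand(1, 1, 32, 32)
M = GuidanceMask(256, 128)(F_f, F_J, a_J)             # shape (1, 1, 64, 64)
```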
Optionally, the step 103 performs feature fusion on the appearance feature and the second action feature according to the displacement information and the guidance information to generate a target picture, which may specifically include:
and step C1, performing coordinate deformation on the appearance characteristics through the displacement information to obtain deformation characteristics.
Specifically, as shown in fig. 2, in the decoding network 25, the appearance feature is first deformed coordinate-wise through the displacement information f (i.e., the optical flow vector) to obtain the deformation feature F_warp.
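One common way to realize the coordinate deformation of step C1 is backward warping with a sampling grid, as in the sketch below. The use of grid_sample and the normalization of coordinates to [-1, 1] are implementation assumptions, not details given by the patent.

```python
# Sketch of warping the appearance feature along the displacement f (backward warping).
# grid_sample-based warping and the coordinate normalization are implementation assumptions.
import torch
import torch.nn.functional as F

def warp_by_flow(feature, flow):
    """feature: (B, C, H, W); flow: (B, 2, H, W) holding per-pixel (x, y) displacements."""
    b, _, h, w = feature.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=feature.dtype),
                            torch.arange(w, dtype=feature.dtype), indexing="ij")
    base = torch.stack((xs, ys), dim=0).unsqueeze(0).to(feature.device)   # (1, 2, H, W)
    coords = base + flow                                                  # displaced sampling positions
    # Normalize to [-1, 1] as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)                      # (B, H, W, 2)
    return F.grid_sample(feature, grid, align_corners=True)

F_r = torch.randn(1, 256, 32, 32)   # appearance feature
f = torch.zeros(1, 2, 32, 32)       # zero displacement leaves the feature unchanged
F_warp = warp_by_flow(F_r, f)       # deformation feature F_warp
```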
And step C2, performing feature fusion on the deformation feature and the second action feature through the guide information to obtain a fusion feature.
Specifically, as shown in FIG. 2, the deformation feature F_warp and the second action feature are fused under the guidance information M (i.e., the limb attention weight mask) to obtain the fusion feature F_fuse.
Further, in the step C2, the performing feature fusion on the deformation feature and the second action feature through the guidance information to obtain a fusion feature specifically may include:
performing point multiplication processing on the guide information and the deformation characteristic to obtain a first target characteristic;
performing dot product processing on a second value obtained by subtracting the guidance information from the first value and the second action characteristic to obtain a second target characteristic;
and calculating the sum of the first target characteristic and the second target characteristic to obtain a fusion characteristic.
The fusion feature can be obtained through the above fusion process. Because the limb attention weight mask is added to the fusion process, the deformation feature and the second action feature are fused under the guidance of the limb attention weight mask, which avoids the loss of features during fusion. The order of the step of acquiring the first target feature and the step of acquiring the second target feature is not limited. For example, in the case that the first value is 1, the fusion feature can be obtained by the following formula:

F_fuse = M ⊙ F_warp + (1 - M) ⊙ F_p

where F_fuse denotes the fusion feature, M denotes the guidance information, i.e., the limb attention weight mask, F_warp denotes the deformation feature, F_p denotes the second motion feature, and ⊙ denotes the dot product.
And step C3, generating a target picture by the fused features through an up-sampling convolution process.
Specifically, the fusion features are subjected to an upsampling convolution process through an upsampling convolution neural network, and are restored to a target picture, and the target picture has the appearance features of the first picture and the second action features of the second picture.
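Putting steps C2 and C3 together, the sketch below fuses the deformation feature and the second motion feature under the guidance mask using F_fuse = M ⊙ F_warp + (1 - M) ⊙ F_p, then decodes the result back to image resolution with up-sampling convolutions. The specific decoder layers and the final Tanh output range are illustrative assumptions; only the fusion formula follows the description above.

```python
# Sketch of guided feature fusion (step C2) followed by up-sampling decoding (step C3).
# Decoder depth, channel widths and the Tanh output range are illustrative assumptions.
import torch
import torch.nn as nn

def fuse(M, F_warp, F_p):
    # First target feature: M * F_warp; second target feature: (1 - M) * F_p; fusion is their sum.
    return M * F_warp + (1.0 - M) * F_p

decoder = nn.Sequential(   # stand-in for the up-sampling convolutional decoding network 25
    nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
    nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),   # RGB target picture in [-1, 1]
)

M = torch.rand(1, 1, 32, 32)             # guidance mask (broadcast over channels)
F_warp = torch.randn(1, 256, 32, 32)     # deformation feature
F_p = torch.randn(1, 256, 32, 32)        # second motion feature
I_gen = decoder(fuse(M, F_warp, F_p))    # target picture, here 256 x 256
```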
During the training of the motion migration model, two pictures (I_s, I_t) of different actions of the same person and the corresponding action features (P_s, P_t) can be taken, and the picture I_gen after motion migration is generated by the model. The training process is as follows:
the first step is as follows: training discriminator
Firstly, I istAnd IgenCalculating the countermeasure loss through a discriminator; then solving the gradient and updating the weight of the discriminator.
The second step: train the generator.
First, the reconstruction loss and the adversarial loss are calculated from I_t and I_gen. Then vgg(I_t) and vgg(I_gen) are obtained by passing I_t and I_gen through a convolutional neural network (VGG), and the perceptual loss is calculated. Next, the target face position is obtained through the face key points, and the face loss (face perceptual loss and face reconstruction loss) is calculated. Then vgg(I_s) and vgg(I_t) are obtained by passing I_s and I_t through the VGG network, and the optical flow loss is calculated using (vgg(I_s), vgg(I_t), f), where f is the optical flow vector. Finally, the gradient is computed and the weights of the generator are updated.
The third step: the first step and the second step are repeated alternately until the motion migration model converges.
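The alternating schedule of the first and second steps can be sketched as a standard GAN-style training loop. The optimizers, the equal weighting and the loss helper names below are hypothetical placeholders for the losses defined in the following paragraphs, and generator.last_flow is an assumed handle on the predicted optical flow.

```python
# Hypothetical alternating training loop for the motion migration model (steps 1-3).
# The loss helpers, optimizers and loss weighting are placeholders for the losses defined below.
def train(generator, discriminator, g_opt, d_opt, data_loader, num_epochs, losses):
    for epoch in range(num_epochs):
        for I_s, I_t, P_s, P_t in data_loader:           # two pictures of one person + action features
            I_gen = generator(I_s, P_s, P_t)

            # Step 1: train the discriminator on I_t (real) vs I_gen (fake).
            d_loss = losses.adversarial_d(discriminator, I_t, I_gen.detach())
            d_opt.zero_grad(); d_loss.backward(); d_opt.step()

            # Step 2: train the generator with the joint loss.
            g_loss = (losses.reconstruction(I_gen, I_t)
                      + losses.perceptual(I_gen, I_t)
                      + losses.adversarial_g(discriminator, I_gen)
                      + losses.optical_flow(I_s, I_t, generator.last_flow)
                      + losses.face(I_gen, I_t))
            g_opt.zero_grad(); g_loss.backward(); g_opt.step()
        # Step 3: steps 1 and 2 repeat alternately until the model converges.
```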
Specifically, the loss functions in the three steps above provide guidance for the model training process; the parameters of the model are continuously optimized during training so that the loss functions decrease, and the model thereby learns the ability to perform motion migration. The data required for each training iteration consists of two pictures (I_s, I_t) of different actions of the same person and the corresponding action features (P_s, P_t); the input of the model is (I_s, P_s, P_t), the output is I_gen, and I_t is the ground-truth value corresponding to I_gen. The loss function is a joint loss function whose main components are as follows:
Reconstruction loss L_rec: makes the generated picture I_gen close to the ground-truth picture I_t at the pixel level, expressed as:

L_rec = ||I_gen - I_t||_1

i.e., the reconstruction loss L_rec is the sum of the absolute values of the differences between I_gen and I_t.
Perceptual loss L_perc: makes the generated picture I_gen close to the ground-truth picture I_t at the feature level. A trained VGG neural network is used to extract features from I_gen and I_t respectively, and then the distance between the two sets of features is calculated, expressed as:

L_perc = ||vgg(I_gen) - vgg(I_t)||_1

where vgg(X) denotes the features of X extracted with the VGG neural network, and X is I_gen or I_t.
Adversarial loss L_GAN: the adversarial loss makes the generated picture more realistic and natural.
Optical flow loss L_flow: features are extracted from the source picture I_s and the ground-truth picture I_t with the trained VGG neural network, yielding the feature maps vgg(I_s) and vgg(I_t); the feature map obtained by deforming vgg(I_s) pixel by pixel along the optical flow vector f should be similar to vgg(I_t). The loss function measures the similarity by the cosine distance and can be expressed as

L_flow = (1/N) Σ_l cos(φ(vgg(I_s), f)_l, vgg(I_t)_l)

where φ(*) is the deformation of a feature map along the optical flow vector, cos(*) is the cosine distance, vgg(I_t)_l is the value of the feature map vgg(I_t) at the l-th coordinate position, and N is the total number of coordinate positions of the feature map.
Face loss L_face: the face region is determined through the face key points in the target action information, a separate reconstruction loss and perceptual loss are added for the face region, and an independent discriminator for the face region is added as well.

L_face = ||face(I_gen) - face(I_t)||_1 + ||vgg(face(I_gen)) - vgg(face(I_t))||_1

where face(*) denotes the face region.
In summary, the joint loss function can be expressed as

L_total = L_rec + L_perc + L_GAN + L_flow + L_face
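A minimal sketch of the joint loss described above, assuming a VGG feature extractor vgg_features, a face-cropping helper face_crop derived from the face key points, and the warp_by_flow function from the earlier sketch; all three are assumed helpers, and the non-saturating form of the adversarial term is an assumption. Equal weighting of the terms follows the formula L_total = L_rec + L_perc + L_GAN + L_flow + L_face, though real implementations often re-weight them.

```python
# Sketch of the joint loss; vgg_features, face_crop and warp_by_flow are assumed helpers.
import torch
import torch.nn.functional as F

def joint_loss(I_gen, I_t, I_s, flow, d_fake_logits, vgg_features, face_crop, warp_by_flow):
    # L_rec: pixel-level L1 distance between the generated and ground-truth pictures.
    l_rec = F.l1_loss(I_gen, I_t)

    # L_perc: L1 distance between VGG features of the generated and ground-truth pictures.
    l_perc = F.l1_loss(vgg_features(I_gen), vgg_features(I_t))

    # L_GAN: adversarial term for the generator (non-saturating form, an assumption).
    l_gan = F.binary_cross_entropy_with_logits(d_fake_logits, torch.ones_like(d_fake_logits))

    # L_flow: average cosine distance between the source VGG features warped along the flow
    # and the target VGG features, over all feature-map positions.
    warped = warp_by_flow(vgg_features(I_s), flow)
    cos_sim = F.cosine_similarity(warped, vgg_features(I_t), dim=1)   # per-position similarity
    l_flow = (1.0 - cos_sim).mean()                                   # cosine distance, averaged over N

    # L_face: reconstruction + perceptual losses restricted to the face region.
    l_face = (F.l1_loss(face_crop(I_gen), face_crop(I_t))
              + F.l1_loss(vgg_features(face_crop(I_gen)), vgg_features(face_crop(I_t))))

    return l_rec + l_perc + l_gan + l_flow + l_face
```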
The testing process: once the training in the third step has converged, i.e., the training is completed, the motion migration model can be used to perform the motion migration operation on an input first picture and second picture to generate the target picture.
In summary, in the embodiment of the invention, the appearance feature of the first picture, the first motion feature of the first picture and the second motion feature of the second picture are extracted; the optical flow vector from the first motion feature to the second motion feature is obtained through the optical flow network, and the limb attention weight mask is obtained through the limb attention network. The appearance feature is deformed along the optical flow vector to obtain the deformation feature, the deformation feature and the second motion feature are fused under the guidance of the limb attention weight mask, and the fusion feature is decoded by the decoding network to generate the target picture after motion migration. In this way, the partial features lost in the fusion process can be supplemented, which ensures that a target picture with complete limbs is generated and makes the target picture more realistic and natural. Moreover, because the face loss is added during model training, the motion migration of the model can produce a realistic face even for a low-definition input picture.
As shown in fig. 4, an image processing apparatus 400 according to an embodiment of the present invention includes:
a first obtaining module 401, configured to obtain an appearance feature and a first action feature of a first picture, and obtain a second action feature of a second picture;
a second obtaining module 402, configured to obtain, according to the appearance feature, the first action feature, and the second action feature, displacement information from the first action feature to the second action feature, and guidance information for guiding feature fusion between the appearance feature and the second action feature;
and a fusion module 403, configured to perform feature fusion on the appearance feature and the second action feature according to the displacement information and the guidance information, so as to generate a target picture.
Optionally, the second obtaining module 402 includes:
a first encoding unit, configured to obtain an optical flow feature by encoding the appearance feature, the first motion feature, and the second motion feature;
and the first decoding unit is used for obtaining the displacement information from the first motion characteristic to the second motion characteristic by decoding the optical flow characteristic.
Optionally, the second obtaining module 402 further includes:
the second coding unit is used for obtaining the limb characteristics by coding the first action characteristics and the second action characteristics;
and the acquisition unit is used for acquiring guide information for guiding the appearance feature and the second action feature to perform feature fusion according to the optical flow feature and the limb feature.
Optionally, the obtaining unit includes:
the connection subunit is configured to connect the first channel number of the optical flow feature and the second channel number of the limb feature to obtain a limb weight;
and the acquisition subunit is used for acquiring guidance information for guiding the feature fusion of the appearance feature and the second action feature according to the limb weight and the optical flow feature.
Optionally, the obtaining subunit includes:
performing point multiplication on the limb weight and the optical flow characteristic to obtain a limb attention characteristic;
and connecting the third channel number of the limb attention feature with the second channel number of the limb feature, and obtaining guide information for guiding feature fusion of the appearance feature and the second action feature through an up-sampling convolution process.
Optionally, the fusion module 403 includes:
the deformation unit is used for carrying out coordinate deformation on the appearance characteristics through the displacement information to obtain deformation characteristics;
the fusion unit is used for performing feature fusion on the deformation feature and the second action feature through the guide information to obtain a fusion feature;
and the generating unit is used for generating the target picture by the fusion characteristics through an up-sampling convolution process.
Optionally, the fusion unit includes:
the first processing subunit is used for performing dot product processing on the guidance information and the deformation characteristic to obtain a first target characteristic;
the second processing subunit is configured to perform dot product processing on a second numerical value obtained by subtracting the guidance information from the first numerical value and the second action characteristic to obtain a second target characteristic;
and the calculating subunit is used for calculating the sum of the first target feature and the second target feature to obtain a fusion feature.
Optionally, the displacement information is an optical flow vector including three dimensions of height, width, and channel number.
Optionally, the guidance information is a limb attention weight mask comprising two dimensions of height and width.
It should be noted that the embodiment of the image processing apparatus is an apparatus corresponding to the above-mentioned image processing method, and all implementation manners of the embodiment of the method are applicable to the embodiment of the apparatus, and can achieve the same technical effect, which is not described herein again.
In summary, in the embodiment of the invention, the appearance feature of the first picture, the first motion feature of the first picture and the second motion feature of the second picture are extracted; the optical flow vector from the first motion feature to the second motion feature is obtained through the optical flow network, and the limb attention weight mask is obtained through the limb attention network. The appearance feature is deformed along the optical flow vector to obtain the deformation feature, the deformation feature and the second motion feature are fused under the guidance of the limb attention weight mask, and the fusion feature is decoded by the decoding network to generate the target picture after motion migration. In this way, the partial features lost in the fusion process can be supplemented, which ensures that a target picture with complete limbs is generated and makes the target picture more realistic and natural. Moreover, because the face loss is added during model training, the motion migration of the model can produce a realistic face even for a low-definition input picture.
The embodiment of the invention further provides an electronic device. As shown in fig. 5, the electronic device comprises a processor 501, a communication interface 502, a memory 503 and a communication bus 504, wherein the processor 501, the communication interface 502 and the memory 503 communicate with each other through the communication bus 504.
The memory 503 stores a computer program.
The processor 501 is configured to implement part or all of the steps of the image processing method provided by the embodiment of the present invention when executing the program stored in the memory 503.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the terminal and other equipment.
The Memory may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In another embodiment provided by the present invention, a computer-readable storage medium is further provided, in which instructions are stored, and when the instructions are executed on a computer, the computer is caused to execute the picture processing method described in the above embodiment.
In yet another embodiment of the present invention, a computer program product containing instructions is also provided, which when run on a computer, causes the computer to execute the picture processing method described in the above embodiments.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (12)

1. A picture processing method, characterized in that the method comprises:
acquiring appearance characteristics and first action characteristics of a first picture, and acquiring second action characteristics of a second picture;
according to the appearance feature, the first action feature and the second action feature, obtaining displacement information from the first action feature to the second action feature and guiding information for guiding feature fusion of the appearance feature and the second action feature;
and performing feature fusion on the appearance feature and the second action feature according to the displacement information and the guide information to generate a target picture.
2. The method according to claim 1, wherein obtaining displacement information of the first action feature to the second action feature according to the appearance feature, the first action feature and the second action feature comprises:
obtaining optical flow characteristics by coding the appearance characteristics, the first action characteristics and the second action characteristics;
and obtaining displacement information from the first motion characteristic to the second motion characteristic by decoding the optical flow characteristic.
3. The method according to claim 2, wherein obtaining guidance information for guiding feature fusion of the appearance feature and the second action feature according to the appearance feature, the first action feature and the second action feature comprises:
obtaining the limb characteristics by coding the first action characteristics and the second action characteristics;
and acquiring guidance information for guiding the appearance feature and the second action feature to perform feature fusion according to the optical flow feature and the limb feature.
4. The method according to claim 3, wherein the obtaining guidance information for guiding feature fusion of the appearance feature and the second motion feature according to the optical flow feature and the limb feature comprises:
connecting the first channel number of the optical flow features and the second channel number of the limb features to obtain limb weight;
and acquiring guidance information for guiding the feature fusion of the appearance feature and the second action feature according to the limb weight and the optical flow feature.
5. The method according to claim 4, wherein the obtaining guidance information for guiding feature fusion of the appearance feature and the second motion feature according to the limb weight and the optical flow feature comprises:
performing point multiplication on the limb weight and the optical flow characteristic to obtain a limb attention characteristic;
and connecting the third channel number of the limb attention feature with the second channel number of the limb feature, and obtaining guide information for guiding feature fusion of the appearance feature and the second action feature through an up-sampling convolution process.
6. The method according to claim 1, wherein the performing feature fusion on the appearance feature and the second motion feature according to the displacement information and the guidance information to generate a target picture comprises:
performing coordinate deformation on the appearance characteristic through the displacement information to obtain a deformation characteristic;
performing feature fusion on the deformation feature and the second action feature through the guide information to obtain a fusion feature;
and generating a target picture by the fusion characteristics through an up-sampling convolution process.
7. The method according to claim 6, wherein the feature fusing the deformation feature and the second action feature through the guidance information to obtain a fused feature comprises:
performing point multiplication processing on the guide information and the deformation characteristic to obtain a first target characteristic;
performing dot product processing on a second value obtained by subtracting the guidance information from the first value and the second action characteristic to obtain a second target characteristic;
and calculating the sum of the first target characteristic and the second target characteristic to obtain a fusion characteristic.
8. The method of claim 1, wherein the displacement information is an optical flow vector comprising three dimensions of height, width, and number of channels.
9. The method of claim 1, wherein the guidance information is a limb attention weight mask comprising two dimensions, height and width.
10. A picture processing apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for acquiring the appearance characteristic and the first action characteristic of the first picture and acquiring the second action characteristic of the second picture;
a second obtaining module, configured to obtain, according to the appearance feature, the first action feature, and the second action feature, displacement information from the first action feature to the second action feature, and guidance information for guiding feature fusion between the appearance feature and the second action feature;
and the fusion module is used for performing feature fusion on the appearance features and the second action features according to the displacement information and the guidance information to generate a target picture.
11. An electronic device, comprising: a processor, a communication interface, a memory, and a communication bus; the processor, the communication interface and the memory complete mutual communication through a communication bus;
a memory for storing a computer program;
a processor for implementing the steps of the picture processing method according to any one of claims 1 to 9 when executing the program stored in the memory.
12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a picture processing method according to any one of claims 1 to 9.
CN202011639117.6A 2020-12-31 2020-12-31 Picture processing method and device and electronic equipment Pending CN112668517A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011639117.6A CN112668517A (en) 2020-12-31 2020-12-31 Picture processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011639117.6A CN112668517A (en) 2020-12-31 2020-12-31 Picture processing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN112668517A true CN112668517A (en) 2021-04-16

Family

ID=75413833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011639117.6A Pending CN112668517A (en) 2020-12-31 2020-12-31 Picture processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112668517A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114862666A (en) * 2022-06-22 2022-08-05 阿里巴巴达摩院(杭州)科技有限公司 Image conversion system, method, storage medium and electronic device
CN114862666B (en) * 2022-06-22 2022-10-04 阿里巴巴达摩院(杭州)科技有限公司 Image conversion system, method, storage medium and electronic device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination