CN115588031A - Method and device for generating character moving image


Info

Publication number
CN115588031A
Authority
CN
China
Prior art keywords
image
difference
determining
action
frequency domain
Legal status
Pending
Application number
CN202211293093.2A
Other languages
Chinese (zh)
Inventor
刘武
梅涛
刘鑫辰
杨光
Current Assignee
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Shangke Information Technology Co Ltd
Application filed by Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN202211293093.2A
Publication of CN115588031A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/251 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T 7/75 Determining position or orientation of objects or cameras using feature-based methods involving models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a method and a device for generating character moving images, and relates to the field of computer technology. One embodiment of the method comprises: acquiring a character image and a plurality of first action images, where the plurality of first action images are continuous frames, and acquiring a second action image corresponding to each first action image according to the character image, the plurality of first action images and a basic generation model; determining a frequency domain time sequence difference function of the first action images and the second action images, and constructing a loss function containing a frequency domain constraint according to the frequency domain time sequence difference function; and updating the basic generation model according to the loss function containing the frequency domain constraint to obtain the character moving image generation model. This embodiment reduces computational complexity and storage consumption, avoids the additional errors introduced by other complex neural network models, improves the time sequence consistency of the generated character moving images, and improves image quality.

Description

Method and device for generating character moving image
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for generating a character moving image.
Background
Character moving image generation refers to generating, given a specified character image and an action reference video, a sequence of continuous images in which the character performs the actions in the reference video.
In the prior art, a trained generation model is used to generate, frame by frame, an image of the specified character under the reference action for each action frame in the reference action video; these images are then assembled as continuous frames into a moving image of the specified character.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for generating a character moving image, which can improve timing consistency of consecutive frame images, improve quality of a single frame image, reduce computational complexity and storage consumption, and effectively avoid additional errors introduced by other complex neural network models.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a method of training a character moving image generation model, including:
acquiring a character image and a plurality of first action images, wherein the plurality of first action images are continuous frames, and acquiring a second action image corresponding to each first action image according to the character image, the plurality of first action images and a basic generation model;
determining a frequency domain time sequence difference function of the first action image and the second action image, and constructing a loss function containing frequency domain constraint according to the frequency domain time sequence difference function;
and updating the basic generation model according to the loss function containing the frequency domain constraint to obtain a character motion image generation model.
Optionally, determining a frequency domain timing difference function of the first motion image and the second motion image comprises:
determining a first magnitude spectrum and a first phase spectrum of each first motion image in a frequency domain; determining a first time sequence amplitude difference of two first action images corresponding to adjacent frames according to the first amplitude spectrum; determining a first time sequence phase difference of two first action images corresponding to adjacent frames according to the first phase spectrum;
determining a second magnitude spectrum and a second phase spectrum of each second motion image in a frequency domain; determining a second time sequence amplitude difference of two second action images corresponding to adjacent frames according to the second amplitude spectrum; determining a second time sequence phase difference of two second action images corresponding to adjacent frames according to the second phase spectrum;
determining the timing amplitude difference function according to the first timing amplitude difference and the second timing amplitude difference; determining the timing phase difference function according to the first timing phase difference and the second timing phase difference;
and determining the frequency domain time sequence difference function according to the timing amplitude difference function and the timing phase difference function.
Optionally, determining a first timing amplitude difference between two first motion images corresponding to adjacent frames according to the first amplitude spectrum includes:
determining an amplitude value at the t moment of each coordinate in a first amplitude spectrum of a first action image corresponding to a current frame, wherein the t moment is the moment corresponding to the current frame;
determining the amplitude value of t-1 moment at each coordinate in a first amplitude spectrum of a first action image corresponding to the previous frame;
the absolute value or the square of the difference between the amplitude value at time t and the amplitude value at time t-1 at each coordinate is taken as the first time-series amplitude difference at time t.
Optionally, determining a first time-sequence phase difference between two first motion images corresponding to adjacent frames according to the first phase spectrum includes:
determining a phase value at t moment of each coordinate in a first phase spectrum of a first action image corresponding to a current frame, wherein the t moment is a moment corresponding to the current frame;
determining a phase value at t-1 moment at each coordinate in a first phase spectrum of a first action image corresponding to a previous frame;
the absolute value or the square of the difference between the phase value at time t and the phase value at time t-1 at each coordinate is taken as the first timing phase difference at time t.
Optionally, determining the timing amplitude difference function according to the first timing amplitude difference and the second timing amplitude difference includes:
determining the timing amplitude difference function according to an absolute value or a square of a difference between the first timing amplitude difference and the second timing amplitude difference.
Optionally, determining the timing phase difference function according to the first timing phase difference and the second timing phase difference includes:
determining the timing phase difference function according to an absolute value or a square of a difference of the first timing phase difference and the second timing phase difference.
Optionally, the loss function further includes a pixel domain constraint, and constructing a loss function including a frequency domain constraint according to the frequency domain timing difference function includes:
and taking the frequency domain time sequence difference function as the frequency domain constraint, and performing weighted summation on the pixel domain constraint and the frequency domain constraint to determine the loss function.
According to a second aspect of an embodiment of the present invention, there is provided a method of generating a character moving image, including:
acquiring a target character image and a plurality of action images, wherein the action images are continuous frames;
inputting the target character image and the plurality of motion images into a character moving image generation model to obtain, for each motion image, a character motion image containing the target character,
the character moving image generation model is obtained according to the training method of the character moving image generation model of the embodiment of the invention.
According to a third aspect of the embodiments of the present invention, there is provided a training apparatus for a character moving image generation model, including:
the first acquisition module is used for acquiring a character image and a plurality of first action images, wherein the plurality of first action images are continuous frames, and a second action image corresponding to each first action image is acquired according to the character image, the plurality of first action images and a basic generation model;
the determining module is used for determining a frequency domain time sequence difference function of the first action image and the second action image and constructing a loss function containing frequency domain constraint according to the frequency domain time sequence difference function;
and the training module is used for updating the basic generation model according to the loss function containing the frequency domain constraint to obtain a character motion image generation model.
According to a fourth aspect of the embodiments of the present invention, there is provided a character moving image generation apparatus including:
the second acquisition module is used for acquiring a target character image and a plurality of action images, wherein the action images are continuous frames;
a generating module that inputs the target character image and the plurality of motion images into a character moving image generation model to obtain a character motion image including the target character image corresponding to each of the motion images,
the character moving image generation model is obtained according to the training method of the character moving image generation model of the embodiment of the invention.
According to another aspect of an embodiment of the present invention, there is provided an electronic device including:
one or more processors;
a storage device to store one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the method for training a character moving image generation model or the method for generating a character moving image according to the present invention.
According to still another aspect of an embodiment of the present invention, there is provided a computer-readable medium on which a computer program is stored, the program, when being executed by a processor, implementing the method for training a character moving image generation model or the method for generating a character moving image according to the present invention.
One embodiment of the above invention has the following advantages or benefits: a second action image corresponding to each first action image is obtained from a character image, a plurality of first action images and a basic generation model; frequency domain time sequence change analysis is performed on the first and second action images to determine their frequency domain time sequence difference function; a loss function containing a frequency domain constraint is then constructed with the frequency domain time sequence difference function as the frequency domain constraint, and the basic generation model is updated according to this loss function to obtain the character moving image generation model. Because the basic model is updated with a frequency domain constraint added to the loss function, the frequency domain time sequence change difference of consecutive frame images is taken into account: the time sequence consistency of consecutive frames can be improved, computational complexity and storage consumption are reduced, and additional errors introduced by other models are avoided. The character moving images generated by the character moving image generation model thus have improved time sequence consistency across consecutive frames and improved single-frame image quality.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
fig. 1 is a schematic diagram of a main flow of a training method of a character moving image generation model according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a main flow of another training method of a character moving image generation model according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a method for training a character moving image generative model according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a main flow of a generation method of a character moving image according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the main blocks of a training apparatus for a character moving image generation model according to an embodiment of the present invention;
fig. 6 is a schematic diagram of the main blocks of a human moving image generation apparatus according to an embodiment of the present invention;
FIG. 7 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
fig. 8 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The time-series consistency of a character moving image means that the appearance of a character is consistent in consecutive frames, the position and the motion are continuous, and there is no apparent inter-frame jitter, artifact noise, or the like. In the prior art, when a generation model is trained by adopting technologies such as a countermeasure generation network and the like, sequential frames are constrained in a pixel domain, and sequential information of the pixel domain such as optical flow and the like is introduced, so that sequential consistency of the sequential frames is improved to a certain extent. However, the acquisition of the pixel domain time sequence information such as the optical flow depends on other complex neural network models, and the calculation time and the storage space are increased; meanwhile, extra errors are introduced into the dependent model, and the estimation accuracy cannot be guaranteed; moreover, pixel-level noise, edge jitter, and the like are not sufficiently obvious in a pixel domain and are easily ignored by models such as optical flow for the pixel domain, which results in a poor effect of improving the sequential property of consecutive frames. Therefore, in view of the above problems, embodiments of the present invention provide a method for training a character moving image generation model and a method for generating a character moving image, which can improve the time sequence consistency of the generated character moving images, reduce the computational complexity and storage consumption, and avoid introducing additional errors.
Fig. 1 is a schematic diagram of a main flow of a training method of a character moving image generation model according to an embodiment of the present invention, as shown in fig. 1, the method including the steps of:
step 101: acquiring a character image and a plurality of first action images, wherein the plurality of first action images are continuous frames, and acquiring a second action image corresponding to each first action image according to the character image, the plurality of first action images and a basic generation model;
step 102: determining a frequency domain time sequence difference function of the first action image and the second action image, and constructing a loss function containing frequency domain constraint according to the frequency domain time sequence difference function;
step 103: and updating the basic generation model according to the loss function containing the frequency domain constraint to obtain the character motion image generation model.
In the embodiment of the present invention, the character image may be an image containing a character figure, such as an image of a person's head, whole body, or bust. The first action images are reference action images, and the plurality of first action images are continuous frames; that is, each first action image is a single frame of the reference action video, the plurality of first action images are consecutive frames of the reference action video, and together they can form the reference action video. A first action image contains action information: for example, it may be an image of a person performing an action, or an image of key points carrying the action information, where the key points may correspond to body parts of a human body. The person shown performing the action in a first action image may be the same as, or different from, the person in the character image.
In the embodiment of the invention, a basic generation model is trained from the character image, the plurality of first action images, and a plurality of real images of the character performing the actions: a loss function is constructed from the generated character action images and the real images, iterative training is performed, and the basic generation model is optimized by gradient back-propagation. The basic generation model may be any generative model based on a Generative Adversarial Network (GAN), such as LWGAN (Liquid Warping GAN) or C2F-FWN (Coarse-to-Fine Flow Warping Network). The loss function comprises a pixel domain constraint, which may be one or more of an image style constraint, a feature constraint, a face constraint, and the like. After the basic model is trained, the character image and the plurality of first action images are input into the trained basic model, and a second action image corresponding to each first action image, that is, a plurality of second action images forming continuous frames, can be generated. A second action image is an image of the character performing the action in the corresponding first action image. The embodiment of the invention performs frequency domain time sequence analysis on the basic generation model so as to improve the time sequence consistency of the consecutive frame images.
In an embodiment of the present invention, after obtaining a second motion image corresponding to each first motion image, performing frequency domain time sequence variation analysis on the first motion image and the corresponding second motion image, and determining a frequency domain time sequence difference function between the first motion image and the second motion image, includes:
determining a first amplitude spectrum and a first phase spectrum of each first action image in a frequency domain; determining a first time sequence amplitude difference of two first action images corresponding to adjacent frames according to the first amplitude spectrum; determining a first time sequence phase difference of two first action images corresponding to adjacent frames according to the first phase spectrum;
determining a second magnitude spectrum and a second phase spectrum of each second motion image in the frequency domain; determining a second time sequence amplitude difference of two second action images corresponding to adjacent frames according to the second amplitude spectrum; determining a second time sequence phase difference of two second action images corresponding to adjacent frames according to the second phase spectrum;
determining a time sequence amplitude difference function according to the first time sequence amplitude difference and the second time sequence amplitude difference; determining a time sequence phase difference function according to the first time sequence phase difference and the second time sequence phase difference;
and determining a frequency domain time sequence difference function according to the time sequence amplitude difference function and the time sequence phase difference function.
In the embodiment of the present invention, the frequency domain time sequence changes of the consecutive frames of the plurality of first motion images and of the plurality of second motion images are analyzed respectively. Before the first magnitude spectrum and the first phase spectrum of each first motion image in the frequency domain are determined, a two-dimensional Discrete Fourier Transform (DFT) is applied to each first motion image, that is, to each single-frame image, to obtain its frequency domain representation, as shown in formula (1):

$F(u,v)=\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)\, e^{-j2\pi(ux/M + vy/N)}$, formula (1)

In formula (1), f(x, y) is the pixel value of the first motion image at coordinate (x, y), M and N are the length and width of the image respectively, and F(u, v) is the value of the transformed first motion image at coordinate (u, v) of the frequency spectrum. F(u, v) can be written as formula (2):

$F(u,v)=R(u,v)+jI(u,v)$, formula (2)

In formula (2), R(u, v) is the real part of F(u, v) and I(u, v) is the imaginary part of F(u, v). From formula (2), the first amplitude value and the first phase value of the transformed first motion image at spectrum coordinate (u, v) are obtained:

first amplitude value $|F(u,v)|=\sqrt{R(u,v)^{2}+I(u,v)^{2}}$;

first phase value $\angle F(u,v)=\arctan\big(I(u,v)/R(u,v)\big)$.
Obtaining a first amplitude spectrum of the first action image in a frequency domain according to the first amplitude values of all the coordinates of the frequency spectrum; and obtaining a first phase spectrum of the first motion image in a frequency domain according to the first phase values of all coordinates of the frequency spectrum. Similarly, a second magnitude spectrum and a second phase spectrum of the second motion image in the frequency domain can be obtained.
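To make the computation above concrete, the following is a minimal sketch of the spectrum extraction, assuming PyTorch as the framework (the patent does not name one); the function name and tensor layout are illustrative only:

```python
import torch

def amplitude_phase_spectra(frames: torch.Tensor):
    """frames: (T, H, W) tensor of T consecutive grayscale frames.

    Returns the amplitude spectrum |F(u, v)| and the phase spectrum
    angle(F(u, v)) of every frame, per formulas (1) and (2).
    """
    spectrum = torch.fft.fft2(frames)   # F(u, v) = R(u, v) + j*I(u, v)
    amplitude = torch.abs(spectrum)     # |F(u, v)| = sqrt(R^2 + I^2)
    phase = torch.angle(spectrum)       # arctan(I(u, v) / R(u, v))
    return amplitude, phase
```

The same helper applies unchanged to the first and second motion images, yielding the first and second amplitude and phase spectra respectively.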
After the first amplitude spectrum and the first phase spectrum of each first action image are obtained, frequency domain time sequence changes of the two first action images corresponding to the adjacent frames are analyzed, a first time sequence amplitude difference can be determined according to the first amplitude spectra of the two first action images corresponding to the adjacent frames, and a first time sequence phase difference can be determined according to the first phase spectra of the two first action images corresponding to the adjacent frames.
In the embodiment of the present invention, as shown in fig. 2, determining the first time-series amplitude difference of two first motion images corresponding to adjacent frames according to the first amplitude spectrum includes:
step 201: determining the amplitude value of a first amplitude spectrum of a first action image corresponding to the current frame at the t moment of each coordinate; wherein, the time t is the time corresponding to the current frame;
step 202: determining the amplitude value of a first amplitude spectrum of a first action image corresponding to the previous frame at the t-1 moment at each coordinate;
step 203: the absolute value or the square of the difference between the amplitude value at time t and the amplitude value at time t-1 at each coordinate is taken as the first timing amplitude difference.
For example, the amplitude value at time t of the first amplitude spectrum of the first motion image corresponding to the current frame at coordinate (u, v) is $|F_{t}(u,v)|$, and the amplitude value at time t-1 of the first amplitude spectrum of the first motion image corresponding to the previous frame at coordinate (u, v) is $|F_{t-1}(u,v)|$. The first time sequence amplitude difference of the two first motion images corresponding to the current frame and the previous frame at coordinate (u, v), that is, the absolute value of the difference between the amplitude value at time t and the amplitude value at time t-1, can be calculated by formula (3):

$\mathrm{TAC}_{t}(u,v)=\big|\,|F_{t}(u,v)|-|F_{t-1}(u,v)|\,\big|$, formula (3)

In formula (3), $\mathrm{TAC}_{t}(u,v)$ denotes the time sequence amplitude difference of the first amplitude spectra of the adjacent frames at coordinate (u, v).
In the embodiment of the present invention, the first timing amplitude difference of adjacent frames may be obtained by: respectively setting a corresponding t moment weight value for the amplitude value at the t moment of each coordinate, setting a corresponding t-1 moment weight value for the amplitude value at the t-1 moment of each coordinate, determining the product of the amplitude value at the t moment and the t moment weight value, determining the product of the amplitude value at the t-1 moment and the t-1 moment weight value, and taking the absolute value or the square of the difference value of the two products at each coordinate as the first amplitude difference of the adjacent frames.
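A sketch of steps 201 to 203 under the same PyTorch assumption; the optional per-time weighting described in the preceding paragraph is omitted for brevity:

```python
def timing_amplitude_difference(amplitude: torch.Tensor, squared: bool = False) -> torch.Tensor:
    """amplitude: (T, H, W) amplitude spectra of T consecutive frames.

    Returns TAC_t(u, v) for t = 1..T-1, per formula (3): the absolute value
    (or, optionally, the square) of the difference between the amplitude
    values at time t and time t-1 at each coordinate.
    """
    diff = amplitude[1:] - amplitude[:-1]
    return diff.pow(2) if squared else diff.abs()
```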
In an embodiment of the present invention, determining a first timing phase difference of two first motion images corresponding to adjacent frames according to the first phase spectrum includes:
determining a phase value of a first phase spectrum of a first action image corresponding to a current frame at t moment of each coordinate, wherein the t moment is the moment corresponding to the current frame;
determining a phase value of a first phase spectrum of a first action image corresponding to a previous frame at the t-1 moment at each coordinate;
the absolute value or the square of the difference between the phase value at time t and the phase value at time t-1 at each coordinate is taken as the first time-series phase difference at time t.
For example, the phase value at time t of the first phase spectrum of the first motion image corresponding to the current frame at coordinate (u, v) is $\angle F_{t}(u,v)$, and the phase value at time t-1 of the first phase spectrum of the first motion image corresponding to the previous frame at coordinate (u, v) is $\angle F_{t-1}(u,v)$. The first time sequence phase difference of the two first motion images corresponding to the current frame and the previous frame at coordinate (u, v), that is, the absolute value of the difference between the phase value at time t and the phase value at time t-1, can be calculated by formula (4):

$\mathrm{TPC}_{t}(u,v)=\big|\angle F_{t}(u,v)-\angle F_{t-1}(u,v)\big|$, formula (4)

In formula (4), $\mathrm{TPC}_{t}(u,v)$ denotes the time sequence phase difference of the first phase spectra of the adjacent frames at coordinate (u, v).
In the embodiment of the present invention, the first timing phase difference of adjacent frames may be obtained by: respectively setting a corresponding t moment weight value for the phase value at the t moment of each coordinate, setting a corresponding t-1 moment weight value for the phase value at the t-1 moment of each coordinate, determining the product of the phase value at the t moment and the t moment weight value, determining the product of the phase value at the t-1 moment and the t-1 moment weight value, and taking the absolute value or the square of the difference value of the two products at each coordinate as the first phase difference of the adjacent frames.
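The phase counterpart of formula (4) admits the same one-liner; note that, matching the formula, raw angle differences are used and wrap-around at ±π is not treated specially (an implementation detail the patent does not address):

```python
def timing_phase_difference(phase: torch.Tensor, squared: bool = False) -> torch.Tensor:
    """phase: (T, H, W) phase spectra of T consecutive frames.

    Returns TPC_t(u, v) for t = 1..T-1, per formula (4).
    """
    diff = phase[1:] - phase[:-1]
    return diff.pow(2) if squared else diff.abs()
```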
According to the determination process of the first time sequence amplitude difference, a second time sequence amplitude difference of two second motion images corresponding to adjacent frames can be obtained in the same way. According to the above determination process of the first timing phase difference, a second timing phase difference of two second motion images corresponding to adjacent frames can be obtained in the same manner.
In order to measure the difference of the second motion image from the first motion image in frequency domain time sequence, the frequency domain time sequence change consistency measure is analyzed.
In the embodiment of the present invention, a time sequence amplitude difference function may be determined according to a first time sequence amplitude difference and a second time sequence amplitude difference, where the first and second time sequence amplitude differences are computed from a first motion image and the second motion image corresponding to it, respectively; the time sequence amplitude difference function measures the amplitude difference of the first and second motion images in the frequency domain time sequence. Determining the time sequence amplitude difference function from the first and second time sequence amplitude differences includes: determining the function according to the absolute value or the square of the difference between the first time sequence amplitude difference and the second time sequence amplitude difference. For example, the function determined from the absolute value of the difference is shown in formula (5):

$L_{\mathrm{TAC}}^{t}=\sum_{u,v} w(u,v)\,\big|\mathrm{TAC}_{t}^{x}(u,v)-\mathrm{TAC}_{t}^{\hat{x}}(u,v)\big|$, formula (5)

In formula (5), $x_{t}$ denotes the first motion image at time t and $\hat{x}_{t}$ the second motion image at time t; $L_{\mathrm{TAC}}^{t}$ is the time sequence amplitude difference function of the first and second motion images at time t, which may also be called the time sequence amplitude difference consistency measure; w(u, v) is the angular frequency at coordinate (u, v); $\mathrm{TAC}_{t}^{x}(u,v)$ is the first time sequence amplitude difference at time t and $\mathrm{TAC}_{t}^{\hat{x}}(u,v)$ the second; and $\big|\mathrm{TAC}_{t}^{x}(u,v)-\mathrm{TAC}_{t}^{\hat{x}}(u,v)\big|$ is the absolute value of the difference between them.
In the embodiment of the present invention, the timing amplitude difference function may also be obtained by: setting corresponding first time sequence amplitude weights for the first time sequence amplitude differences, setting corresponding second time sequence amplitude weights for the second time sequence amplitude differences, determining the product of the first time sequence amplitude differences and the first time sequence amplitude weights, determining the product of the second time sequence amplitude differences and the second time sequence amplitude weights, and determining a time sequence amplitude difference function according to the absolute value or the square of the difference value of the two products.
In an embodiment of the present invention, a time sequence phase difference function may be determined according to a first time sequence phase difference and a second time sequence phase difference, where the first and second time sequence phase differences are computed from a first motion image and the second motion image corresponding to it, respectively; the time sequence phase difference function measures the phase difference of the first and second motion images in the frequency domain time sequence. Determining the time sequence phase difference function from the first and second time sequence phase differences includes: determining the function according to the absolute value or the square of the difference between the first time sequence phase difference and the second time sequence phase difference. For example, the function determined from the absolute value of the difference is shown in formula (6):

$L_{\mathrm{TPC}}^{t}=\sum_{u,v} w(u,v)\,\big|\mathrm{TPC}_{t}^{x}(u,v)-\mathrm{TPC}_{t}^{\hat{x}}(u,v)\big|$, formula (6)

In formula (6), $x_{t}$ denotes the first motion image at time t and $\hat{x}_{t}$ the second motion image at time t; $L_{\mathrm{TPC}}^{t}$ is the time sequence phase difference function of the first and second motion images at time t, also called the time sequence phase difference consistency measure; w(u, v) is the angular frequency at coordinate (u, v); $\mathrm{TPC}_{t}^{x}(u,v)$ is the first time sequence phase difference at time t and $\mathrm{TPC}_{t}^{\hat{x}}(u,v)$ the second; and $\big|\mathrm{TPC}_{t}^{x}(u,v)-\mathrm{TPC}_{t}^{\hat{x}}(u,v)\big|$ is the absolute value of the difference between them.
In the embodiment of the present invention, the timing phase difference function may also be obtained by: setting corresponding first time sequence phase weight for the first time sequence phase difference, setting corresponding second time sequence phase weight for the second time sequence phase difference, determining the product of the first time sequence phase difference and the first time sequence phase weight, determining the product of the second time sequence phase difference and the second time sequence phase weight, and determining a time sequence phase difference function according to the absolute value or the square of the difference value of the two products.
After the time sequence amplitude difference function and the time sequence phase difference function are determined, the frequency domain time sequence difference function, i.e. the frequency domain constraint, can be obtained. The frequency domain time sequence difference function may be the sum or the weighted sum of the time sequence amplitude difference function and the time sequence phase difference function, as shown in formula (7):

$L_{\mathrm{WTFR}}=\alpha L_{\mathrm{TAC}}+\beta L_{\mathrm{TPC}}$, formula (7)

In formula (7), $L_{\mathrm{WTFR}}$ is the frequency domain time sequence difference function, i.e. the frequency domain constraint, and α and β are coefficients controlling the orders of magnitude of $L_{\mathrm{TAC}}$ and $L_{\mathrm{TPC}}$, which can be adjusted according to the actual situation.
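Putting formulas (5) to (7) together, a hedged sketch of the frequency domain constraint, reusing the helpers sketched earlier (PyTorch assumed; `w` stands for the angular-frequency weight w(u, v), broadcast over the frame axis, and all names are illustrative):

```python
def frequency_domain_constraint(first_frames: torch.Tensor,
                                second_frames: torch.Tensor,
                                w: torch.Tensor,
                                alpha: float = 1.0,
                                beta: float = 1.0) -> torch.Tensor:
    """first_frames: (T, H, W) reference frames; second_frames: generated frames."""
    amp1, ph1 = amplitude_phase_spectra(first_frames)
    amp2, ph2 = amplitude_phase_spectra(second_frames)
    # L_TAC, formula (5): weighted absolute difference of the two amplitude differences
    l_tac = (w * (timing_amplitude_difference(amp1)
                  - timing_amplitude_difference(amp2)).abs()).sum()
    # L_TPC, formula (6): weighted absolute difference of the two phase differences
    l_tpc = (w * (timing_phase_difference(ph1)
                  - timing_phase_difference(ph2)).abs()).sum()
    return alpha * l_tac + beta * l_tpc   # L_WTFR, formula (7)
```

Since `torch.fft.fft2`, `torch.abs` and `torch.angle` are differentiable, gradients flow back through the constraint to the generator.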
In the embodiment of the present invention, after determining the first amplitude spectrum and the first phase spectrum of each first motion image in the frequency domain, determining a third time sequence amplitude difference of two first motion images separated by a preset number of frames according to the first amplitude spectrum, and determining a third time sequence phase difference of two first motion images separated by a preset number of frames according to the first phase spectrum;
after a second amplitude spectrum and a second phase spectrum of each second action image in a frequency domain are determined, determining a fourth time sequence amplitude difference of two second action images separated by a preset frame number according to the second amplitude spectrum, and determining a fourth time sequence phase difference of two second action images separated by the preset frame number according to the second phase spectrum;
determining the time sequence amplitude difference function according to the third time sequence amplitude difference and the fourth time sequence amplitude difference, and determining the time sequence phase difference function according to the third time sequence phase difference and the fourth time sequence phase difference;
and determining the frequency domain time sequence difference function according to the time sequence amplitude difference function and the time sequence phase difference function.
The preset frame number N can be a user-defined number of frames, where N is an integer greater than or equal to 1; for example, N may be 1 frame, 2 frames, and so on, i.e. the pairs compared are frames separated by the preset number, such as the first and third frames, the second and fourth frames, and so on. For example, the amplitude value at time t at each coordinate in the first amplitude spectrum of the first motion image corresponding to the current frame may be determined, where time t is the time corresponding to the current frame; the amplitude value at time (t-N-1) at each coordinate in the first amplitude spectrum of the first motion image separated by the preset number of frames N is determined, and the absolute value or the square of the difference between the amplitude value at time t and the amplitude value at time (t-N-1) at each coordinate is taken as the third time sequence amplitude difference at time t. Likewise, the phase value at time t at each coordinate in the first phase spectrum of the first motion image corresponding to the current frame is determined, the phase value at time (t-N-1) at each coordinate in the first phase spectrum of the first motion image separated by the preset number of frames N is determined, and the absolute value or the square of the difference between the two phase values at each coordinate is taken as the third time sequence phase difference at time t. Similarly, the fourth time sequence amplitude difference and the fourth time sequence phase difference can be obtained. The time sequence amplitude difference function may then be determined from the absolute value or the square of the difference between the third and fourth time sequence amplitude differences, and the time sequence phase difference function from the absolute value or the square of the difference between the third and fourth time sequence phase differences.
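The frame-gap variant only changes the index offset; a sketch assuming the same absolute-or-squared difference form:

```python
def gapped_timing_difference(spectra: torch.Tensor, n: int = 1, squared: bool = False) -> torch.Tensor:
    """spectra: (T, H, W) amplitude or phase spectra of consecutive frames.

    Compares frame t with frame t-(N+1), i.e. two frames separated by the
    preset number of frames N, yielding the third/fourth differences above.
    """
    step = n + 1
    diff = spectra[step:] - spectra[:-step]
    return diff.pow(2) if squared else diff.abs()
```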
In the embodiment of the present invention, after the frequency domain time sequence difference function, i.e. the frequency domain constraint, is obtained, a loss function is constructed from it; the loss function comprises the pixel domain constraint and the frequency domain constraint. Constructing the loss function containing the frequency domain constraint according to the frequency domain time sequence difference function includes: taking the frequency domain time sequence difference function as the frequency domain constraint, and performing a weighted summation of the pixel domain constraint and the frequency domain constraint to determine the loss function. The direct sum of the pixel domain constraint and the frequency domain constraint can also be taken as the loss function. The basic generation model is iteratively trained with this loss function and updated by gradient back-propagation until the effect no longer improves, yielding the character moving image generation model. Adding the frequency domain time sequence constraint to the loss function to update and optimize the basic generation model can improve the time sequence consistency of consecutive frames and reduce the noise of consecutive frames.
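A minimal sketch of one such training step, where `base_generator`, `pixel_domain_loss`, `optimizer` and `lambda_freq` are illustrative placeholders for the chosen adversarial base model, pixel domain constraint, optimizer and constraint weighting (none of which the patent fixes):

```python
def training_step(base_generator, optimizer, pixel_domain_loss,
                  person_image, first_frames, real_frames, w, lambda_freq=1.0):
    # Second action images generated from the character image and reference frames.
    second_frames = base_generator(person_image, first_frames)
    # Loss = pixel domain constraint (against real images of the character
    # performing the actions) + weighted frequency domain constraint.
    loss = (pixel_domain_loss(second_frames, real_frames)
            + lambda_freq * frequency_domain_constraint(first_frames, second_frames, w))
    optimizer.zero_grad()
    loss.backward()   # gradient back-propagation updates the base model
    optimizer.step()
    return float(loss.detach())
```

This loop is repeated until the consistency metrics stop improving, as described above.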
As shown in fig. 3, which is a flow chart of a training method of a character moving image generation model according to an embodiment of the present invention: first, a character image and a plurality of reference action images (consecutive action frames of the character) are acquired, and a generation model is trained from the character image and the reference action images with a loss function containing pixel domain constraints, that is, a basic generation model based on a generative adversarial network; a plurality of corresponding generated action images are then obtained from the character image, the reference action images and the basic generation model. In the forward computation, frequency domain time sequence change analysis is performed on the reference action images and the generated action images: a two-dimensional Discrete Fourier Transform (DFT) is first applied to each single frame of the reference and generated action images to obtain amplitude and phase spectra; the time sequence amplitude difference (TAC) is obtained from the amplitude spectra of consecutive frames, and the time sequence phase difference (TPC) from the phase spectra of consecutive frames. From the time sequence amplitude differences of the reference and generated action images, the time sequence amplitude difference consistency measure (L_TAC) is obtained; from the time sequence phase differences, the time sequence phase difference consistency measure (L_TPC); and from L_TAC and L_TPC, the time sequence frequency domain difference function (L_WTFR). The time sequence frequency domain difference function is passed to the generation model as the frequency domain constraint, i.e. the loss function is the sum of the pixel domain constraint and the frequency domain constraint, and the generation model is updated by gradient back-propagation to obtain the character moving image generation model.
As shown in fig. 4, another aspect of the embodiments of the present invention provides a method for generating a character moving image, including:
step S401: acquiring a target character image and a plurality of action images, wherein the action images are continuous frames;
step S402: inputting the target character image and the plurality of motion images into the character moving image generation model to obtain, for each motion image, a character motion image containing the target character,
the character moving image generation model is obtained according to the training method of the character moving image generation model of the embodiment of the invention.
By inputting the target character image and a plurality of continuous motion images into the character motion image generation model, a character motion image containing the target character can be obtained, and the obtained character motion image has high time sequence consistency and obtains a character motion image with better effect. The method for generating the character moving image can be applied to the fields of movie animation production, virtual fitting, virtual digital people and the like.
For experimental simulation, the embodiment of the invention generates character moving images with both the basic generation models and the character moving image generation models: LWGAN and C2F-FWN are used as the basic generation models, and the updated LWGAN and the updated C2F-FWN as the corresponding character moving image generation models; the character moving images generated by LWGAN and by the updated LWGAN are compared, as are those generated by C2F-FWN and by the updated C2F-FWN. Table 1 compares the character moving images generated by LWGAN and the updated LWGAN, and by C2F-FWN and the updated C2F-FWN, with real images in terms of time sequence consistency; Table 2 compares them in terms of single-frame image quality.
TABLE 1
[Table 1 is reproduced as an image in the original publication: time sequence consistency comparison (TCM and inter-frame PSNR) of LWGAN vs. updated LWGAN and C2F-FWN vs. updated C2F-FWN.]
TABLE 2
[Table 2 is reproduced as an image in the original publication: single-frame image quality comparison (PSNR, LPIPS and SSIM) of LWGAN vs. updated LWGAN and C2F-FWN vs. updated C2F-FWN.]
As can be seen from Table 1, compared with LWGAN and C2F-FWN, the updated LWGAN and updated C2F-FWN have larger Timing Consistency Metric (TCM) and inter-frame peak signal-to-noise ratio (inter-frame PSNR) values, which indicates that the updated basic generation model, i.e. the character moving image generation model, significantly improves the time sequence consistency of consecutive frame images. As can be seen from Table 2, compared with LWGAN and C2F-FWN, the updated LWGAN and updated C2F-FWN have a larger peak signal-to-noise ratio (PSNR), a smaller learned perceptual image patch similarity (LPIPS), and a larger structural similarity (SSIM). A higher PSNR, a lower LPIPS and a higher SSIM all indicate better single-frame image quality, so the updated basic generation model, i.e. the character moving image generation model, improves the quality of single-frame images. Meanwhile, training the character moving image generation model does not significantly increase the computational complexity.
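For reference, the single-frame PSNR reported in Table 2 has its standard definition (not specific to this patent); a sketch for images scaled to [0, 1]:

```python
def psnr(img_a: torch.Tensor, img_b: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
    """Peak signal-to-noise ratio: 10 * log10(MAX^2 / MSE); higher is better."""
    mse = torch.mean((img_a - img_b) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```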
According to the training method of the character moving image generation model provided by the embodiment of the present invention, the basic generation model, which can be any generation model based on a generative adversarial network, is updated by adding the frequency domain constraint to the loss function to obtain the character moving image generation model. This improves the time sequence consistency of consecutive frame images, avoids the additional errors caused by introducing other complex neural network models, and has low computational complexity and storage consumption. Moreover, after the basic generation model has been iteratively updated, the training method of the embodiment of the present invention can be applied again to obtain an updated character moving image generation model. The generated character moving images show fewer artifacts and less jitter, improved time sequence consistency, reduced in-image noise, and better single-frame image quality. The method for generating character moving images can be applied in many fields, such as film and animation production, virtual fitting, and virtual digital humans.
As shown in fig. 5, another aspect of the embodiment of the present invention provides a training apparatus 500 for generating a model of a character moving image, including:
a second obtaining module 501, configured to obtain a character image and a plurality of first motion images, where the plurality of first motion images are continuous frames, and obtain a second motion image corresponding to each first motion image according to the character image, the plurality of first motion images, and a basic generation model;
a determining module 502, configured to determine a frequency domain timing difference function of the first motion image and the second motion image, and construct a loss function including frequency domain constraints according to the frequency domain timing difference function;
the training module 503 updates the basic generation model according to a loss function including frequency domain constraints, to obtain a character motion image generation model.
In this embodiment of the present invention, the determining module 502 is further configured to:
determining a first magnitude spectrum and a first phase spectrum of each first motion image in a frequency domain; determining a first time sequence amplitude difference of two first action images corresponding to adjacent frames according to the first amplitude spectrum; determining a first time sequence phase difference of two first action images corresponding to adjacent frames according to the first phase spectrum;
determining a second magnitude spectrum and a second phase spectrum of each second motion image in the frequency domain; determining a second time sequence amplitude difference of two second action images corresponding to adjacent frames according to the second amplitude spectrum; determining a second time sequence phase difference of two second action images corresponding to adjacent frames according to the second phase spectrum;
determining a time sequence amplitude difference function according to the first time sequence amplitude difference and the second time sequence amplitude difference; determining a time sequence phase difference function according to the first time sequence phase difference and the second time sequence phase difference;
and determining a frequency domain time sequence difference function according to the time sequence amplitude difference function and the time sequence phase difference function.
In this embodiment of the present invention, the determining module 502 is further configured to: determining an amplitude value at the t moment of each coordinate in a first amplitude spectrum of a first action image corresponding to a current frame, wherein the t moment is the moment corresponding to the current frame; determining the amplitude value of t-1 moment at each coordinate in a first amplitude spectrum of a first action image corresponding to the previous frame; the absolute value or the square of the difference between the amplitude value at time t and the amplitude value at time t-1 at each coordinate is taken as the first time-series amplitude difference at time t.
In this embodiment of the present invention, the determining module 502 is further configured to: determining a phase value at t moment of each coordinate in a first phase spectrum of a first action image corresponding to a current frame, wherein the t moment is a moment corresponding to the current frame; determining a phase value at t-1 moment at each coordinate in a first phase spectrum of a first action image corresponding to a previous frame; the absolute value or the square of the difference between the phase value at time t and the phase value at time t-1 at each coordinate is taken as the first timing phase difference at time t.
In this embodiment of the present invention, the determining module 502 is further configured to: and determining a time sequence amplitude difference function according to the absolute value or the square of the difference value of the first time sequence amplitude difference and the second time sequence amplitude difference.
In this embodiment of the present invention, the determining module 502 is further configured to: the timing phase difference function is determined based on an absolute value or a square of a difference between the first timing phase difference and the second timing phase difference.
In an embodiment of the present invention, the loss function further includes a pixel domain constraint, and the determining module 502 is further configured to: and taking the frequency domain time sequence difference function as frequency domain constraint, and performing weighted summation on the pixel domain constraint and the frequency domain constraint to determine a loss function.
As shown in fig. 6, a further aspect of the embodiments of the present invention provides a device 600 for generating a character moving image, including:
a second obtaining module 601, configured to obtain a target character image and a plurality of motion images, where the plurality of motion images are continuous frames;
a generating module 602, configured to input the target character image and the plurality of motion images into the character moving image generation model to obtain, for each motion image, a character motion image containing the target character,
wherein the character moving image generation model is obtained according to the training method of the character moving image generation model of the embodiment of the present invention.
In another aspect, an embodiment of the present invention provides an electronic device, including: one or more processors; a storage device for storing one or more programs which, when executed by one or more processors, cause the one or more processors to implement a character moving image generation model training method or a character moving image generation method according to an embodiment of the present invention.
A further aspect of the embodiments of the present invention provides a computer-readable medium on which a computer program is stored, the program, when executed by a processor, implementing a method of training a character moving image generation model or a method of generating a character moving image of the embodiments of the present invention.
Fig. 7 shows an exemplary system architecture 700 to which the training method or training apparatus of a character moving image generation model, or the character moving image generation method or apparatus, of an embodiment of the present invention can be applied.
As shown in fig. 7, the system architecture 700 may include terminal devices 701, 702, 703, a network 704, and a server 705. The network 704 serves to provide a medium for communication links between the terminal devices 701, 702, 703 and the server 705. Network 704 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may interact with the server 705 via the network 704 using the terminal devices 701, 702, 703 to receive or send messages or the like. The terminal devices 701, 702, 703 may have installed thereon various communication client applications, such as a shopping application, a web browser application, a search application, an instant messaging tool, a mailbox client, social platform software, etc. (by way of example only).
The terminal devices 701, 702, 703 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 705 may be a server providing various services, such as a background management server (for example only) providing support for shopping websites browsed by users using the terminal devices 701, 702, 703. The background management server may analyze and otherwise process received data, such as a product information query request, and feed back a processing result (for example, target push information or product information) to the terminal device.
It should be noted that the method for training a character moving image generation model or the method for generating a character moving image according to the embodiment of the present invention is generally executed by the server 705, and accordingly, the training device for a character moving image generation model or the generating device for a character moving image is generally provided in the server 705.
It should be understood that the number of terminal devices, networks, and servers in fig. 7 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to Fig. 8, shown is a block diagram of a computer system 800 suitable for implementing a terminal device of an embodiment of the present invention. The terminal device shown in Fig. 8 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
As shown in fig. 8, a computer system 800 includes a Central Processing Unit (CPU) 801 which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the system 800 are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as necessary, so that a computer program read out therefrom is installed into the storage section 808 as necessary.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. When executed by the Central Processing Unit (CPU) 801, the computer program performs the above-described functions defined in the system of the present invention.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes an acquisition module, a determination module, and a training module. The names of these modules do not, in some cases, limit the modules themselves; for example, the training module may also be described as "a module that updates the basic generation model according to a loss function containing a frequency domain constraint to obtain a character moving image generation model".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to: acquire a character image and a plurality of first action images, where the plurality of first action images are continuous frames, and acquire a second action image corresponding to each first action image according to the character image, the plurality of first action images, and a basic generation model; determine a frequency domain timing difference function of the first action image and the second action image, and construct a loss function containing a frequency domain constraint according to the frequency domain timing difference function; and update the basic generation model according to the loss function containing the frequency domain constraint to obtain the character moving image generation model.
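These three steps correspond to a standard training loop. A minimal sketch follows, reusing the total_loss sketch from earlier; the data loader, model interface, and optimizer settings are all assumed for illustration.

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)  # hypothetical settings
for character_image, first_actions in loader:
    # Basic generation model: one second action image per first action image.
    second_actions = model(character_image, first_actions)
    # Loss containing both the pixel domain and frequency domain constraints.
    loss = total_loss(first_actions, second_actions)
    optimizer.zero_grad()
    loss.backward()   # backpropagate through the combined loss
    optimizer.step()  # update the basic generation model
```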
According to the technical solution of the embodiments of the present invention, the training method obtains the character moving image generation model by adding a frequency domain constraint to the loss function and updating the basic generation model. This improves the timing consistency of continuous frame images, avoids the additional errors introduced by other complex neural network models, and incurs lower computational complexity and storage consumption. Moreover, after the basic generation model is iteratively updated, an updated character moving image generation model can be obtained with the same training method. The character moving image generation method reduces artifacts, jitter, and similar phenomena, improves the timing consistency of character moving images, reduces noise in the images, and improves single-frame image quality. The method can be applied in many fields, such as film and animation production, virtual fitting, and virtual digital humans.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. A method for training a character moving image generation model, comprising:
acquiring a character image and a plurality of first action images, wherein the plurality of first action images are continuous frames, and acquiring a second action image corresponding to each first action image according to the character image, the plurality of first action images and a basic generation model;
determining a frequency domain timing difference function of the first action image and the second action image, and constructing a loss function containing a frequency domain constraint according to the frequency domain timing difference function;
and updating the basic generation model according to the loss function containing the frequency domain constraint to obtain a character moving image generation model.
2. The training method of claim 1, wherein determining a frequency domain timing difference function of the first action image and the second action image comprises:
determining a first amplitude spectrum and a first phase spectrum of each first action image in a frequency domain; determining a first timing amplitude difference of two first action images corresponding to adjacent frames according to the first amplitude spectrum; determining a first timing phase difference of two first action images corresponding to adjacent frames according to the first phase spectrum;
determining a second amplitude spectrum and a second phase spectrum of each second action image in a frequency domain; determining a second timing amplitude difference of two second action images corresponding to adjacent frames according to the second amplitude spectrum; determining a second timing phase difference of two second action images corresponding to adjacent frames according to the second phase spectrum;
determining a timing amplitude difference function according to the first timing amplitude difference and the second timing amplitude difference; determining a timing phase difference function according to the first timing phase difference and the second timing phase difference;
and determining the frequency domain timing difference function according to the timing amplitude difference function and the timing phase difference function.
3. The training method of claim 2, wherein determining the first timing amplitude difference of two first action images corresponding to adjacent frames according to the first amplitude spectrum comprises:
determining an amplitude value at time t at each coordinate in the first amplitude spectrum of the first action image corresponding to a current frame, wherein time t is the time corresponding to the current frame;
determining an amplitude value at time t-1 at each coordinate in the first amplitude spectrum of the first action image corresponding to a previous frame;
taking the absolute value or the square of the difference between the amplitude value at time t and the amplitude value at time t-1 at each coordinate as the first timing amplitude difference at time t.
4. The training method of claim 2, wherein determining the first timing phase difference of two first action images corresponding to adjacent frames according to the first phase spectrum comprises:
determining a phase value at time t at each coordinate in the first phase spectrum of the first action image corresponding to a current frame, wherein time t is the time corresponding to the current frame;
determining a phase value at time t-1 at each coordinate in the first phase spectrum of the first action image corresponding to a previous frame;
taking the absolute value or the square of the difference between the phase value at time t and the phase value at time t-1 at each coordinate as the first timing phase difference at time t.
5. The training method of claim 2, wherein determining the timing amplitude difference function according to the first timing amplitude difference and the second timing amplitude difference comprises:
determining the timing amplitude difference function according to an absolute value or a square of a difference between the first timing amplitude difference and the second timing amplitude difference.
6. The training method of claim 2, wherein determining the timing phase difference function according to the first timing phase difference and the second timing phase difference comprises:
determining the timing phase difference function according to an absolute value or a square of a difference of the first timing phase difference and the second timing phase difference.
7. The training method of claim 2, wherein the loss function further comprises a pixel domain constraint, and wherein constructing a loss function comprising a frequency domain constraint based on the frequency domain timing difference function comprises:
taking the frequency domain timing difference function as the frequency domain constraint, and performing a weighted summation on the pixel domain constraint and the frequency domain constraint to determine the loss function.
8. A method for generating a character moving image, comprising:
acquiring a target character image and a plurality of action images, wherein the action images are continuous frames;
inputting the target character image and the plurality of action images into a character moving image generation model, obtaining a character motion image, including the target character, corresponding to each of the action images,
wherein the character moving image generation model is obtained by the training method according to any one of claims 1 to 7.
9. A training apparatus for a character moving image generation model, comprising:
the first acquisition module is used for acquiring a character image and a plurality of first action images, wherein the plurality of first action images are continuous frames, and a second action image corresponding to each first action image is acquired according to the character image, the plurality of first action images and a basic generation model;
the determining module is used for determining a frequency domain timing difference function of the first action image and the second action image and constructing a loss function containing a frequency domain constraint according to the frequency domain timing difference function;
and the training module is used for updating the basic generation model according to the loss function containing the frequency domain constraint to obtain a character moving image generation model.
10. A device for generating a character moving image, comprising:
the second acquisition module is used for acquiring a target character image and a plurality of action images, wherein the action images are continuous frames;
a generating module, configured to input the target character image and the plurality of action images into a character moving image generation model to obtain a character motion image, including the target character, corresponding to each of the action images,
wherein the character moving image generation model is obtained by the training method according to any one of claims 1 to 7.
11. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
which, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-8.
12. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-8.
CN202211293093.2A 2022-10-21 2022-10-21 Method and device for generating character moving image Pending CN115588031A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211293093.2A CN115588031A (en) 2022-10-21 2022-10-21 Method and device for generating character moving image

Publications (1)

Publication Number Publication Date
CN115588031A true CN115588031A (en) 2023-01-10

Family

ID=84780429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211293093.2A Pending CN115588031A (en) 2022-10-21 2022-10-21 Method and device for generating character moving image

Country Status (1)

Country Link
CN (1) CN115588031A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination