CN115100028A - Method and device for transferring image key points, electronic equipment and storage medium

Method and device for transferring image key points, electronic equipment and storage medium

Info

Publication number: CN115100028A
Application number: CN202210688531.9A
Authority: CN (China)
Legal status: Pending (the legal status is an assumption, not a legal conclusion; Google has not performed a legal analysis)
Prior art keywords: key point, human body, keypoint, key points, face
Other languages: Chinese (zh)
Inventors: 何声一, 洪智滨
Current and original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Filed 2022-06-17 by Beijing Baidu Netcom Science and Technology Co Ltd; priority to CN202210688531.9A; published as CN115100028A

Classifications

    All classifications fall under section G (Physics), class G06 (Computing; Calculating or Counting): G06T (image data processing or generation) and G06V (image or video recognition or understanding).
    • G06T 3/04: Context-preserving transformations, e.g. by using an importance map (under G06T 3/00, geometric image transformations in the plane of the image)
    • G06T 11/00: 2D [Two Dimensional] image generation
    • G06V 10/462: Salient features, e.g. scale invariant feature transforms [SIFT] (under G06V 10/46, descriptors for shape, contour or point-related descriptors)
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V 40/168: Feature extraction; face representation (under G06V 40/16, human faces)
    • G06V 40/174: Facial expression recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method and an apparatus for transferring image key points, an electronic device, and a storage medium. It relates to the technical field of artificial intelligence, in particular to computer vision and deep learning, and can be applied to scenes such as face image processing and face recognition. The scheme is as follows: acquire a first key point of a first object contained in a source image and a second key point of a second object contained in a target image, where the key points include at least one of: facial expression key points and human body action key points; determine a neutral key point between the first key point and the second key point, where the neutral key point characterizes at least one of: the same facial features exhibited when multiple objects show different facial expressions, and the same human body features exhibited when multiple objects perform different limb actions; and migrate the first key point to the second key point based on the neutral key point.

Description

Method and device for transferring image key points, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to computer vision and deep learning, and can be applied to scenes such as face image processing and face recognition. It specifically relates to a method and an apparatus for migrating image key points, an electronic device, and a storage medium.
Background
Facial expression transfer and human body motion transfer map an input facial expression onto a target face through some mapping relationship, so that the target face performs that expression, or map an input human body motion onto a target human body, so that the target body performs that motion.
In the prior art, facial expressions or human body actions are usually mapped by direct key point mapping. In practice, however, the key points of different faces or bodies differ: for example, the key points of a square face performing a smiling expression differ from those of a round face performing the same expression. Direct key point mapping therefore cannot fuse the mapped expression or action seamlessly with the target object; that is, the target object cannot perform the source object's facial expression or body action in a realistic and natural way.
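For contrast, the direct mapping criticized here can be sketched in a few lines. The centroid-and-scale alignment below is a toy stand-in (an assumption, not any cited system) that makes the limitation concrete: the source key points are pasted onto the target with no per-identity adaptation.

```python
import torch

def direct_keypoint_mapping(source_kpts: torch.Tensor, target_kpts: torch.Tensor) -> torch.Tensor:
    """Copy source key points onto the target after centroid/scale alignment only.

    source_kpts, target_kpts: (K, 2) tensors of (x, y) coordinates.
    No per-identity adaptation is performed, so a square face's smile is
    pasted unchanged onto a round face -- the mismatch the disclosure targets.
    """
    s_mean, t_mean = source_kpts.mean(dim=0), target_kpts.mean(dim=0)
    s_scale = (source_kpts - s_mean).norm(dim=1).mean()
    t_scale = (target_kpts - t_mean).norm(dim=1).mean()
    return (source_kpts - s_mean) / s_scale * t_scale + t_mean
```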
Disclosure of Invention
The disclosure provides a method and a device for transferring image key points, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a method of migrating image key points, comprising: acquiring a first key point of a first object contained in a source image and a second key point of a second object contained in a target image, wherein the key points include at least one of: facial expression key points and human body action key points; determining a neutral key point between the first key point and the second key point, wherein the neutral key point characterizes at least one of: the same facial features exhibited when multiple objects show different facial expressions, and the same human body features exhibited when multiple objects perform different limb actions; and migrating the first key point to the second key point based on the neutral key point.
In the present disclosure, neutral key points are used to migrate facial expressions or human body actions between different objects. A neutral key point represents facial feature information common to multiple objects performing different facial expressions and/or human body feature information common to multiple objects performing different limb actions. In other words, neutral key points decouple the influence of object-specific features on the facial expression or body action, enabling a unified representation of an expression or action across different faces or bodies, which in turn ensures the accuracy and realism of the migration.
The scheme provided by the disclosure thus achieves facial expression migration or human body action migration with improved accuracy and realism, solving the prior-art problem of unrealistic migration results.
According to another aspect of the present disclosure, there is also provided an apparatus for migrating image key points, comprising: an acquisition module configured to acquire a first key point of a first object contained in a source image and a second key point of a second object contained in a target image, wherein the key points include at least one of: facial expression key points and human body action key points; a determination module configured to determine a neutral key point between the first key point and the second key point, wherein the neutral key point characterizes at least one of: the same facial features exhibited when multiple objects show different facial expressions, and the same human body features exhibited when multiple objects perform different limb actions; and a migration module configured to migrate the first key point to the second key point based on the neutral key point.
According to another aspect of the present disclosure, there is also provided an electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the above method of migrating image key points.
According to another aspect of the present disclosure, there is also provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of migrating image keypoints according to the above.
According to another aspect of the present disclosure, there is also provided a computer program product comprising a computer program which, when executed by a processor, implements a method of migrating image keypoints according to the above.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a method of migrating image keypoints, according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a keypoint extraction model according to a second embodiment of the disclosure;
FIG. 3 is a schematic diagram of model training according to a second embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an apparatus for migrating image keypoints, according to a third embodiment of the present disclosure;
FIG. 5 is a block diagram of an electronic device for implementing a method of migrating image keypoints according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of embodiments of the present disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical scheme of the disclosure, the acquisition, storage, and application of the personal information of relevant users all comply with applicable laws and regulations and do not violate public order and good morals.
Example 1
According to an aspect of the present disclosure, there is provided a method for migrating image key points, as shown in fig. 1, the method at least including the steps of:
Step S102: acquire a first key point of a first object included in the source image and a second key point of a second object included in the target image, where the key points (i.e., the first key point and the second key point) include at least one of: facial expression key points and human body action key points.
In step S102, the key point information of the facial expression key points represents the facial expression of the target object, and the key point information of the human body action key points represents the human body action of the target object. Optionally, in this disclosure, the key point information includes at least position change information and number change information of the key points. For example, the number of key points corresponding to a smiling expression differs from the number corresponding to a sad expression, and the positions of the key points differ as well.
In addition, in the present disclosure, the keypoint information of the first object in the source image needs to be mapped into the second object in the target image, so that the second object can perform the same expression or limb movement as the first object. That is, in the present disclosure, the second object in the target image is the object to which the keypoint is to be mapped.
Alternatively, in step S102, the terminal device may identify the key points of an object in an image by a key point detection technique. Specifically, the terminal device first identifies the human face or the human body by recognizing objects in the image, i.e., from an image containing objects of different types. The terminal device then detects key points on the identified face or body based on the Deepfake technique.
It should be noted that after the face or the human body is detected from the image, the key points of the face or the human body are identified, so that the problem of inaccurate identification of the key points in the image can be avoided, and the accuracy of key point identification is improved.
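As a rough illustration of this detect-then-locate pipeline, the sketch below uses two toy PyTorch modules as stand-ins for a real face/human detector and a landmark regressor. The architectures, crop size, and normalized-box convention are all assumptions, since the disclosure does not specify them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionDetector(nn.Module):
    """Toy stand-in for a face/human detector: predicts one normalized box."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 4))
    def forward(self, img):                      # img: (B, 3, H, W)
        return self.net(img).sigmoid()           # (B, 4) as (x1, y1, x2, y2)

class KeypointRegressor(nn.Module):
    """Toy stand-in for a landmark model: K (x, y) key points inside the crop."""
    def __init__(self, num_kpts: int = 68):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2 * num_kpts))
        self.num_kpts = num_kpts
    def forward(self, crop):
        return self.net(crop).sigmoid().view(-1, self.num_kpts, 2)

def extract_keypoints(img, detector, regressor, crop_size=128):
    """Stage 1: localize the face/body; stage 2: detect key points in the crop."""
    h, w = img.shape[-2:]
    box = detector(img)[0] * torch.tensor([w, h, w, h], dtype=torch.float)
    x1, y1, x2, y2 = box.long().tolist()
    x1, y1 = min(x1, w - 2), min(y1, h - 2)      # keep the crop inside the image
    x2, y2 = max(x2, x1 + 2), max(y2, y1 + 2)    # guard against degenerate boxes
    crop = F.interpolate(img[..., y1:y2, x1:x2], size=(crop_size, crop_size))
    kpts = regressor(crop)[0]                    # normalized crop coordinates
    scale = torch.tensor([x2 - x1, y2 - y1], dtype=torch.float)
    return kpts * scale + torch.tensor([x1, y1], dtype=torch.float)
```

Detecting inside the cropped region rather than on the full frame is what realizes the accuracy gain the preceding note describes.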
Step S104: determine a neutral key point between the first key point and the second key point, where the neutral key point characterizes at least one of: the same facial features exhibited by multiple objects showing different facial expressions, and the same human body features exhibited by multiple objects performing different limb actions.
In step S104, the neutral key point represents facial feature information common to multiple subjects performing different facial expressions and/or body feature information common to multiple subjects performing different body movements. That is, neutral key points provide a unified representation of facial expressions or human body actions across different faces or bodies. For example, because different objects have different face shapes, their facial key points necessarily differ even when they make similar expressions; nevertheless, the facial key points of different objects tend to follow similar motion laws. Neutral key points exploit this regularity: they decouple the influence of face shape and facial motion on the key points, so that when different objects make similar expressions, the motions of their corresponding neutral key points are consistent.
It should be noted that, for human body motion, neutral key points likewise decouple the influence of body shape and body motion on the body key points, so that the motions of the corresponding neutral key points are consistent when different objects perform similar actions.
In addition, it should be noted that a neutral key point is not any particular facial or body key point; it is constructed from the facial or body key points. That is, in the present disclosure, a neutral key point is a virtual key point that does not physically exist.
In step S104, constructing neutral key points from the objects' key point information decouples object-specific information from the expression or motion migration, so that the facial expression or body action of the source object can be mapped onto the target object realistically and naturally, making face driving or body driving realistic, natural, and accurate.
Step S106, based on the neutral key point, the first key point is transferred to the second key point.
In step S106, after the neutral key point is determined, the terminal device may predict the position information of the neutral key point on the second object, obtaining a predicted key point. The terminal device then determines the mapping relationship between the predicted key point and the first key point, and maps the first key point onto the predicted key point based on that relationship, so that the facial expression or body action of the first object is displayed realistically and naturally on the second object.
In the above process, the predicted key point is obtained by predicting the movement trajectory of the second object's key points. Mapping the first object's key points onto the predicted key points therefore displays the first object's facial expression or body action on the second object realistically and accurately.
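A minimal sketch of step S106 under stated assumptions: `predictor` stands for the trained neural network model described in Example 2 (its interface is hypothetical), and a least-squares affine fit stands in for the unspecified mapping relationship between the first key points and the predicted key points.

```python
import torch

def fit_affine(src: torch.Tensor, dst: torch.Tensor):
    """Least-squares affine map (A, b) with dst ~ src @ A.T + b; src, dst: (K, 2)."""
    x = torch.cat([src, torch.ones(src.shape[0], 1)], dim=1)   # (K, 3) homogeneous
    sol = torch.linalg.lstsq(x, dst).solution                  # (3, 2)
    return sol[:2].T, sol[2]

def migrate_keypoints(first_kpts, neutral_kpts, second_kpts, predictor):
    """predictor (assumed trained, see FIG. 3) places the neutral key points
    on the second object, yielding the predicted key points."""
    predicted = predictor(neutral_kpts, second_kpts)           # (K, 2)
    a, b = fit_affine(first_kpts, predicted)                   # mapping relationship
    return first_kpts @ a.T + b          # first object's key points on the target
```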
The scheme defined in steps S102 to S106 uses neutral key points to migrate facial expressions or human body actions between different objects: after acquiring the first key point of the first object in the source image and the second key point of the second object in the target image, the neutral key point between them is determined, and the first key point is migrated to the second key point based on the neutral key point, so that the second object performs the same facial expression and/or body action as the first object. The key points include at least one of facial expression key points and human body action key points, and the neutral key point characterizes at least one of: the same facial features exhibited when multiple objects show different facial expressions, and the same human body features exhibited when multiple objects perform different limb actions.
Notably, the neutral key point represents facial feature information common to multiple objects performing different expressions and/or body feature information common to multiple objects performing different movements. In other words, neutral key points decouple the influence of object-specific features on the expression or action, enabling a unified representation across different faces or bodies and thereby ensuring accurate and realistic migration.
The scheme provided by the disclosure thus achieves facial expression or human body action migration with improved accuracy and realism, solving the prior-art problem of unrealistic migration results.
Example 2
According to another aspect of the present disclosure, a method of migrating image key points is also provided. It should be noted that in many application scenarios involving facial or body key points, the key point positions of different objects differ markedly, yet during motion the key points of different objects move in similar ways. This embodiment therefore establishes neutral key points to describe facial expressions or body motions, and then maps the neutral key points onto a specific face or body according to the object, which resolves the realism problem of face driving or body driving.
In an optional embodiment, the terminal device first obtains the first key point of the first object contained in the source image and the second key point of the second object contained in the target image, where the key points (i.e., the first and second key points) include at least one of: facial expression key points and human body action key points.
Optionally, step S102 shown in FIG. 1 may include: extracting the first key point and the second key point from the source image and the target image, respectively. Specifically, the terminal device may extract the facial expression key points and/or human body action key points of the first object from the source image based on a key point extraction model, and likewise extract those of the second object from the target image.
It should be noted that the above-mentioned key point extraction model may be, but is not limited to, a model constructed based on the Deepfake technique.
Optionally, FIG. 2 shows a schematic diagram of an optional key point extraction model. As FIG. 2 shows, for different human expressions (e.g., the expression in the source image and the expression in the target image), the key point extraction model first encodes the expressions, and from the encoding result extracts the expression features shared between different people (Inter AB in FIG. 2) and the expression features unique to each person (Inter B in FIG. 2). Feature analysis is then performed on the shared and person-specific expression features to obtain the key points corresponding to the source image and to the target image.
Further, to verify the accuracy of the key point extraction model, the present disclosure also uses the obtained key points to reconstruct the facial expression or body action. As shown in FIG. 2, after obtaining the key points corresponding to the source and target images, the terminal device predicts the facial key points with a decoder, reconstructs the facial expression from the key points, and obtains a prediction result. A discriminator then judges whether the prediction result is accurate, yielding a discrimination result.
It should be noted that, the key points of different objects are extracted through the key point extraction model, so that the accuracy of extracting the key points is improved, and the accuracy of facial expression migration and/or human body action migration is further ensured.
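The FIG. 2 architecture as described might be sketched as follows. The layer sizes and linear heads are illustrative assumptions; "Inter AB" and "Inter B" name the shared and person-specific feature branches, and the decoder/discriminator used for reconstruction and verification are omitted for brevity.

```python
import torch
import torch.nn as nn

class KeypointExtractor(nn.Module):
    """Encode an expression image, split the code into shared (Inter AB) and
    person-specific (Inter B) features, and analyze both to produce key points."""
    def __init__(self, feat_dim=64, num_kpts=68):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.inter_ab = nn.Linear(64, feat_dim)  # expression features shared across people
        self.inter_b = nn.Linear(64, feat_dim)   # expression features unique to one person
        self.kpt_head = nn.Linear(2 * feat_dim, 2 * num_kpts)
        self.num_kpts = num_kpts

    def forward(self, img):                      # img: (B, 3, H, W)
        h = self.encoder(img)
        shared, unique = self.inter_ab(h), self.inter_b(h)
        kpts = self.kpt_head(torch.cat([shared, unique], dim=-1))
        return kpts.view(-1, self.num_kpts, 2)   # key points for this image
```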
Further, after extracting the key points of the different objects, the terminal device may determine the neutral key point between the first key point and the second key point. When the key points include facial expression key points, step S104 shown in FIG. 1 may include: separately acquiring the first face key point information and the second face key point information when the first object and the second object show different facial expressions; performing feature aggregation on the first and second face key point information to obtain a face aggregation feature; and constructing, based on the face aggregation feature, a face aggregation key point between the first key point and the second key point to obtain the face neutral key point.
Optionally, the face key point information (i.e. the first face key point information and the second face key point information) includes: the number change information of the facial expression key points and the position change information of the facial expression key points.
It should be noted that, because the facial shapes of different objects are different, the corresponding facial expression key points of different objects necessarily differ when they make similar expressions. However, the facial expression key points of different objects may follow similar motion laws, and the neutral key points provide a unified representation of the facial expression across different faces.
In this embodiment, feature aggregation is performed on the face key point information of different objects showing different facial expressions, and key point aggregation is then performed on the aggregated features to obtain the face neutral key points. The face neutral key points decouple the influence of face shape and facial motion on the facial expression key points, so that the motions of the corresponding neutral key points of different objects are consistent when those objects make similar expressions. This avoids the influence of object-specific information on the expression transfer and ensures accurate migration of the facial expression.
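A hedged sketch of this construction: mean pooling over both objects' expression-conditioned key point features stands in for the unspecified feature aggregation, and a small decoder maps the aggregated feature to the virtual neutral key points. All shapes and the pooling choice are assumptions; the human body case described next is structurally identical.

```python
import torch
import torch.nn as nn

def build_face_neutral_keypoints(first_feats, second_feats, decoder):
    """first_feats, second_feats: (E, K, D) key point features over E
    expressions, K key points, D feature dims.
    decoder: module mapping (K, D) -> (K, 2).
    Returns the (K, 2) face neutral key points (virtual, not real points)."""
    aggregated = torch.cat([first_feats, second_feats], dim=0).mean(dim=0)  # (K, D)
    return decoder(aggregated)

# e.g. a per-key-point linear decoder (an assumption, D = 64 dims -> (x, y)):
decoder = nn.Linear(64, 2)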
In an alternative embodiment, when the key points include human body action key points, step S104 shown in FIG. 1 may include: separately acquiring the first human body key point information and the second human body key point information when the first object and the second object perform different human body actions; performing feature aggregation on the first and second human body key point information to obtain a human body aggregation feature; and constructing, based on the human body aggregation feature, a human body aggregation key point between the first key point and the second key point to obtain the human body neutral key point.
Optionally, the human key point information (i.e. the first human key point information and the second human key point information) includes: the number change information of the human body action key points and the position change information of the human body action key points.
It should be noted that, because the body shapes of different objects are different, the corresponding human body action key points necessarily differ when different objects perform similar actions. However, the human body action key points of different objects may follow similar motion laws, and the neutral key points provide a unified representation of the human body action across different bodies.
In this embodiment, the human body neutral key points are obtained by performing feature aggregation on the human body key point information of different objects performing different actions and then performing key point aggregation on the aggregated features. The human body neutral key points decouple the influence of body shape and body motion on the human body action key points, so that the motions of the corresponding neutral key points of different objects are consistent when those objects perform similar limb actions. This avoids the influence of object-specific information on the action transfer and ensures accurate migration of the human body action.
Further, after determining the neutral key point, the terminal device may migrate the first key point to the second key point based on the neutral key point. Specifically, step S106 shown in FIG. 1 may include: predicting the position information of the neutral key point on the second object based on a neural network model to obtain a predicted key point; determining the mapping relationship between the predicted key point and the first key point; and mapping the first key point onto the predicted key point based on that relationship. The predicted key point is obtained by predicting the movement trajectory of the second object's key points.
It should be noted that, in practice, not every neutral key point has a corresponding facial or body key point. When migrating facial expressions and/or body actions between objects through neutral key points alone, some key points may therefore fail to be mapped, preventing accurate migration.
To avoid this problem, the present disclosure predicts the neutral key points with a neural network model to obtain the predicted key points, each of which is obtained by predicting the movement trajectory of the second object's key points. A predicted key point corresponding to each key point of the first object therefore necessarily exists; that is, every key point of the first object can be mapped onto the second object, so the first object's facial expression or body action can be displayed on the second object realistically and accurately.
In addition, the predicted key points are obtained by processing the second key points and the neutral key points with a neural network model, which can be trained as illustrated in the model training diagram of FIG. 3.
Specifically, the terminal device first obtains neutral key point samples and target key point samples corresponding to multiple objects, and predicts the neutral key points in the neutral key point samples with an initial neural network model to obtain predicted key point samples. It then computes the loss value of the loss function of the initial neural network model from the predicted key point samples and the target key point samples, and adjusts the model parameters based on the loss value until the loss reaches its minimum, yielding the trained neural network model. The target key points represent the real key points corresponding to the multiple objects.
Optionally, as shown in FIG. 3, a neutral key point sample is input into the initial neural network model, which applies multi-level convolutions at different scales to the sample and, after average pooling, outputs a predicted key point sample. The terminal device then supervises the model by evaluating its loss function against the target key point samples.
That is, in the present disclosure, the key points in the target key point samples are the ground-truth values for the key points in the predicted key point samples. Without the present scheme, obtaining predicted key points would require passing the key points of each object through a per-object Deepfake network, which does not generalize across the target key points of multiple objects (that is, not all neutral key points could yield target key points).
It should be noted that fitting the target key points of multiple objects with the predicted key points produced by the neural network model of this disclosure not only speeds up obtaining the target key points, but also determines a corresponding target key point for any neutral key point, thereby ensuring accurate migration of facial expressions and/or human body actions.
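A minimal training sketch matching the FIG. 3 description: multi-level convolutions at different scales process the neutral key point sample, average pooling condenses it, and a regression head outputs the predicted key point sample, supervised by the target (real) key points. The L2 loss, dilation rates, and Adam optimizer are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeutralKeypointPredictor(nn.Module):
    def __init__(self, num_kpts=68):
        super().__init__()
        self.net = nn.Sequential(                       # multi-level, multi-scale convs
            nn.Conv1d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 64, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv1d(64, 64, 3, padding=4, dilation=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))                    # average pooling
        self.head = nn.Linear(64, 2 * num_kpts)
        self.num_kpts = num_kpts

    def forward(self, neutral_kpts):                    # (B, K, 2)
        h = self.net(neutral_kpts.transpose(1, 2)).squeeze(-1)
        return self.head(h).view(-1, self.num_kpts, 2)  # predicted key point sample

def train(model, neutral_samples, target_samples, epochs=100, lr=1e-3):
    """target_samples are the real key points of the objects (ground truth)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.mse_loss(model(neutral_samples), target_samples)  # loss value
        loss.backward()
        opt.step()                   # adjust parameters based on the loss value
    return model
```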
Based on the above, the present disclosure collects face images and/or body images of different objects and obtains one-to-one correspondences between them through the Deepfake technique, yielding facial expression key points and/or human body action key points. Neutral key points are obtained by detecting these key points with a key point detection technique. The mapping relationship between the expression/action key points and the neutral key points is then learned through deep learning, achieving key point migration between different objects while ensuring its accuracy.
Example 3
According to another aspect of the present disclosure, there is also provided an apparatus for migrating image keypoints, as shown in fig. 4, the apparatus including: an acquisition module 401, a determination module 403, and a migration module 405.
The acquisition module 401 is configured to acquire a first key point of a first object contained in a source image and a second key point of a second object contained in a target image, where the key points include at least one of: facial expression key points and human body action key points. The determination module 403 is configured to determine a neutral key point between the first key point and the second key point, where the neutral key point characterizes at least one of: the same facial features exhibited when multiple objects show different facial expressions, and the same human body features exhibited when multiple objects perform different limb actions. The migration module 405 is configured to migrate the first key point to the second key point based on the neutral key point.
Optionally, the obtaining module includes: and the extraction module is used for respectively extracting the first key point and the second key point from the source image and the target image.
Optionally, the determining module includes: the device comprises a first acquisition module, a first aggregation module and a first construction module. The first obtaining module is configured to obtain first face key point information and second face key point information when the first object and the second object exhibit different facial expressions, respectively, where the face key point information includes: the number change information of the facial expression key points and the position change information of the facial expression key points; the first aggregation module is used for carrying out feature aggregation on the first face key point information and the second face key point information to obtain face aggregation features; and the first construction module is used for constructing the face aggregation key points between the first key points and the second key points based on the face aggregation characteristics to obtain the face neutral key points.
Optionally, the determination module includes: a second acquisition module, a second aggregation module, and a second construction module. The second acquisition module is configured to separately acquire the first human body key point information and the second human body key point information when the first object and the second object perform different human body actions, where the human body key point information includes: the number change information of the human body action key points and the position change information of the human body action key points. The second aggregation module is configured to perform feature aggregation on the first and second human body key point information to obtain a human body aggregation feature. The second construction module is configured to construct the human body aggregation key point between the first key point and the second key point based on the human body aggregation feature to obtain the human body neutral key point.
Optionally, the migration module includes: a processing module, a first determining module, and a mapping module. The processing module is configured to process the object information and the neutral key point of the second object based on a neural network model to obtain a predicted key point, where the predicted key point is obtained by predicting the movement trajectory of the second object's key points. The first determining module is configured to determine the mapping relationship between the predicted key point and the object information of the first object. The mapping module is configured to map the object information of the first object onto the predicted key point of the second object based on the mapping relationship.
Optionally, the apparatus for migrating image key points further includes: a first acquisition module, a prediction module, a calculation module, and a generation module. The first acquisition module is configured to acquire neutral key point samples corresponding to a plurality of objects and target key point samples corresponding to the plurality of objects, where the target key points represent the real key points corresponding to the plurality of objects. The prediction module is configured to predict the neutral key points in the neutral key point samples based on the initial neural network model to obtain predicted key point samples. The calculation module is configured to calculate the loss value of the loss function of the initial neural network model based on the predicted key point samples and the target key point samples. The generation module is configured to adjust the model parameters of the initial neural network model based on the loss value until the loss value reaches its minimum, obtaining the neural network model.
Example 4
According to embodiments of the present disclosure, an electronic device, a readable storage medium, and a computer program product are also provided.
FIG. 5 illustrates a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in FIG. 5, the device 500 includes a computing unit 501, which may perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 502 or loaded from a storage unit 508 into a random access memory (RAM) 503. The RAM 503 may also store various programs and data required for the operation of the device 500. The computing unit 501, the ROM 502, and the RAM 503 are connected to one another by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 501 executes the methods and processes described above, such as the method of migrating image key points. For example, in some embodiments, the method of migrating image key points may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the method of migrating image key points described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the method of migrating image key points by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (15)

1. A method of migrating image keypoints, comprising:
acquiring first key points of a first object contained in a source image and second key points of a second object contained in a target image, wherein the key points comprise at least one of the following items: human face expression key points and human body action key points;
determining a neutral key point between the first key point and the second key point, wherein the neutral key point characterizes at least one of: the same face features displayed by a plurality of objects when showing different face expressions, and the same human body features displayed by the plurality of objects when performing different body actions;
migrating the first keypoint to the second keypoint based on the neutral keypoint.
2. The method of claim 1, wherein said obtaining first keypoints of a first object contained in the source image and second keypoints of a second object contained in the target image comprises:
and respectively extracting the first key point and the second key point from the source image and the target image.
3. The method of claim 1, wherein the determining a neutral keypoint between the first keypoint and the second keypoint comprises:
respectively acquiring first face key point information and second face key point information when the first object and the second object show different face expressions, wherein the face key point information comprises: the number change information of the facial expression key points and the position change information of the facial expression key points;
performing feature aggregation on the first face key point information and the second face key point information to obtain face aggregation features;
and constructing a face aggregation key point between the first key point and the second key point based on the face aggregation characteristics to obtain a face neutral key point.
4. The method of claim 1, wherein the determining a neutral keypoint between the first keypoint and the second keypoint comprises:
respectively acquiring first human body key point information and second human body key point information when the first object and the second object carry out different human body actions, wherein the human body key point information comprises: the number change information of the human body action key points and the position change information of the human body action key points;
performing feature aggregation on the first human body key point information and the second human body key point information to obtain human body aggregation features;
and constructing a human body aggregation key point between the first key point and the second key point based on the human body aggregation characteristics to obtain a human body neutral key point.
5. The method of claim 1, wherein said migrating the first keypoint to the second keypoint based on the neutral keypoint comprises:
predicting the position information of the neutral key point on the second object based on a neural network model to obtain a predicted key point, wherein the predicted key point is obtained by predicting the movement track of the key point of the second object;
determining a mapping relation between the prediction key point and the first key point;
mapping the first keypoint on the predicted keypoint based on the mapping relationship.
6. The method of claim 5, further comprising:
obtaining neutral key point samples and target key point samples corresponding to the plurality of objects, wherein the target key points represent real key points corresponding to the plurality of objects;
predicting neutral key points in the neutral key point samples based on an initial neural network model to obtain predicted key point samples;
calculating a loss value of a loss function corresponding to the initial neural network model based on the predicted keypoint samples and the target keypoint samples;
and adjusting model parameters of the initial neural network model based on the loss value until the loss value reaches the minimum value, so as to obtain the neural network model.
7. An apparatus for migrating image keypoints, comprising:
the acquisition module is used for acquiring first key points of a first object contained in a source image and second key points of a second object contained in a target image, wherein the key points comprise at least one of the following items: human face expression key points and human body action key points;
a determination module configured to determine a neutral key point between the first key point and the second key point, wherein the neutral key point characterizes at least one of: the same human face features displayed by a plurality of objects when showing different human face expressions, and the same human body features displayed by the plurality of objects when performing different limb actions;
a migration module to migrate the first keypoint to the second keypoint based on the neutral keypoint.
8. The apparatus of claim 7, wherein the means for obtaining comprises:
and the extraction module is used for respectively extracting the first key point and the second key point from the source image and the target image.
9. The apparatus of claim 7, wherein the means for determining comprises:
a first obtaining module, configured to obtain first face key point information and second face key point information when the first object and the second object exhibit different facial expressions, respectively, where the face key point information includes: the number change information of the facial expression key points and the position change information of the facial expression key points;
the first aggregation module is used for performing feature aggregation on the first face key point information and the second face key point information to obtain face aggregation features;
and the first construction module is used for constructing the face aggregation key points between the first key points and the second key points based on the face aggregation characteristics to obtain the face neutral key points.
10. The apparatus of claim 7, wherein the means for determining comprises:
a second obtaining module, configured to separately obtain first human body key point information and second human body key point information when the first object and the second object perform different human body actions, where the human body key point information includes: the number change information of the human body action key points and the position change information of the human body action key points;
the second aggregation module is used for performing feature aggregation on the first human body key point information and the second human body key point information to obtain human body aggregation features;
and the second construction module is used for constructing the human body aggregation key points between the first key points and the second key points based on the human body aggregation characteristics to obtain the human body neutral key points.
11. The apparatus of claim 7, wherein the migration module comprises:
the first prediction module is used for predicting the position information of the neutral key point on the second object based on a neural network model to obtain a predicted key point, wherein the predicted key point is obtained by predicting the movement track of the key point of the second object;
a relationship determination module, configured to determine a mapping relationship between the predicted keypoint and the first keypoint;
a mapping module for mapping the first keypoint on the predicted keypoint based on the mapping relationship.
12. The apparatus of claim 11, further comprising:
a third obtaining module, configured to obtain neutral key point samples and target key point samples corresponding to the multiple objects, where the target key point represents a real key point corresponding to the multiple objects;
the second prediction module is used for predicting neutral key points in the neutral key point samples based on the initial neural network model to obtain predicted key point samples;
a calculation module, configured to calculate a loss value of a loss function corresponding to the initial neural network model based on the predicted keypoint sample and the target keypoint sample;
and the adjusting module is used for adjusting the model parameters of the initial neural network model based on the loss value until the loss value reaches the minimum value, so as to obtain the neural network model.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the content of the first and second substances,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of migrating image keypoints according to any one of claims 1 to 6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of migrating image keypoints according to any one of claims 1 to 6.
15. A computer program product comprising a computer program which, when executed by a processor, implements a method of migrating image keypoints according to any one of claims 1 to 6.
CN202210688531.9A (filed 2022-06-17): Method and device for transferring image key points, electronic equipment and storage medium. Status: Pending.

Priority application: CN202210688531.9A, priority and filing date 2022-06-17, title "Method and device for transferring image key points, electronic equipment and storage medium".

Publication: CN115100028A, published 2022-09-23.

Family ID: 83291628.

Country status: CN, CN115100028A (en).


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination