CN111447379B - Method and device for generating information


Info

Publication number
CN111447379B
CN111447379B (Application CN201910044033.9A)
Authority
CN
China
Prior art keywords
user
video
target
key points
frame
Prior art date
Legal status
Active
Application number
CN201910044033.9A
Other languages
Chinese (zh)
Other versions
CN111447379A (en)
Inventor
张赫男
高原
孙昊
文石磊
丁二锐
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910044033.9A
Publication of CN111447379A
Application granted
Publication of CN111447379B
Status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/272Means for inserting a foreground image in a background image, i.e. inlay, outlay
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/23Recognition of whole body movements, e.g. for sport training
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265Mixing

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An embodiment of the present disclosure discloses a method and apparatus for generating information. The method for generating information comprises the following steps: acquiring a user limb action video; establishing a user key point model based on the limb key points in the user limb action video; acquiring a target person video; and mapping the action of the target person in the target person video to a target action of the user using the user key point model. The method improves the specificity and accuracy of the generated target action of the user.

Description

Method and device for generating information
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for generating information.
Background
Currently, when presenting a target video in which a user imitates a target person, there are two main interaction modes for action imitation: 1) follow-along: the target video is shrunk and placed in a corner of the screen, and the user imitates the actions while watching the target person in the video; 2) action schematic frames: the actions of the target person in the target video are abstracted into action schematic boxes, and the user imitates the corresponding actions according to the corresponding boxes.
Both of these traditional action-imitation interaction modes require the user to have strong imitation ability in order to reproduce the actions of the target person in the target video. Moreover, when the actions in the target video are complex, the learning cost for the user increases, and the actions of the target person in the target video cannot be fully reproduced.
Disclosure of Invention
The embodiment of the disclosure provides a method and a device for generating information.
In a first aspect, an embodiment of the present disclosure provides a method for generating information, including: acquiring a user limb action video; establishing a user key point model based on the limb key points in the user limb action video; acquiring a target person video; and mapping the action of the target person in the target person video to a target action of the user using the user key point model.
In some embodiments, building a user keypoint model based on limb keypoints in the user limb motion video comprises: identifying human face key points, gesture key points and human body key points of a user in each frame from video frames of the user limb action video; and establishing a user key point model based on the face key points, the gesture key points and the human body key points of the user in each frame.
In some embodiments, mapping the action of the target person in the target person video to the target action of the user using the user key point model comprises: identifying face key points, gesture key points and human body key points of the target person in each frame from the video frames of the target person video; extracting limb information of the target person in each video frame based on the face key points, gesture key points and human body key points of the target person in each frame; mapping the extracted limb information of the target person in each video frame to the user key point model using an adaptive algorithm to obtain the user key point model with the mapped limb information; and generating a target action of the user using a generative adversarial network technique based on the user key point model with the mapped limb information.
In some embodiments, the adaptive algorithm comprises: a global pose normalization algorithm, a temporal smoothing algorithm, and a Gaussian anti-shake algorithm.
In some embodiments, identifying, from the video frames of the target person video, face, gesture, and body keypoints of the target person in each frame comprises: separating target character images in each frame from video frames of the target character videos; and identifying the target figure image in each frame to obtain the face key point, the gesture key point and the human body key point of the target figure in each frame.
In some embodiments, the method further comprises: separating user backgrounds of all frames from the user limb action video; and synthesizing the target action of the user in the user background of each frame to obtain the user target video.
In a second aspect, an embodiment of the present disclosure provides an apparatus for generating information, including: a user video acquisition unit configured to acquire a user limb motion video; the user model establishing unit is configured to establish a user key point model based on the limb key points in the user limb action video; a target video acquisition unit configured to acquire a target person video; and the target action generating unit is configured to map the action of the target person in the target person video into the target action of the user by adopting the user key point model.
In some embodiments, the user model building unit is further configured to: identifying face key points, gesture key points and human body key points of a user in each frame from video frames of the user limb action video; and establishing a user key point model based on the face key points, the gesture key points and the human body key points of the user in each frame.
In some embodiments, the target action generation unit is further configured to: identify face key points, gesture key points and human body key points of the target person in each frame from the video frames of the target person video; extract limb information of the target person in each video frame based on the face key points, gesture key points and human body key points of the target person in each frame; map the extracted limb information of the target person in each video frame to the user key point model using an adaptive algorithm to obtain the user key point model with the mapped limb information; and generate a target action of the user using a generative adversarial network technique based on the user key point model with the mapped limb information.
In some embodiments, the adaptive algorithm employed in the target action generation unit comprises: a global pose normalization algorithm, a temporal smoothing algorithm, and a Gaussian anti-shake algorithm.
In some embodiments, the identifying, in the target motion generation unit, the face key point, the gesture key point, and the human body key point of the target person in each frame from the video frames of the target person video includes: separating target character images in each frame from video frames of the target character videos; and identifying the target figure image in each frame to obtain the face key point, the gesture key point and the human body key point of the target figure in each frame.
In some embodiments, the apparatus further comprises: the user background separation unit is configured to separate the user background of each frame from the user limb action video; and the user video synthesis unit is configured to synthesize the target action of the user in the user background of each frame to obtain the user target video.
In a third aspect, an embodiment of the present disclosure provides an apparatus, including: one or more processors; a storage device having one or more programs stored thereon; when executed by one or more processors, cause the one or more processors to implement a method as described above.
In a fourth aspect, the disclosed embodiments provide a computer readable medium having a computer program stored thereon, where the program when executed by a processor implements a method as described in any of the above.
According to the method and apparatus for generating information, a user limb action video is first acquired; then, a user key point model is established based on the limb key points in the user limb action video; next, a target person video is acquired; and finally, the action of the target person in the target person video is mapped to a target action of the user using the user key point model. In this process, the user key point model is generated from the user limb action video, and that model is then used to reproduce the action of the target person in the target person video as the user's target action, which improves the efficiency of generating the target action of the user as well as the specificity and accuracy of the generated target action.
Drawings
Other features, objects, and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which the present disclosure may be applied;
FIG. 2 is a schematic flow chart diagram illustrating one embodiment of a method of generating information in accordance with the present disclosure;
FIG. 3 is a schematic diagram of an application scenario of a method of generating information according to an embodiment of the present disclosure;
FIG. 4 is a schematic flow chart diagram illustrating yet another embodiment of a method of generating information in accordance with the present disclosure;
FIG. 5 is a schematic block diagram illustrating one embodiment of an apparatus for generating information according to the present disclosure;
FIG. 6 is a schematic block diagram of a computer system suitable for use with a server embodying embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the figures and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and servers 105, 106. Network 104 is the medium used to provide communication links between terminal devices 101, 102, 103 and servers 105, 106. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user 110 may use the terminal devices 101, 102, 103 to interact with the servers 105, 106 via the network 104 to receive or send messages or the like. Various communication client applications, such as a video capture application, a video play application, an instant messaging tool, a mailbox client, social platform software, a search engine application, a shopping application, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III, mpeg Audio Layer 3), MP4 players (Moving Picture Experts Group Audio Layer IV, mpeg Audio Layer 4), laptop and desktop computers, and the like.
The servers 105, 106 may be servers providing various services, such as background servers providing support for the terminal devices 101, 102, 103. The background server can analyze, store or calculate the data submitted by the terminal and push the analysis, storage or calculation result to the terminal device.
It should be noted that, in practice, the method for generating information provided by the embodiments of the present disclosure is generally performed by the servers 105 and 106, and accordingly, the apparatus for generating information is generally disposed in the servers 105 and 106. However, when the performance of the terminal device can satisfy the execution condition of the method or the setting condition of the device, the method for generating information provided by the embodiment of the disclosure may also be executed by the terminal device 101, 102, 103, and the apparatus for generating information may also be provided in the terminal device 101, 102, 103.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules, for example, to provide distributed services, or as a single piece of software or software module. And is not particularly limited herein.
It should be understood that the number of terminals, networks, and servers in fig. 1 are merely illustrative. There may be any number of terminals, networks, and servers, as desired for an implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method of generating information in accordance with the present disclosure is shown. The method for generating information comprises the following steps:
step 201, acquiring a user limb action video.
In this embodiment, an executing body (for example, a server or a terminal shown in fig. 1) of the method for generating information may obtain the user limb motion video from a local or remote database and other servers or terminals, and may also collect the user limb motion video via an image capture device. In the user limb motion video, at least part of video frames can identify the limb motion of the user.
Step 202, establishing a user key point model based on the limb key points in the user limb action video.
In this embodiment, the body motion of the user in each video frame may be detected from the body motion video of the user, and the body key points may be identified from the body motion of the user. The limb key point is a key part for executing actions in the limb of the human body. For example, the critical points of the limb may be joint location points such as the top of the head, the jaw, the neck-to-head junction, the neck-to-body junction, the shoulder joint, the elbow joint, the wrist joint, the finger joint, the thigh-to-body junction, the knee joint, and the ankle joint.
In identifying the body key points of the user, a technology for identifying the body key points of the user in the prior art or a technology developed in the future may be adopted, which is not limited by the present disclosure. For example, the body key points in each video frame may be identified based on a machine learning model, or the body key points may be labeled manually, and depth information of the human body may also be determined in an auxiliary manner by using other technologies (such as an infrared technology, a laser radar, and the like) while the video is captured, and then the body key points are determined according to the depth information and the video frames. After the body key points of the user are determined, a key point model of a standard three-dimensional image can be obtained and a corresponding relation is established with the body key points of the user, so that a user key point model is obtained.
In some optional implementations of this embodiment, establishing the user keypoint model based on the limb keypoints in the user limb action video includes: identifying human face key points, gesture key points and human body key points of a user in each frame from video frames of the user limb action video; and establishing a user key point model based on the face key points, the gesture key points and the human body key points of the user in each frame.
In this implementation, the limb key points in the user limb action video are further specified: they may comprise face key points, gesture key points, and human body key points of the user. The face key points may follow any existing or future face key point scheme. For example, the face key points may include 68 face key points, divided into internal key points and contour key points: the 51 internal key points cover the eyebrows, eyes, nose, and mouth, and the 17 contour key points trace the face outline. Then, based on the identified face key points, gesture key points, and human body key points in each frame, a user key point model can be established.
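As a rough illustration of this step (not part of the original disclosure), the sketch below simply aggregates per-frame detector outputs into a user key point model; the detectors themselves are passed in as hypothetical functions, since the patent does not prescribe a specific face, hand, or body pose estimator:

```python
import numpy as np

def build_user_keypoint_model(user_frames, detect_face, detect_gesture, detect_body):
    """Build a per-frame user key point model from a user limb action video.

    user_frames: iterable of decoded video frames (H x W x 3 arrays).
    detect_face / detect_gesture / detect_body: hypothetical per-frame key point
    detectors (any off-the-shelf face / hand / body pose estimator could fill
    these roles); each returns an (N, 2) array of image coordinates.
    """
    model = []
    for frame in user_frames:
        face = np.asarray(detect_face(frame), dtype=np.float32)        # 68 points: 51 internal + 17 contour
        gesture = np.asarray(detect_gesture(frame), dtype=np.float32)  # hand / gesture key points
        body = np.asarray(detect_body(frame), dtype=np.float32)        # joints: shoulders, elbows, knees, ...
        model.append({"face": face, "gesture": gesture, "body": body})
    return model
```

The per-frame dictionaries can later be put in correspondence with a standard three-dimensional key point template, as described above, to obtain the user key point model.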
And step 203, acquiring a target person video.
In this embodiment, the executing entity (for example, the server or the terminal shown in fig. 1) of the above-mentioned method for generating information may obtain the target person video from a local or remote database and other servers or terminals, and may also capture the target person video via the image capturing device.
And step 204, mapping the action of the target person in the target person video into the target action of the user by adopting the user key point model.
In this embodiment, the action of the target person in the target person video may be recognized as limb information of the target person, such as the absolute and relative positions of body parts including the upper arm, forearm, thigh, lower leg, head, and hands. The limb information of the target person is then mapped to the target action of the user.
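Purely as an illustration (and under the assumption, not stated in the disclosure, that body key points are available as named joint coordinates), such limb information could be derived along these lines:

```python
import numpy as np

def extract_limb_info(body_keypoints):
    """Derive simple limb information from one frame of body key points.

    body_keypoints: dict mapping joint names to (x, y) coordinates,
    e.g. {"shoulder_r": (x, y), "elbow_r": (x, y), ...}.  The joint
    naming convention here is an assumption made for this sketch.
    """
    kp = {k: np.asarray(v, dtype=np.float32) for k, v in body_keypoints.items()}
    return {
        # absolute positions of the joints themselves
        "absolute": kp,
        # relative positions: each limb segment as a vector from its parent joint
        "upper_arm_r": kp["elbow_r"] - kp["shoulder_r"],
        "forearm_r":   kp["wrist_r"] - kp["elbow_r"],
        "thigh_r":     kp["knee_r"] - kp["hip_r"],
        "lower_leg_r": kp["ankle_r"] - kp["knee_r"],
    }
```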
In some optional implementations of this embodiment, mapping the action of the target person in the target person video to the target action of the user using the user key point model includes: identifying face key points, gesture key points and human body key points of the target person in each frame from the video frames of the target person video; extracting limb information of the target person in each video frame based on the face key points, gesture key points and human body key points of the target person in each frame; mapping the extracted limb information of the target person in each video frame to the user key point model using an adaptive algorithm to obtain the user key point model with the mapped limb information; and generating a target action of the user using a generative adversarial network technique based on the user key point model with the mapped limb information.
In this implementation, the adaptive algorithm may include: a global pose normalization algorithm, a temporal smoothing algorithm, and a Gaussian anti-shake algorithm. Specifically, the global pose normalization algorithm accounts for the differences in body shape between the person in the source video and the user, and for the differences in their positions within their respective videos; that is, the limb information obtained from the target person video is normalized and converted into limb information expressed in the user's coordinate space. A temporal smoothing algorithm can then be applied to make the actions across video frames more coherent, and a Gaussian anti-shake algorithm can be used to reduce the facial blurring caused by motion in the user video, so that the face appears clearer and more lifelike.
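The patent gives no formulas for these three algorithms; the sketch below is one plausible reading, in which global pose normalization rescales and translates the target person's key points into the user's coordinate frame, temporal smoothing is an exponential moving average across frames, and the Gaussian anti-shake step low-pass filters the face key points over time (gaussian_filter1d is from SciPy):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def global_pose_normalize(target_kp, target_bbox, user_bbox):
    """Map target-person key points into the user's coordinate frame.

    target_bbox / user_bbox: (x_min, y_min, x_max, y_max) body boxes used to
    estimate scale and translation differences between the two videos.
    """
    scale = (user_bbox[3] - user_bbox[1]) / (target_bbox[3] - target_bbox[1])
    t_anchor = np.array([(target_bbox[0] + target_bbox[2]) / 2, target_bbox[3]])
    u_anchor = np.array([(user_bbox[0] + user_bbox[2]) / 2, user_bbox[3]])
    return (target_kp - t_anchor) * scale + u_anchor

def temporal_smooth(kp_sequence, alpha=0.8):
    """Exponential moving average over frames for more coherent motion."""
    smoothed, prev = [], None
    for kp in kp_sequence:
        prev = kp if prev is None else alpha * prev + (1 - alpha) * kp
        smoothed.append(prev)
    return smoothed

def gaussian_anti_shake(face_kp_sequence, sigma=2.0):
    """Low-pass filter face key points along the time axis to reduce jitter."""
    seq = np.stack(face_kp_sequence)            # shape (T, 68, 2)
    return gaussian_filter1d(seq, sigma=sigma, axis=0)
```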
A Generative Adversarial Network (GAN) is a deep learning framework for unsupervised learning over complex distributions. The framework comprises (at least) two modules, a Generative Model and a Discriminative Model, which produce good output through an adversarial game between them. A generative adversarial network does not require that the generative model and the discriminative model both be neural networks; any functions capable of fitting the corresponding generation and discrimination tasks suffice. Here, the generative model generates the target action of the user given the user key point model with the mapped limb information, and the discriminative model predicts whether the generated target action of the user is real. Through the dynamic game between the generative model and the discriminative model, a generative model whose outputs are hard to distinguish from real footage can be obtained, and the user target action generated by this model can then be used.
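For readers unfamiliar with the technique, a heavily simplified adversarial training loop is sketched below in PyTorch; the network layers and shapes are illustrative stand-ins (a real system would likely use image-to-image networks such as a U-Net generator with a patch discriminator), not the networks actually used in the disclosure. The generator turns a rendered key point (pose) map into an image of the user, and the discriminator learns to tell generated frames from real ones:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),
        )
    def forward(self, pose_map):            # pose_map: rendered key point skeleton image
        return self.net(pose_map)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, 2, 1),       # patch-wise real/fake scores
        )
    def forward(self, pose_map, image):
        return self.net(torch.cat([pose_map, image], dim=1))

def train_step(G, D, opt_g, opt_d, pose_map, real_frame, bce=nn.BCEWithLogitsLoss()):
    # 1) discriminator: push real frames toward 1, generated frames toward 0
    fake = G(pose_map)
    d_real = D(pose_map, real_frame)
    d_fake = D(pose_map, fake.detach())
    loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # 2) generator: try to fool the discriminator
    d_fake = D(pose_map, fake)
    loss_g = bce(d_fake, torch.ones_like(d_fake))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```

Once trained, only the generator is needed at inference time: the user key point model with the mapped limb information is rendered into pose maps and passed through the generator frame by frame to produce the user's target action.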
In some optional implementation manners of this embodiment, identifying, from the video frames of the target person video, the face key point, the gesture key point, and the human body key point of the target person in each frame includes: separating a target character image in each frame from a video frame of a target character video; and identifying the target figure image in each frame to obtain the face key point, the gesture key point and the human body key point of the target figure in each frame.
In this implementation, separating the target person image from each frame first makes it more likely that the data passed to key point recognition is valid, thereby improving recognition efficiency.
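One way to realize this, sketched under the assumption of a generic person-segmentation function (the disclosure does not name one), is to mask and crop the target person before running the key point detector on the crop only:

```python
import numpy as np

def separate_and_detect(frame, segment_person, detect_keypoints):
    """Segment the target person out of a frame, then detect key points on
    the isolated person image only.

    segment_person(frame) -> boolean H x W mask (hypothetical segmenter).
    detect_keypoints(image) -> dict of face/gesture/body key point arrays.
    """
    mask = segment_person(frame)
    ys, xs = np.where(mask)
    if len(xs) == 0:
        return None                               # no person found in this frame
    y0, y1, x0, x1 = ys.min(), ys.max(), xs.min(), xs.max()
    person = frame[y0:y1 + 1, x0:x1 + 1].copy()
    person[~mask[y0:y1 + 1, x0:x1 + 1]] = 0       # suppress background pixels
    keypoints = detect_keypoints(person)
    # shift crop-local coordinates back into full-frame coordinates
    for part, pts in keypoints.items():
        keypoints[part] = np.asarray(pts) + np.array([x0, y0])
    return keypoints
```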
An exemplary application scenario of the method of generating information of the present disclosure is described below in conjunction with fig. 3.
As shown in fig. 3, fig. 3 shows a schematic flow diagram of one application scenario of a method of generating information according to the present disclosure.
As shown in fig. 3, a method 300 of generating information operates in an execution body 310 and may include:
firstly, acquiring a user limb action video 301;
then, establishing a user key point model 303 based on the limb key points 302 in the user limb action video;
then, acquiring a target person video 304;
finally, the action 305 of the target person in the target person video 304 is mapped to the target action 306 of the user using the user keypoint model 303.
It should be understood that the application scenario of the method for generating information shown in fig. 3 is only an exemplary description of the method for generating information, and does not represent a limitation on the method. For example, the steps shown in fig. 3 above may be implemented in further detail.
The method for generating information provided by the embodiment of the present disclosure first acquires a user limb action video; then establishes a user key point model based on the limb key points in the user limb action video; then acquires a target person video; and finally maps the action of the target person in the target person video to a target action of the user using the user key point model. In this process, the target action of the user is generated based on the user key point model and the action of the target person in the target person video, which improves the efficiency of generating the user's target action as well as the specificity and accuracy of the generated target action.
Referring to fig. 4, a flow diagram of yet another embodiment of a method of generating information in accordance with the present disclosure is shown.
As shown in fig. 4, a flow 400 of the method for generating information according to the present embodiment may include the following steps:
in step 401, a user limb motion video is acquired.
In this embodiment, an executing body (for example, a server or a terminal shown in fig. 1) of the method for generating information may obtain the user limb motion video from a local or remote database and other servers or terminals, and may also collect the user limb motion video via an image capture device. In the user limb motion video, at least part of video frames can identify the limb motion of the user.
In step 402, a user key point model is built based on the limb key points in the user limb motion video.
In this embodiment, the body motion of the user in each video frame may be detected from the body motion video of the user, and the body key points may be identified from the body motion of the user. The limb key point is a key part for executing actions in the limb of the human body. For example, the critical points of the limb may be joint location points such as the crown of the head, the chin, the neck-head junction, the neck-body junction, the shoulder joints, the elbow joints, the wrist joints, the finger joints, the thigh-body junction, the knee joints, and the ankle joints.
In identifying the body key points of the user, a technology for identifying the body key points of the user in the prior art or a technology developed in the future may be adopted, which is not limited by the present disclosure. For example, the body key points in each video frame may be identified based on a machine learning model, or the body key points may be labeled manually, or other technologies (such as infrared technology, laser radar, and the like) may be used to assist in determining the depth information of the human body while shooting the video, and then the body key points may be determined according to the depth information and the video frames. After the body key points of the user are determined, a key point model of a standard three-dimensional image can be obtained and a corresponding relation is established with the body key points of the user, so that a user key point model is obtained.
In step 403, a target person video is obtained.
In this embodiment, the executing entity (for example, the server or the terminal shown in fig. 1) of the above-mentioned method for generating information may obtain the target person video from a local or remote database and other servers or terminals, and may also capture the target person video via the image capturing device.
In step 404, the action of the target person in the target person video is mapped to the target action of the user using the user keypoint model.
In this embodiment, the action of the target person in the target person video may be recognized as limb information of the target person, such as the absolute and relative positions of body parts including the upper arm, forearm, thigh, lower leg, head, and hands. The limb information of the target person is then mapped to the target action of the user.
It should be understood that steps 401 to 404 described above correspond to steps 201 to 204 in the embodiment shown in fig. 2. Therefore, the operations and features in steps 201 to 204 are also applicable to steps 401 to 404, and are not described herein again.
In step 405, the user background of each frame is separated from the video of the user's limb movement.
In the embodiment, the background of the target motion of the user is determined by separating the user background of each frame from the user limb motion video.
In step 406, the target motion of the user is synthesized in the user background of each frame to obtain a user target video.
In this embodiment, by compositing the target action of the user into the user background of each frame, each frame of the user's action is obtained with its background restored, yielding the user target video.
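As an illustration (with a hypothetical per-frame foreground mask, whose computation is not prescribed by the disclosure), compositing the generated user action onto the separated per-frame user background can be done with simple alpha blending:

```python
import numpy as np

def composite_user_video(generated_frames, background_frames, foreground_masks):
    """Overlay each generated user-action frame onto the corresponding separated
    user background to form the user target video.

    foreground_masks: per-frame masks of the generated user, values in [0, 1].
    """
    output = []
    for fg, bg, mask in zip(generated_frames, background_frames, foreground_masks):
        alpha = mask.astype(np.float32)[..., None]                       # H x W x 1
        frame = alpha * fg.astype(np.float32) + (1.0 - alpha) * bg.astype(np.float32)
        output.append(frame.astype(np.uint8))
    return output
```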
It should be understood that the application scenario of the method for generating information shown in fig. 4 is only an exemplary description of the method for generating information, and does not represent a limitation on the method. For example, the steps 401 to 404 shown in fig. 4 may also be further implemented by using the optional implementation manner in the steps 201 to 204. The present disclosure is not limited thereto.
The method for generating information according to the above embodiment of the present disclosure differs from the embodiment shown in fig. 2 in that: the user background of each frame is separated from the user limb action video; and the target action of the user is synthesized into the user background of each frame to obtain the user target video. In this process, the generated target action of the user can be composited into the user background of each frame, which improves the specificity and accuracy of the finally generated user target video.
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present disclosure provides an embodiment of an apparatus for generating information, which corresponds to the method embodiments shown in fig. 2 to fig. 4, and which can be applied in various electronic devices in particular.
As shown in fig. 5, the apparatus 500 for generating information of the present embodiment may include: a user video acquiring unit 510 configured to acquire a user limb motion video; a user model establishing unit 520 configured to establish a user key point model based on the limb key points in the user limb motion video; a target video acquiring unit 530 configured to acquire a target person video; and a target action generating unit 540 configured to map the action of the target person in the target person video to the target action of the user by using the user key point model.
In some optional implementations of the present embodiment, the user model establishing unit 520 is further configured to: identifying face key points, gesture key points and human body key points of a user in each frame from video frames of the user limb action video; and establishing a user key point model based on the face key points, the gesture key points and the human body key points of the user in each frame.
In some optional implementations of this embodiment, the target action generating unit 540 is further configured to: identify face key points, gesture key points and human body key points of the target person in each frame from the video frames of the target person video; extract limb information of the target person in each video frame based on the face key points, gesture key points and human body key points of the target person in each frame; map the extracted limb information of the target person in each video frame to the user key point model using an adaptive algorithm to obtain the user key point model with the mapped limb information; and generate a target action of the user using a generative adversarial network technique based on the user key point model with the mapped limb information.
In some optional implementations of the present embodiment, the adaptive algorithm adopted in the target action generating unit 540 includes: a global pose normalization algorithm, a temporal smoothing algorithm, and a Gaussian anti-shake algorithm.
In some optional implementations of the embodiment, the identifying, in the target action generating unit 540, the face key point, the gesture key point, and the human body key point of the target person in each frame from the video frames of the target person video includes: separating target character images in each frame from video frames of the target character videos; and identifying the target figure image in each frame to obtain the face key point, the gesture key point and the human body key point of the target figure in each frame.
In some optional implementations of this embodiment, the apparatus further comprises: a user background separating unit 550 configured to separate the user background of each frame from the user limb motion video; and a user video synthesizing unit 560 configured to synthesize the target motion of the user in the user background of each frame to obtain a user target video.
It should be understood that the units recited in the apparatus 500 may correspond to various steps in the method described with reference to fig. 2-4. Thus, the operations and features described above for the method are equally applicable to the apparatus 500 and the units included therein, and are not described in detail here.
Referring now to FIG. 6, shown is a schematic block diagram of an electronic device (e.g., a server or terminal device of FIG. 1) 600 suitable for use in implementing embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like. The terminal device/server shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; a storage device 608 including, for example, a hard disk; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, the processes described above with reference to the flow diagrams may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure. It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a user limb action video; establishing a user key point model based on the limb key points in the user limb action video; acquiring a target person video; and mapping the action of the target character in the target character video to the target action of the user by adopting a user key point model.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes a user video acquisition unit, a user model building unit, a target video acquisition unit, and a target action generation unit. The names of the units do not form a limitation on the units themselves in some cases, for example, the user video acquiring unit may also be described as a "unit acquiring the user limb action video".
As another aspect, the present disclosure also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments; or may be present separately and not assembled into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquiring a user limb action video; establishing a user key point model based on the limb key points in the user limb action video; acquiring a target person video; and mapping the action of the target character in the target character video to the target action of the user by adopting a user key point model.
The foregoing description covers only the preferred embodiments of the present disclosure and illustrates the principles of the technology employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above features, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the embodiments of the present disclosure.

Claims (12)

1. A method of generating information, comprising:
acquiring a user limb action video;
establishing a user key point model based on the limb key points in the user limb action video;
acquiring a target person video;
mapping the action of the target person in the target person video to a target action of the user using the user key point model, wherein the mapping comprises the following steps: identifying face key points, gesture key points and human body key points of the target person in each frame from the video frames of the target person video; extracting limb information of the target person in each video frame based on the face key points, gesture key points and human body key points of the target person in each frame; mapping the extracted limb information of the target person in each video frame to the user key point model using an adaptive algorithm to obtain the user key point model with mapped limb information; and generating the target action of the user based on the user key point model with the mapped limb information using a generative adversarial network technique.
2. The method of claim 1, wherein the building a user keypoint model based on limb keypoints in the user limb action video comprises:
identifying human face key points, gesture key points and human body key points of the user in each frame from the video frames of the user limb action video;
and establishing a user key point model based on the face key points, the gesture key points and the human body key points of the user in each frame.
3. The method of claim 1, wherein the adaptive algorithm comprises: a global pose normalization algorithm, a temporal smoothing algorithm, and a Gaussian anti-shake algorithm.
4. The method of claim 1, wherein the identifying, from the video frames of the target person video, face, gesture, and body keypoints of the target person in each frame comprises:
separating target character images in each frame from the video frames of the target character videos;
and identifying the target figure images in each frame to obtain human face key points, gesture key points and human body key points of the target figures in each frame.
5. The method of claim 1, wherein the method further comprises:
separating user backgrounds of all frames from the user limb action video;
and synthesizing the target action of the user in the user background of each frame to obtain a user target video.
6. An apparatus to generate information, comprising:
a user video acquisition unit configured to acquire a user limb motion video;
a user model establishing unit configured to establish a user key point model based on the limb key points in the user limb action video;
a target video acquisition unit configured to acquire a target person video;
a target action generating unit configured to map the action of the target person in the target person video to a target action of the user by using the user key point model;
the target action generation unit is further configured to: identify face key points, gesture key points and human body key points of the target person in each frame from the video frames of the target person video; extract limb information of the target person in each video frame based on the face key points, gesture key points and human body key points of the target person in each frame; map the extracted limb information of the target person in each video frame to the user key point model using an adaptive algorithm to obtain the user key point model with the mapped limb information; and generate a target action of the user based on the user key point model with the mapped limb information using a generative adversarial network technique.
7. The apparatus of claim 6, wherein the user model building unit is further configured to:
identifying human face key points, gesture key points and human body key points of the user in each frame from the video frames of the user limb action video;
and establishing a user key point model based on the face key points, the gesture key points and the human body key points of the user in each frame.
8. The apparatus of claim 6, wherein the adaptive algorithm employed in the target action generating unit comprises: a global pose normalization algorithm, a temporal smoothing algorithm, and a Gaussian anti-shake algorithm.
9. The apparatus according to claim 6, wherein the identifying, in the target action generating unit, face key points, gesture key points, and human key points of the target person in each frame from the video frames of the target person video comprises:
separating a target character image in each frame from the video frame of the target character video;
and identifying the target figure images in each frame to obtain human face key points, gesture key points and human body key points of the target figures in each frame.
10. The apparatus of claim 6, wherein the apparatus further comprises:
a user background separation unit configured to separate a user background of each frame from the user limb action video;
and the user video synthesis unit is configured to synthesize the target action of the user in the user background of each frame to obtain a user target video.
11. A server, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
12. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-5.
CN201910044033.9A 2019-01-17 2019-01-17 Method and device for generating information Active CN111447379B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910044033.9A CN111447379B (en) 2019-01-17 2019-01-17 Method and device for generating information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910044033.9A CN111447379B (en) 2019-01-17 2019-01-17 Method and device for generating information

Publications (2)

Publication Number Publication Date
CN111447379A CN111447379A (en) 2020-07-24
CN111447379B true CN111447379B (en) 2022-08-23

Family

ID=71650495

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910044033.9A Active CN111447379B (en) 2019-01-17 2019-01-17 Method and device for generating information

Country Status (1)

Country Link
CN (1) CN111447379B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114915722B (en) * 2021-02-09 2023-08-22 华为技术有限公司 Method and device for processing video
CN112613495B (en) * 2021-03-05 2021-06-01 北京世纪好未来教育科技有限公司 Real person video generation method and device, readable storage medium and equipment

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014205768A1 (en) * 2013-06-28 2014-12-31 中国科学院自动化研究所 Feature and model mutual matching face tracking method based on increment principal component analysis
KR20160107587A (en) * 2015-03-04 2016-09-19 한국전자통신연구원 Apparatus and method for gesture recognition using stereo image
CN109068069A (en) * 2018-07-03 2018-12-21 百度在线网络技术(北京)有限公司 Video generation method, device, equipment and storage medium
CN108986023A (en) * 2018-08-03 2018-12-11 北京字节跳动网络技术有限公司 Method and apparatus for handling image
CN109147024A (en) * 2018-08-16 2019-01-04 Oppo广东移动通信有限公司 Expression replacing options and device based on threedimensional model
CN109147017A (en) * 2018-08-28 2019-01-04 百度在线网络技术(北京)有限公司 Dynamic image generation method, device, equipment and storage medium
CN109191548A (en) * 2018-08-28 2019-01-11 百度在线网络技术(北京)有限公司 Animation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN111447379A (en) 2020-07-24

Similar Documents

Publication Publication Date Title
EP3876140B1 (en) Method and apparatus for recognizing postures of multiple persons, electronic device, and storage medium
CN109902659B (en) Method and apparatus for processing human body image
CN110058685B (en) Virtual object display method and device, electronic equipment and computer-readable storage medium
CN111476871B (en) Method and device for generating video
CN110188719B (en) Target tracking method and device
CN109754464B (en) Method and apparatus for generating information
WO2020211573A1 (en) Method and device for processing image
CN110009059B (en) Method and apparatus for generating a model
CN109600559B (en) Video special effect adding method and device, terminal equipment and storage medium
CN111638791B (en) Virtual character generation method and device, electronic equipment and storage medium
CN110059624B (en) Method and apparatus for detecting living body
CN111107278B (en) Image processing method and device, electronic equipment and readable storage medium
CN111967515A (en) Image information extraction method, training method and device, medium and electronic equipment
CN111246196B (en) Video processing method and device, electronic equipment and computer readable storage medium
CN111447379B (en) Method and device for generating information
CN113744286A (en) Virtual hair generation method and device, computer readable medium and electronic equipment
CN112308977A (en) Video processing method, video processing apparatus, and storage medium
CN109829431B (en) Method and apparatus for generating information
CN113610034B (en) Method and device for identifying character entities in video, storage medium and electronic equipment
CN112714263B (en) Video generation method, device, equipment and storage medium
CN111104827A (en) Image processing method and device, electronic equipment and readable storage medium
CN112270242B (en) Track display method and device, readable medium and electronic equipment
CN115775310A (en) Data processing method and device, electronic equipment and storage medium
CN110189364B (en) Method and device for generating information, and target tracking method and device
CN113284206A (en) Information acquisition method and device, computer readable storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant