CN113902845A - Motion video generation method and device, electronic equipment and readable storage medium


Info

Publication number
CN113902845A
Authority
CN
China
Prior art keywords: image, target, human body, dimensional, key point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111176082.1A
Other languages
Chinese (zh)
Inventor
黄永祯
任禹衡
谢曙光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Watrix Technology Beijing Co ltd
Original Assignee
Watrix Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Watrix Technology Beijing Co ltd filed Critical Watrix Technology Beijing Co ltd
Priority to CN202111176082.1A
Publication of CN113902845A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T15/00: 3D [Three Dimensional] image rendering
    • G06T15/005: General purpose rendering architectures
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00: Electrically-operated educational appliances
    • G09B5/02: Electrically-operated educational appliances with visual presentation of the material to be studied, e.g. using film strip

Abstract

The application provides a motion video generation method and device, an electronic device, and a readable storage medium. The method comprises the following steps: acquiring a standard relative rotation information sequence of a reference object and a target human body three-dimensional model of a target object, wherein the standard relative rotation information sequence is used for representing the relative rotation relation between target key point groups of the reference object when the reference object executes a specified action, and each target key point group consists of two connected reference human body key points of the reference object; adjusting each human body key point on the target human body three-dimensional model according to the standard relative rotation information sequence, so that the target human body three-dimensional model executes the specified action; and acquiring a target video of the target human body three-dimensional model executing the specified action. By this method, the difference between the action performed by the target object after learning from the video and the standard action of the reference object can be reduced, and the standard degree of the target object's action is improved.

Description

Motion video generation method and device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a motion video generation method and apparatus, an electronic device, and a readable storage medium.
Background
The development of Internet technology has greatly changed many teaching modes, including the teaching of physical actions. Through the Internet, a wide range of action teaching resources can be obtained: for example, a teacher can publish dance videos, martial arts videos, fitness videos, rehabilitation training videos, and the like, so that students can learn the actions demonstrated by the teacher in such action teaching videos at any time and in any place.
In the prior art, when a student learns from the actions demonstrated by a teacher in an action teaching video, the body shapes of the teacher and the student may differ greatly, so the student's understanding of the actions easily deviates during learning. As a result, the actions the student performs after learning from the teaching video differ considerably from the teacher's standard actions, which affects how standard the student's actions are.
Disclosure of Invention
In view of the above, an object of the present application is to provide a motion video generation method and device, an electronic device, and a readable storage medium, which help to reduce the difference between the action executed by a student after learning from a video and the standard action of a teacher, and thus to improve the standard degree of the student's action.
In a first aspect, an embodiment of the present application provides a motion video generation method, including:
acquiring a standard relative rotation information sequence of a reference object and a target human body three-dimensional model of a target object; the standard relative rotation information sequence is used for representing a relative rotation relation between target key point groups of the reference object when the reference object executes a specified action, wherein the target key point groups are two connected reference human body key points of the reference object;
adjusting each human body key point on the target human body three-dimensional model according to the standard relative rotation information sequence so as to enable the target human body three-dimensional model to execute the specified action;
and acquiring a target video of the target human body three-dimensional model executing the specified action.
With reference to the first aspect, an embodiment of the present application provides a first possible implementation manner of the first aspect, where the acquiring a standard relative rotation information sequence of a reference object includes:
acquiring a standard action video of the reference object executing the specified action;
for each frame of first image in the standard motion video, determining first two-dimensional key point coordinates of the reference object in the first image; wherein, the first two-dimensional key point coordinate is used for representing the position of the reference human body key point in the first image;
sorting the first two-dimensional key point coordinates corresponding to the first images according to a first time sequence of the first images in the standard motion video, inputting the sorted coordinates into a fully convolutional model based on dilated temporal convolutions over 2D joint points, and outputting a standard three-dimensional key point sequence of the reference object; wherein the standard three-dimensional key point sequence comprises standard three-dimensional key point coordinates of the reference object in each of the first images;
converting the standard three-dimensional key point sequence into an information sequence among standard key points; the information sequence between the standard key points comprises a reference skeleton distance information sequence between every two connected reference human body key points and the standard relative rotation information sequence.
With reference to the first aspect, an embodiment of the present application provides a second possible implementation manner of the first aspect, where the obtaining a target three-dimensional human body model of a target object includes:
acquiring an initial action video of the target object;
for each frame of second image in the initial motion video, segmenting the target object and the background in the second image to obtain body contour information of the target object in the second image;
for each second image, acquiring information between initial key points corresponding to the second image; the information between the initial key points comprises initial relative rotation information between every two connected target human body key points of the target object and target skeleton distance information; the target bone distance information corresponding to each second image is the same;
determining an average body contour parameter of the target object based on the body contour information corresponding to each second image and the information between the initial key points;
for each second image, inputting the average body contour parameter and the information between the initial key points corresponding to the second image into a human body parameterized model, and outputting a first human body three-dimensional model corresponding to the second image;
for each second image, after the first human body three-dimensional model corresponding to the second image is projected onto the second image, acquiring texture information of the first human body three-dimensional model at a corresponding position on the second image so as to obtain the texture information of each position on the target object epidermis;
determining the target three-dimensional model of the target object based on the target skeletal distance information, the average body contour parameter, and the texture information at each location on the target object's epidermis.
With reference to the second possible implementation manner of the first aspect, an embodiment of the present application provides a third possible implementation manner of the first aspect, where, for each second image, obtaining information between initial key points corresponding to the second image includes:
for each second image, determining second two-dimensional keypoint coordinates of the target object in the second image; the second two-dimensional key point coordinates are used for representing the position of the target human body key point in the second image;
sorting the second two-dimensional key point coordinates corresponding to the second images according to a second time sequence of the second images in the initial motion video, inputting the sorted coordinates into the fully convolutional model based on dilated temporal convolutions over 2D joint points, and outputting an initial three-dimensional key point sequence of the target object; wherein the initial three-dimensional key point sequence comprises initial three-dimensional key point coordinates of the target object in each of the second images;
and for each second image, converting the initial three-dimensional key point coordinates corresponding to the second image into information among the initial key points.
With reference to the second possible implementation manner of the first aspect, an embodiment of the present application provides a fourth possible implementation manner of the first aspect, where the determining an average body contour parameter of the target object based on the body contour information corresponding to each second image and the information between the initial key points includes:
for each second image, inputting standard body contour parameters and information between the initial key points corresponding to the second image into the human body parameterized model, and outputting a second human body three-dimensional model of the target object corresponding to the second image;
for each second image, projecting the second three-dimensional human body model corresponding to the second image onto the second image, correcting the standard body contour parameters through the body contour information in the second image, and when the second three-dimensional human body model corresponding to the second image is overlapped with the body contour of the target object in the second image, taking the corrected standard body contour parameters as the target body contour parameters corresponding to the target object on the second image;
obtaining an average body contour parameter of the target object based on the target body contour parameter corresponding to each second image; wherein the average body contour parameter is an average of the target body contour parameters.
With reference to the first aspect, an embodiment of the present application provides a fifth possible implementation manner of the first aspect, where the acquiring a standard relative rotation information sequence of a reference object includes:
acquiring the rotation amount of the reference object at each reference human key point at each moment when the specified action is executed; wherein the amount of rotation is acquired by an inertial measurement device;
for each moment, calculating relative rotation information between every two connected reference human body key points corresponding to the moment according to the rotation amount of each reference human body key point corresponding to the moment;
and combining the relative rotation information corresponding to each moment according to a time sequence to obtain the standard relative rotation information sequence of the acquired reference object.
With reference to the first aspect, an embodiment of the present application provides a sixth possible implementation manner of the first aspect, where the obtaining a target three-dimensional human body model of a target object includes:
acquiring third images of the target object at all angles at the same moment; wherein the third image is captured by a multi-view camera;
inputting the third image into a three-dimensional reconstruction model, and outputting a third human body three-dimensional model for representing the contour and the texture of the target object;
for each third image, determining third two-dimensional keypoint coordinates of the target object in the third image; the third two-dimensional key point coordinate is used for representing the position of the target human body key point in the third image;
determining the three-dimensional coordinates of each target human body key point based on the third two-dimensional key point coordinates corresponding to each third image and the camera parameters of the multi-view camera; wherein the three-dimensional coordinates are used to represent the location of each of the target human body keypoints on the third human body three-dimensional model;
and constructing the target human body three-dimensional model of the target object based on the third human body three-dimensional model and the three-dimensional coordinates of the target human body key points.
In a second aspect, an embodiment of the present application further provides a motion video generating apparatus, including:
the first acquisition module is used for acquiring a standard relative rotation information sequence of a reference object and a target human body three-dimensional model of a target object; the standard relative rotation information sequence is used for representing a relative rotation relation between target key point groups of the reference object when the reference object executes a specified action, wherein the target key point groups are two connected reference human body key points of the reference object;
the adjusting module is used for adjusting each human body key point on the target human body three-dimensional model according to the standard relative rotation information sequence so as to enable the target human body three-dimensional model to execute the specified action;
and the second acquisition module is used for acquiring a target video of the target human body three-dimensional model executing the specified action.
With reference to the second aspect, an embodiment of the present application provides a first possible implementation manner of the second aspect, where the first obtaining module, when configured to obtain a standard relative rotation information sequence of a reference object, is specifically configured to:
acquiring a standard action video of the reference object executing the specified action;
for each frame of first image in the standard motion video, determining first two-dimensional key point coordinates of the reference object in the first image; wherein, the first two-dimensional key point coordinate is used for representing the position of the reference human body key point in the first image;
sorting the first two-dimensional key point coordinates corresponding to the first images according to a first time sequence of the first images in the standard motion video, inputting the sorted coordinates into a fully convolutional model based on dilated temporal convolutions over 2D joint points, and outputting a standard three-dimensional key point sequence of the reference object; wherein the standard three-dimensional key point sequence comprises standard three-dimensional key point coordinates of the reference object in each of the first images;
converting the standard three-dimensional key point sequence into an information sequence among standard key points; the information sequence between the standard key points comprises a reference skeleton distance information sequence between every two connected reference human body key points and the standard relative rotation information sequence.
With reference to the second aspect, an embodiment of the present application provides a second possible implementation manner of the second aspect, where the first obtaining module, when configured to obtain a target three-dimensional human body model of a target object, is specifically configured to:
acquiring an initial action video of the target object;
for each frame of second image in the initial motion video, segmenting the target object and the background in the second image to obtain body contour information of the target object in the second image;
for each second image, acquiring information between initial key points corresponding to the second image; the information between the initial key points comprises initial relative rotation information between every two connected target human body key points of the target object and target skeleton distance information; the target bone distance information corresponding to each second image is the same;
determining an average body contour parameter of the target object based on the body contour information corresponding to each second image and the information between the initial key points;
for each second image, inputting the average body contour parameter and the information between the initial key points corresponding to the second image into a human body parameterized model, and outputting a first human body three-dimensional model corresponding to the second image;
for each second image, after the first human body three-dimensional model corresponding to the second image is projected onto the second image, acquiring texture information of the first human body three-dimensional model at a corresponding position on the second image so as to obtain the texture information of each position on the target object epidermis;
determining the target three-dimensional model of the target object based on the target skeletal distance information, the average body contour parameter, and the texture information at each location on the target object's epidermis.
With reference to the second possible implementation manner of the second aspect, an embodiment of the present application provides a third possible implementation manner of the second aspect, where the first obtaining module, when configured to obtain, for each second image, information between initial key points corresponding to the second image, is specifically configured to:
for each second image, determining second two-dimensional keypoint coordinates of the target object in the second image; the second two-dimensional key point coordinates are used for representing the position of the target human body key point in the second image;
sorting the second two-dimensional key point coordinates corresponding to the second images according to a second time sequence of the second images in the initial motion video, inputting the sorted coordinates into the fully convolutional model based on dilated temporal convolutions over 2D joint points, and outputting an initial three-dimensional key point sequence of the target object; wherein the initial three-dimensional key point sequence comprises initial three-dimensional key point coordinates of the target object in each of the second images;
and for each second image, converting the initial three-dimensional key point coordinates corresponding to the second image into information among the initial key points.
With reference to the second possible implementation manner of the second aspect, an embodiment of the present application provides a fourth possible implementation manner of the second aspect, where the first obtaining module, when configured to determine the average body contour parameter of the target object based on the body contour information corresponding to each of the second images and the information between the initial key points, is specifically configured to:
for each second image, inputting standard body contour parameters and information between the initial key points corresponding to the second image into the human body parameterized model, and outputting a second human body three-dimensional model of the target object corresponding to the second image;
for each second image, projecting the second three-dimensional human body model corresponding to the second image onto the second image, correcting the standard body contour parameters through the body contour information in the second image, and when the second three-dimensional human body model corresponding to the second image is overlapped with the body contour of the target object in the second image, taking the corrected standard body contour parameters as the target body contour parameters corresponding to the target object on the second image;
obtaining an average body contour parameter of the target object based on the target body contour parameter corresponding to each second image; wherein the average body contour parameter is an average of the target body contour parameters.
With reference to the second aspect, an embodiment of the present application provides a fifth possible implementation manner of the second aspect, where the first obtaining module, when configured to obtain a standard relative rotation information sequence of a reference object, is specifically configured to:
acquiring the rotation amount of the reference object at each reference human key point at each moment when the specified action is executed; wherein the amount of rotation is acquired by an inertial measurement device;
for each moment, calculating relative rotation information between every two connected reference human body key points corresponding to the moment according to the rotation amount of each reference human body key point corresponding to the moment;
and combining the relative rotation information corresponding to each moment according to a time sequence to obtain the standard relative rotation information sequence of the acquired reference object.
With reference to the second aspect, an embodiment of the present application provides a sixth possible implementation manner of the second aspect, where the first obtaining module, when configured to obtain a target three-dimensional human body model of a target object, is specifically configured to:
acquiring third images of the target object at all angles at the same moment; wherein the third image is captured by a multi-view camera;
inputting the third image into a three-dimensional reconstruction model, and outputting a third human body three-dimensional model for representing the contour and the texture of the target object;
for each third image, determining third two-dimensional keypoint coordinates of the target object in the third image; the third two-dimensional key point coordinate is used for representing the position of the target human body key point in the third image;
determining the three-dimensional coordinates of each target human body key point based on the third two-dimensional key point coordinates corresponding to each third image and the camera parameters of the multi-view camera; wherein the three-dimensional coordinates are used to represent the location of each of the target human body keypoints on the third human body three-dimensional model;
and constructing the target human body three-dimensional model of the target object based on the third human body three-dimensional model and the three-dimensional coordinates of the target human body key points.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions being executable by the processor to perform the steps of any one of the possible implementations of the first aspect.
In a fourth aspect, this application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and the computer program is executed by a processor to perform the steps in any one of the possible implementation manners of the first aspect.
In the application, a target human body three-dimensional model of a target object is obtained together with a standard relative rotation information sequence representing the relative rotation relation between target key point groups of a reference object when the reference object executes a specified action. Each human body key point on the target human body three-dimensional model is adjusted according to the standard relative rotation information sequence, so that the target human body three-dimensional model executes the specified action, and a target video of the target human body three-dimensional model executing the specified action is obtained. The target object can then learn from the specified action executed by the target human body three-dimensional model in the target video. In this way, the deviation in the target object's understanding of the action caused by the shape difference between the target object and the reference object is avoided, so the difference between the action executed by the target object after video learning and the standard action of the reference object can be reduced, and the standard degree of the target object's action is improved.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be regarded as limiting the scope; for those skilled in the art, other related drawings can be derived from these drawings without inventive effort.
Fig. 1 shows a flowchart of a motion video generation method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating key points of a human body according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram illustrating a motion video generating apparatus according to an embodiment of the present application;
fig. 4 shows a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. The components of the embodiments of the present application, as generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application is not intended to limit the scope of the claimed application, but merely represents selected embodiments of the application. All other embodiments obtained by a person skilled in the art from the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
In the prior art, when a student learns from an action teaching video recorded by a teacher, the figures of the teacher and the student (for example, height and build) may differ greatly, so the student's understanding of the actions deviates while learning from the recorded video. With such a learning mode, the difference between the actions the student executes after learning from the recorded video and the teacher's standard actions is large, which affects the standard degree of the student's actions.
In view of the foregoing, embodiments of the present application provide a motion video generation method and device, an electronic device, and a readable storage medium, which help to reduce the difference between the action performed by the target object (student) after learning from the video recorded by the reference object (teacher) and the standard action of the reference object (teacher), and to improve the standard degree of the target object's (student's) action. These are described below by way of embodiments.
The first embodiment is as follows:
to facilitate understanding of the present embodiment, a motion video generation method disclosed in the first embodiment of the present application will be described in detail first. Fig. 1 shows a flowchart of a motion video generation method provided in an embodiment of the present application, and as shown in fig. 1, the method includes the following steps:
S101: acquiring a standard relative rotation information sequence of a reference object and a target human body three-dimensional model of a target object; the standard relative rotation information sequence is used for representing a relative rotation relation between target key point groups of the reference object when the reference object executes a specified action, and each target key point group consists of two connected reference human body key points of the reference object.
In embodiments of the present application, the specified action may be a continuous action, such as a dance action or a martial arts action, or a discrete single action, such as a marching step or standing at attention. The specified action comprises one or more of a dance action, a fitness action, a martial arts action, and a rehabilitation training action. The target object is the object that is to learn and practice the specified action, and the reference object is an object capable of performing the specified action to standard. For example, when the specified action is a dance action, the reference object is a dance teacher and the target object is a dance student; when the specified action is a rehabilitation training action, the reference object is a medical staff member demonstrating the specified action and the target object is a patient.
The target human body three-dimensional model is a three-dimensional model constructed from the body contour of the target object, the texture of the target object, and the target human body key points of the target object. In the present application, the body contour of the target human body three-dimensional model is the same as, or proportionally similar to, that of the target object (for example, the same height and weight, or the same proportions); the texture of the target human body three-dimensional model is the same as that of the target object (for example, the color of the target object's skin, the color of the clothes worn, and the like); and the bone distance information between every two connected human body key points is the same for the target human body three-dimensional model and the target object (for example, the same bone distance between the elbow and the wrist on the same side of the body).
Fig. 2 illustrates a schematic diagram of human body key points provided by an embodiment of the present application. As shown in fig. 2, the human body key points include one or more of the eyes, nose 1, mouth 2, ears, wrist joints, ankle joints, elbow joints, shoulder joints, knee joints, and hip joints of the human body. The eyes include a left eye 3 and a right eye 4; the ears comprise a left ear 5 and a right ear 6; the wrist joints comprise a left wrist joint 7 and a right wrist joint 8; the ankle joints comprise a left ankle joint 9 and a right ankle joint 10; the elbow joints comprise a left elbow joint 11 and a right elbow joint 12; the shoulder joints include a left shoulder joint 13 and a right shoulder joint 14; the knee joints comprise a left knee joint 15 and a right knee joint 16; and the hip joints comprise a left hip joint 17 and a right hip joint 18. The human body key points of the target object are the target human body key points, and the human body key points of the reference object are the reference human body key points; that is, both the target and the reference human body key points comprise one or more of the eyes, nose, mouth, ears, wrist joints, ankle joints, elbow joints, shoulder joints, knee joints, and hip joints of the human body.
A target key point group consists of two connected reference human body key points of the reference object, for example the right elbow and the right shoulder, or the left ankle and the left knee. Note that the right wrist and the right shoulder of the reference object are not two connected reference human body key points.
In the present application, the standard relative rotation information sequence is used to indicate a relative rotation relationship between the target keypoint groups of the reference object when the reference object performs the specified action, wherein the relative rotation relationship may be a relative rotation angle and a relative rotation direction.
S102: and adjusting each human body key point on the target human body three-dimensional model according to the standard relative rotation information sequence so as to enable the target human body three-dimensional model to execute the specified action.
In the application, the standard relative rotation information sequence is imported into the target human body three-dimensional model, and each human body key point on the target human body three-dimensional model is adjusted according to the standard relative rotation information sequence, so that the target human body three-dimensional model executes the specified action.
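For illustration, this adjustment can be pictured as forward kinematics driven by the relative rotations. The following Python sketch is a minimal, hypothetical example; the kinematic chain, function name, and rotation convention are assumptions of this sketch rather than part of the claimed method (the key point indices follow fig. 2):

```python
import numpy as np

# Hypothetical kinematic chain for one arm (indices follow fig. 2):
# left shoulder 13 -> left elbow 11 -> left wrist 7, rooted at the shoulder.
PARENTS = {11: 13, 7: 11}

def pose_frame(rest_offsets, rel_rots, root_pos):
    """Apply one frame of relative rotation information to the model's
    key points by simple forward kinematics.

    rest_offsets: {child: (3,) bone vector to the child in the rest pose}
    rel_rots:     {child: (3, 3) rotation of the bone relative to its parent}
    root_pos:     (3,) world position of the root key point (here: 13)
    Returns {key point: (3,) world position} for the posed frame.
    """
    world_rot = {13: np.eye(3)}
    world_pos = {13: np.asarray(root_pos, dtype=float)}
    for child, parent in PARENTS.items():
        # Accumulate the bone's world rotation, then place the child joint.
        world_rot[child] = world_rot[parent] @ rel_rots[child]
        world_pos[child] = world_pos[parent] + world_rot[child] @ rest_offsets[child]
    return world_pos

# Repeating this for every entry of the standard relative rotation
# information sequence makes the model execute the specified action.
```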
S103: and acquiring a target video of the target human body three-dimensional model executing the specified action.
In the application, after the target human body three-dimensional model can execute the specified action completely, the target video of the target human body three-dimensional model executing the specified action is obtained. The target object can then learn and practice according to the specified action demonstrated by the target human body three-dimensional model in the target video.
Since the shapes of the reference object and the target object may differ greatly, the target object's understanding of the action may deviate when the target object learns from a video recorded by the reference object. In the present application, a motion video (i.e., the target video) in which the target human body three-dimensional model performs the specified action is generated, and the target object practices according to that video. This reduces the difference between the action performed by the target object after learning and the standard action of the reference object, and improves the standard degree of the target object's action.
In the embodiment of the application, two methods are provided for acquiring the standard relative rotation information sequence of the reference object, wherein the first method does not need additional acquisition equipment and can determine the standard relative rotation information sequence of the reference object directly according to the standard motion video of the reference object; the second method does not need to record the standard motion video of the reference object, and can determine the standard relative rotation information sequence of the reference object according to additional equipment. The first method is explained in detail below.
In a possible implementation manner, when step S101 is executed to acquire the standard relative rotation information sequence of the reference object, the following steps may be specifically executed:
S111: acquiring a standard action video of the reference object executing the specified action.
In the method, a standard action video of the reference object executing the specified action is obtained; the standard action video captures the reference object executing the whole specified action, and each frame of first image is determined from the standard action video.
S112: for each frame of first image in the standard motion video, determining first two-dimensional key point coordinates of the reference object in the first image; and the first two-dimensional key point coordinates are used for representing the position of the reference human body key point in the first image.
In the application, for each frame of the first image, the first two-dimensional key point coordinates of the reference body key points of the reference object in the first image are determined. Each frame of first image corresponds to a first two-dimensional key point coordinate, and the first two-dimensional key point coordinate comprises the positions of all reference human body key points in the first image.
S113: sorting the first two-dimensional key point coordinates corresponding to the first images according to a first time sequence of the first images in the standard motion video, inputting the sorted coordinates into a fully convolutional model based on dilated temporal convolutions over 2D joint points, and outputting a standard three-dimensional key point sequence of the reference object; wherein the standard three-dimensional key point sequence comprises standard three-dimensional key point coordinates of the reference object in each of the first images.
According to a first time sequence of the first images in the standard action video, the first two-dimensional key point coordinates corresponding to the first images are sorted chronologically to obtain a first two-dimensional key point coordinate sequence. The first two-dimensional key point coordinate sequence is input into a fully convolutional model based on dilated temporal convolutions over 2D joint points (VideoPose3D), which outputs a standard three-dimensional key point sequence of the reference object. The standard three-dimensional key point sequence comprises the standard three-dimensional key point coordinates of the reference object in each first image, and the standard three-dimensional key point coordinates represent the spatial coordinate positions of the reference human body key points.
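As an illustration of the lifting step, the PyTorch sketch below shows only the core idea of such a dilated temporal fully convolutional model: a window of per-frame 2D key point coordinates is treated as a 1D sequence over time and lifted to 3D. The layer sizes are assumptions of this sketch; the real VideoPose3D network additionally uses residual blocks, batch normalization, and more layers:

```python
import torch
import torch.nn as nn

class TemporalLifter(nn.Module):
    """Minimal sketch of a dilated temporal fully convolutional lifter:
    a window of per-frame 2D keypoints (J joints x 2 coords) is lifted
    to 3D keypoints; dilation widens the temporal receptive field."""

    def __init__(self, num_joints=18, channels=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(num_joints * 2, channels, kernel_size=3, dilation=1),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, dilation=3),
            nn.ReLU(),
            nn.Conv1d(channels, num_joints * 3, kernel_size=1),
        )

    def forward(self, kp2d_seq):
        # kp2d_seq: (batch, frames, J, 2); needs at least 9 frames here.
        b, t, j, _ = kp2d_seq.shape
        x = kp2d_seq.reshape(b, t, j * 2).permute(0, 2, 1)  # (b, J*2, t)
        y = self.net(x)            # temporal convolutions shrink t by 8
        return y.permute(0, 2, 1).reshape(b, -1, j, 3)      # 3D keypoints
```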
S114: converting the standard three-dimensional key point sequence into an information sequence among standard key points; the information sequence between the standard key points comprises a reference skeleton distance information sequence between every two connected reference human body key points and the standard relative rotation information sequence.
In the application, the standard three-dimensional key point sequence includes the standard three-dimensional key point coordinates of the reference human body key points in each first image. For each first image, the corresponding standard three-dimensional key point coordinates are converted into information between standard key points, which includes reference bone distance information and standard relative rotation information between the reference human body key points of the reference object in that first image. The reference bone distance information is the distance between every two connected reference human body key points of the reference object, and the standard relative rotation information is the relative rotation relation between the two connected reference human body key points in each first image. According to the time sequence of the first images in the standard action video, the information between standard key points corresponding to all the first images is combined to obtain the information sequence between standard key points.
The reference skeletal distance information sequence is used to represent the distance between two reference keypoints in the target keypoint group of the reference object, e.g. the distance between the left wrist and the left elbow, the distance between the right elbow and the right shoulder. The reference bone distance information sequence comprises reference bone distance information between two connected reference human key points in each first image, and the reference bone distance information represents the distance between the two connected reference human key points in the first image.
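For illustration, one frame of standard three-dimensional key points can be converted into information between standard key points roughly as in the sketch below. The bone list, the identity rotation for the first bone, and the Rodrigues-based rotation between consecutive bone directions are assumptions of this sketch (indices follow fig. 2):

```python
import numpy as np

BONES = [(13, 11), (11, 7)]  # left shoulder->elbow, left elbow->wrist

def bone_length(kp3d, a, b):
    """Reference bone distance between two connected keypoints."""
    return float(np.linalg.norm(kp3d[a] - kp3d[b]))

def relative_rotation(u, v):
    """Rotation matrix taking unit direction u onto v (Rodrigues form)."""
    u, v = u / np.linalg.norm(u), v / np.linalg.norm(v)
    axis, cos = np.cross(u, v), float(np.dot(u, v))
    k = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    return np.eye(3) + k + k @ k / (1.0 + cos)  # undefined when u == -v

def frame_to_keypoint_info(kp3d, bones=BONES):
    """One frame of 3D keypoints -> per-bone (distance, relative rotation)."""
    info, prev_dir = {}, None
    for a, b in bones:
        d = kp3d[b] - kp3d[a]
        rot = np.eye(3) if prev_dir is None else relative_rotation(prev_dir, d)
        info[(a, b)] = (bone_length(kp3d, a, b), rot)
        prev_dir = d
    return info
```

Collecting frame_to_keypoint_info over all first images, in time order, would then yield the information sequence between standard key points.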
The second method is explained in detail below.
In another possible implementation manner, when step S101 is executed to acquire the standard relative rotation information sequence of the reference object, the following steps may be specifically executed:
S121: acquiring the rotation amount of the reference object at each reference human body key point at each moment when the specified action is executed; wherein the rotation amount is acquired by an inertial measurement device.
In the application, an inertial measurement unit (IMU) is worn at each reference human body key point position of the reference object. After the reference object puts on the inertial measurement devices, it executes the specified action, and the devices collect and store the rotation amount of each reference human body key point in real time until the reference object finishes the whole specified action, at which point collection ends. The rotation amount refers to the rotation change of each reference human body key point, such as the change of rotation angle and rotation direction.
S122: and for each moment, calculating relative rotation information between every two connected reference human body key points corresponding to the moment according to the rotation amount of each reference human body key point corresponding to the moment.
In the present application, the inertial measurement devices collect the rotation amount at each reference human body key point once per moment, where the interval between moments can be set according to actual conditions, for example every second or every millisecond. The rotation amount may be the rotation angle and rotation direction of the reference human body key point. For each moment, the relative rotation information between every two connected reference human body key points corresponding to that moment is calculated from the rotation amounts of all the reference human body key points at that moment. Here, the relative rotation information is the relative rotation relation between two connected reference human body key points of the reference object at the same moment.
S123: and combining the relative rotation information corresponding to each moment according to a time sequence to obtain the standard relative rotation information sequence of the acquired reference object.
In the application, each time corresponds to one piece of relative rotation information, and the relative rotation information corresponding to each time is combined according to a time sequence to obtain a standard relative rotation information sequence of the reference object.
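A minimal sketch of this computation, assuming each IMU reports an absolute 3x3 orientation matrix per instant and a hypothetical list of connected key point pairs (indices follow fig. 2):

```python
import numpy as np

CONNECTED = [(13, 11), (11, 7)]  # hypothetical connected key point pairs

def relative_rotations_at_t(abs_rots):
    """abs_rots: {keypoint: 3x3 orientation from its IMU at one instant}.
    The relative rotation of each connected pair is R_parent^T @ R_child."""
    return {(p, c): abs_rots[p].T @ abs_rots[c] for p, c in CONNECTED}

def build_standard_sequence(imu_stream):
    """imu_stream: iterable of per-instant {keypoint: 3x3} dicts in time
    order; combining the per-instant relative rotations chronologically
    yields the standard relative rotation information sequence."""
    return [relative_rotations_at_t(frame) for frame in imu_stream]
```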
In the embodiment of the application, two methods are provided for obtaining the target human body three-dimensional model of the target object, wherein the first method does not need additional acquisition equipment and can directly determine the target human body three-dimensional model of the target object according to the initial action video of the target object; the second method does not need to record the initial action video of the target object, and can determine the target human body three-dimensional model of the target object according to additional equipment. The first method is explained in detail below.
In a possible implementation manner, when the step S101 is executed to acquire the target three-dimensional human body model of the target object, the following steps may be specifically executed:
S131: acquiring an initial action video of the target object.
When the initial action video of the target object is acquired, in one possible embodiment it is an initial action video of the target object executing the specified action: the target object first trains according to the standard action video of the reference object, and because the shapes of the reference object and the target object differ greatly, the target object's action in the initial action video acquired at this point is not yet standard enough.
In another possible embodiment, an initial motion video of the target object performing other motions may also be obtained, and since the purpose of the present application is to obtain a target three-dimensional human body model of the target object, only images of various angles of the target object need to be obtained, so the initial motion video in the present application may also be a video of the target object performing other motions.
S132: and for each frame of second image in the initial motion video, segmenting the target object and the background in the second image to obtain body contour information of the target object in the second image.
According to the initial action video, each frame of second image in the initial action video is determined. For each second image, the target object and the background in the second image are segmented to obtain the category of each pixel on the second image; the contour of each category is then obtained from the boundaries between pixels of different categories, giving the dividing line between the target object and the background, and the coordinates of each point on the dividing line are taken as the body contour information of the target object in the second image.
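As an illustration of this segmentation step, the sketch below uses an off-the-shelf person segmenter (torchvision's DeepLab v3, an assumption of this sketch; the patent does not name a particular model) and OpenCV to extract the dividing line:

```python
import cv2
import numpy as np
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights="DEFAULT").eval()  # generic person segmenter

def body_contour(frame_bgr):
    """Segment the target object from the background in one video frame
    and return the coordinates of the dividing line (body contour)."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    x = torch.from_numpy(rgb).permute(2, 0, 1).unsqueeze(0)
    mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)
    with torch.no_grad():
        classes = model((x - mean) / std)["out"].argmax(1)[0].numpy()
    person = (classes == 15).astype(np.uint8)      # class 15 = person (VOC)
    contours, _ = cv2.findContours(person, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    return max(contours, key=cv2.contourArea) if contours else None
```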
S133: for each second image, acquiring information between initial key points corresponding to the second image; the information between the initial key points comprises initial relative rotation information between every two connected target human body key points of the target object and target skeleton distance information; the target bone distance information corresponding to each second image is the same.
And acquiring information between initial key points corresponding to each second image, wherein the information between the initial key points comprises initial relative rotation information between two connected target human body key points and target skeleton distance information. The initial relative rotation information refers to the relative rotation relationship between every two connected target human body key points in the second image, and the target skeleton distance information refers to the distance between every two connected target human body key points in the second image. In the application, the distance between every two connected target human body key points of the target object is fixed, so that the target bone distance information corresponding to different second images is the same. For example, the distance between the left wrist and the left elbow of the target subject is fixed, and the distance between the left wrist and the left elbow of the target subject is the same in any of the second images.
In a possible implementation manner, when the step S133 is executed to acquire, for each second image, information between initial key points corresponding to the second image, the following steps may be specifically executed:
S1331: for each second image, determining second two-dimensional key point coordinates of the target object in the second image; the second two-dimensional key point coordinates are used for representing the positions of the target human body key points in the second image.
In the present application, for each second image, second two-dimensional keypoint coordinates of a target human keypoint of a target object in the second image are determined. And each frame of second image corresponds to a second two-dimensional key point coordinate, and the second two-dimensional key point coordinate comprises the positions of all target human body key points in the second image.
S1332: sorting the second two-dimensional key point coordinates corresponding to the second images according to a second time sequence of the second images in the initial motion video, inputting the sorted coordinates into the fully convolutional model based on dilated temporal convolutions over 2D joint points, and outputting an initial three-dimensional key point sequence of the target object; wherein the initial three-dimensional key point sequence comprises initial three-dimensional key point coordinates of the target object in each of the second images.
According to the second time sequence of the second images in the initial action video, the second two-dimensional key point coordinates corresponding to the second images are sorted chronologically to obtain a second two-dimensional key point coordinate sequence. This sequence is input into the fully convolutional model based on dilated temporal convolutions over 2D joint points, which outputs the initial three-dimensional key point sequence of the target object, comprising the initial three-dimensional key point coordinates of the target object in each second image. From the initial three-dimensional key point sequence, the initial three-dimensional key point coordinates of the target human body key points in each second image are determined; these coordinates represent the spatial coordinate positions of the target human body key points.
S1333: and for each second image, converting the initial three-dimensional key point coordinates corresponding to the second image into information among the initial key points.
In the application, each second image corresponds to initial key point information, and the initial key point information includes target skeleton distance information and initial relative rotation information between every two connected target human body key points of a target object in the second image.
S134: and determining the average body contour parameter of the target object based on the body contour information corresponding to each second image and the information between the initial key points.
In this application, the body contour information corresponding to a second image only describes the body contour of the target object in that image, i.e., the body contour at a single viewing angle; the whole body contour of the target object cannot be determined from it. The average body contour parameter, by contrast, represents the whole, real body contour of the target object.
In a possible implementation manner, when the step S134 is executed to determine the average body contour parameter of the target object based on the body contour information corresponding to each of the second images and the information between the initial key points, the following steps may be specifically executed:
S1341: for each second image, inputting the standard body contour parameters and the information between the initial key points corresponding to the second image into the human body parameterized model, and outputting a second human body three-dimensional model of the target object corresponding to the second image.
In this application, the standard body contour parameters represent a standard model in a standard posture. The standard posture of the standard model may be: arms extended horizontally at shoulder height, palms parallel to the ground facing downward, and feet slightly apart. The standard body contour parameters are statistically derived, fixed parameters rather than random ones.
When determining the average body contour parameter of the target object, first, for each second image, the standard body contour parameters and the information between the initial key points corresponding to that second image are input into a human body parameterized model (SMPL), which outputs a second human body three-dimensional model of the target object corresponding to that second image; each second image corresponds to one second human body three-dimensional model.
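For illustration, such a human body parameterized model can be posed with the smplx Python package as in the sketch below; the model path and the exact encoding of the information between initial key points (SMPL expects 10 shape parameters and 69 axis-angle pose values for 23 joints) are assumptions of this sketch:

```python
import torch
import smplx  # SMPL human parameterized model (https://smpl.is.tue.mpg.de)

# Hypothetical path to the downloaded SMPL model files.
body_model = smplx.create("models/", model_type="smpl")

def second_human_model(body_contour_params, pose_params):
    """Body contour parameters + one image's inter-keypoint pose
    -> a posed 3D human mesh and its 3D joints."""
    out = body_model(
        betas=body_contour_params,        # (1, 10) shape parameters
        body_pose=pose_params,            # (1, 69) relative joint rotations
        global_orient=torch.zeros(1, 3),  # root orientation, here fixed
    )
    return out.vertices[0], out.joints[0]

# The standard body contour parameters are a fixed statistical template;
# e.g. betas = torch.zeros(1, 10) gives SMPL's mean body shape.
```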
S1342: for each second image, projecting the second human body three-dimensional model corresponding to the second image onto the second image, correcting the standard body contour parameters through the body contour information in the second image, and when the projected second human body three-dimensional model coincides with the body contour of the target object in the second image, taking the corrected standard body contour parameters as the target body contour parameters corresponding to the target object in the second image.
For each second image, the second human body three-dimensional model corresponding to the second image is projected onto the second image, and the standard body contour parameters are corrected (constrained) through the body contour information of the target object in the second image. When the projected model coincides with the body contour of the target object in the second image, the standard body contour parameters corrected using the body contour information of the target object are taken as the target body contour parameters corresponding to the target object in the second image. The target body contour parameters represent the real body contour of the target object in the second image.
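A minimal sketch of this correction as silhouette fitting, assuming a hypothetical differentiable render_silhouette helper (e.g. built on a differentiable renderer such as PyTorch3D); the optimizer, loss, and step count are illustrative choices, not the patent's prescription:

```python
import torch

def fit_contour_params(render_silhouette, observed_mask, pose,
                       steps=200, lr=0.01):
    """Correct the standard body contour parameters until the projected
    model coincides with the observed body contour in one second image.

    render_silhouette: assumed helper mapping (betas, pose) -> (H, W)
        soft silhouette of the projected model.
    observed_mask: (H, W) binary mask from the segmentation step.
    """
    betas = torch.zeros(1, 10, requires_grad=True)  # standard shape start
    opt = torch.optim.Adam([betas], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((render_silhouette(betas, pose) - observed_mask) ** 2).mean()
        loss.backward()
        opt.step()
    return betas.detach()  # target body contour parameters for this image

# Averaging the per-image results then gives the average body contour
# parameter: avg_betas = torch.stack(per_image_betas).mean(dim=0)
```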
S1343: obtaining an average body contour parameter of the target object based on the target body contour parameter corresponding to each second image; wherein the average body contour parameter is an average of the target body contour parameters.
The target body contour parameters corresponding to all the second images are averaged to obtain the average body contour parameter of the target object.
S135: For each second image, inputting the average body contour parameter and the information between the initial key points corresponding to the second image into a human body parameterized model, and outputting a first human body three-dimensional model corresponding to the second image.
In this application, there is only one average body contour parameter, while each second image corresponds to its own information between the initial key points. For each second image, the average body contour parameter and the information between the initial key points corresponding to that image are input into the human body parameterized model (SMPL), and the first human body three-dimensional model corresponding to that second image is output; that is, each second image corresponds to one first human body three-dimensional model.
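A minimal sketch of steps S1343 and S135, continuing the sketches above and assuming the per-image target body contour parameters and per-image pose information were collected as lists (`target_betas_list`, `pose_list` and `orient_list` are hypothetical names):

```python
# S1343: the average body contour parameter is the mean of the
# per-image target body contour parameters.
beta_avg = torch.stack(target_betas_list).mean(dim=0)

# S135: one first human body three-dimensional model per second image,
# sharing the single average shape but using each image's own pose.
first_models = [
    smpl(betas=beta_avg, body_pose=pose, global_orient=orient).vertices
    for pose, orient in zip(pose_list, orient_list)
]
```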
S136: For each second image, after the first human body three-dimensional model corresponding to the second image is projected onto the second image, acquiring texture information of the first human body three-dimensional model at the corresponding positions on the second image, so as to obtain the texture information at each position on the epidermis of the target object.
In this application, the first human body three-dimensional model has no texture information; it has only body contour information and information between the key points of the target human body. Therefore, when acquiring the texture of the target object, for each second image, the first human body three-dimensional model corresponding to the second image is projected onto the second image, and the texture information at the corresponding positions on the second image is acquired to obtain the texture information of the target object in that image; the texture information at each position on the epidermis of the target object is then determined from the texture information of the target object corresponding to all the second images. The texture may be the skin, clothing and the like of the target object, and the texture information may be color information.
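The texture read-back of step S136 can be sketched as projecting each mesh vertex into the second image and sampling the pixel color there. The intrinsic matrix `K` and the camera-frame vertices are assumptions of this sketch; a real system would also test visibility so that occluded vertices are not textured from the wrong image.

```python
import numpy as np

def sample_vertex_colors(vertices, K, image):
    """vertices: (N, 3) in the camera frame; K: 3x3 intrinsics;
    image: (H, W, 3) second image. Returns (N, 3) per-vertex colors."""
    proj = (K @ vertices.T).T          # perspective projection
    uv = proj[:, :2] / proj[:, 2:3]    # divide by depth -> pixel coords
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, image.shape[1] - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, image.shape[0] - 1)
    return image[v, u]                 # texture (color) information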
S137: Determining the target human body three-dimensional model of the target object based on the target skeletal distance information, the average body contour parameter, and the texture information at each position on the epidermis of the target object.
In the present application, the target human body three-dimensional model of the target object is determined based on the distance between every two connected target key points (the target skeletal distance information), the body contour information of the target object (the average body contour parameter), and the texture information of the target object.
In another possible implementation manner, when step S101 is executed to acquire the target human body three-dimensional model of the target object, the following steps may be specifically executed:
S141: Acquiring third images of the target object at all angles at the same moment; wherein the third images are captured by a multi-view camera.
A multi-view camera environment is first built in a fixed place, and the target object is positioned within it. In this application, the multi-view cameras photograph the target object at the same moment, so that third images of the target object at all angles at that moment are obtained.
S142: Inputting the third images into a three-dimensional reconstruction model, and outputting a third human body three-dimensional model representing the contour and the texture of the target object.
The third images acquired at the same moment are input into a three-dimensional reconstruction model, which outputs a third human body three-dimensional model of the target object; the three-dimensional reconstruction model may be Structure from Motion (SfM) or a deep learning model. Because the third images include the clothing and the contour of the target object, the third human body three-dimensional model obtained in this application includes both the contour and the texture of the target object.
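As one concrete possibility for the Structure-from-Motion branch (an assumption of this sketch, not a requirement of the application), the pycolmap bindings to COLMAP can reconstruct the third images roughly as follows; the paths are hypothetical placeholders.

```python
import pycolmap

# Feature extraction, exhaustive matching, and incremental SfM over the
# third images captured by the multi-view camera.
pycolmap.extract_features(database_path="db.db", image_path="third_images/")
pycolmap.match_exhaustive(database_path="db.db")
maps = pycolmap.incremental_mapping(database_path="db.db",
                                    image_path="third_images/",
                                    output_path="recon/")
# maps[0] holds a sparse reconstruction; dense meshing with texture
# would follow to obtain the third human body three-dimensional model.
```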
S143: For each third image, determining third two-dimensional key point coordinates of the target object in the third image; wherein the third two-dimensional key point coordinates are used for representing the positions of the target human body key points in the third image.
For each third image, the third two-dimensional key point coordinates of each target human body key point of the target object are determined; each third image corresponds to one set of third two-dimensional key point coordinates.
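Any off-the-shelf 2D pose detector can supply the third two-dimensional key point coordinates; the following sketch uses MediaPipe Pose purely as an example (the application does not mandate a specific detector).

```python
import cv2
import mediapipe as mp

pose = mp.solutions.pose.Pose(static_image_mode=True)

def detect_2d_keypoints(image_bgr):
    """Return pixel-space (x, y) key point coordinates for one image."""
    result = pose.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    if result.pose_landmarks is None:
        return None
    h, w = image_bgr.shape[:2]
    return [(lm.x * w, lm.y * h)
            for lm in result.pose_landmarks.landmark]
```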
S144: Determining the three-dimensional coordinates of each target human body key point based on the third two-dimensional key point coordinates corresponding to each third image and the camera parameters of the multi-view camera; wherein the three-dimensional coordinates are used to represent the location of each target human body key point on the third human body three-dimensional model.
The three-dimensional coordinates of each target human body key point of the target object are determined based on the third two-dimensional key point coordinates corresponding to all the third images captured at the same moment and the camera parameters of the multi-view camera. The camera parameters refer to the internal and external parameters of each camera: the internal parameters include the focal length, distortion coefficients and the like, and the external parameters include the rotation matrix and the like.
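Triangulation from several views can be sketched with the direct linear transform (DLT); each 3x4 projection matrix P_i = K_i [R_i | t_i] is assembled from that camera's internal and external parameters (the variable names here are illustrative).

```python
import numpy as np

def triangulate_dlt(points_2d, proj_mats):
    """points_2d: list of (x, y) third two-dimensional key point
    coordinates, one per view; proj_mats: matching 3x4 matrices.
    Returns the 3D coordinate of one target human body key point."""
    rows = []
    for (x, y), P in zip(points_2d, proj_mats):
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.stack(rows))
    X = vt[-1]                 # homogeneous least-squares solution
    return X[:3] / X[3]
```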
S145: Constructing the target human body three-dimensional model of the target object based on the third human body three-dimensional model and the three-dimensional coordinates of the target human body key points.
The third human body three-dimensional model in this application includes the texture and the contour of the target object but does not include the position of each human body key point on the model; at this stage, the third human body three-dimensional model therefore cannot move according to the standard relative rotation information sequence of the reference object. For this reason, the three-dimensional coordinates of the target human body key points are combined with the third human body three-dimensional model to construct the target human body three-dimensional model of the target object, which can then be driven by the standard relative rotation information sequence.
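One simple, purely illustrative way to bind the key points to the mesh (the application does not specify the binding scheme) is to attach each three-dimensional key point to its nearest mesh vertex:

```python
from scipy.spatial import cKDTree

def bind_keypoints_to_mesh(mesh_vertices, keypoints_3d):
    """mesh_vertices: (V, 3); keypoints_3d: (K, 3).
    Returns, for each key point, the index of its nearest vertex."""
    tree = cKDTree(mesh_vertices)
    _, nearest_idx = tree.query(keypoints_3d)
    return nearest_idx  # drives the mesh when key points rotate
```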
Embodiment two:
Based on the same technical concept, an embodiment of the present application further provides a motion video generating device. Fig. 3 shows a schematic structural diagram of the motion video generating device provided in the embodiment of the present application; as shown in fig. 3, the device includes:
a first obtaining module 301, configured to obtain a standard relative rotation information sequence of a reference object and a target three-dimensional human body model of a target object; the standard relative rotation information sequence is used for representing a relative rotation relation between target key point groups of the reference object when the reference object executes a specified action, wherein the target key point groups are two connected reference human body key points of the reference object;
an adjusting module 302, configured to adjust each human body key point on the target human body three-dimensional model according to the standard relative rotation information sequence, so that the target human body three-dimensional model executes the specified action;
a second obtaining module 303, configured to obtain a target video of the target three-dimensional human body model executing the specified action.
Optionally, when the first obtaining module 301 is configured to obtain the standard relative rotation information sequence of the reference object, specifically:
acquiring a standard action video of the reference object executing the specified action;
for each frame of first image in the standard motion video, determining first two-dimensional key point coordinates of the reference object in the first image; wherein, the first two-dimensional key point coordinate is used for representing the position of the reference human body key point in the first image;
according to a first time sequence of the first images in the standard motion video, sorting the first two-dimensional key point coordinates corresponding to the first images in the first time sequence, inputting them into a fully convolutional model based on dilated temporal convolutions over 2D joint points, and outputting a standard three-dimensional key point sequence of the reference object; wherein the standard three-dimensional key point sequence comprises the standard three-dimensional key point coordinates of the reference object in each of the first images;
converting the standard three-dimensional key point sequence into an information sequence between standard key points; wherein the information sequence between the standard key points comprises a reference skeleton distance information sequence between every two connected reference human body key points and the standard relative rotation information sequence (a sketch of the distance computation follows this block).
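As referenced above, a minimal sketch of the distance half of this conversion, with a hypothetical list of connected key point pairs:

```python
import numpy as np

SKELETON_EDGES = [(0, 1), (1, 2), (2, 3)]  # hypothetical connected pairs

def reference_skeleton_distances(keypoints_3d):
    """keypoints_3d: (N, 3) standard three-dimensional key points."""
    return {(a, b): float(np.linalg.norm(keypoints_3d[a] - keypoints_3d[b]))
            for a, b in SKELETON_EDGES}
```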
Optionally, when the first obtaining module 301 is configured to obtain a target three-dimensional human body model of a target object, specifically:
acquiring an initial action video of the target object;
for each frame of second image in the initial motion video, segmenting the target object and the background in the second image to obtain body contour information of the target object in the second image;
for each second image, acquiring information between initial key points corresponding to the second image; wherein the information between the initial key points comprises initial relative rotation information and target skeletal distance information between every two connected target human body key points of the target object, and the target skeletal distance information corresponding to each second image is the same;
determining an average body contour parameter of the target object based on the body contour information corresponding to each second image and the information between the initial key points;
for each second image, inputting the average body contour parameter and the information between the initial key points corresponding to the second image into a human body parameterized model, and outputting a first human body three-dimensional model corresponding to the second image;
for each second image, after the first human body three-dimensional model corresponding to the second image is projected onto the second image, acquiring texture information of the first human body three-dimensional model at the corresponding positions on the second image, so as to obtain the texture information at each position on the epidermis of the target object;
determining the target human body three-dimensional model of the target object based on the target skeletal distance information, the average body contour parameter, and the texture information at each position on the epidermis of the target object.
Optionally, when the first obtaining module 301 is configured to, for each second image, obtain information between initial key points corresponding to the second image, specifically:
for each second image, determining second two-dimensional keypoint coordinates of the target object in the second image; the second two-dimensional key point coordinates are used for representing the position of the target human body key point in the second image;
according to a second time sequence of the second images in the initial motion video, sorting the second two-dimensional key point coordinates corresponding to the second images in the second time sequence, inputting them into the fully convolutional model based on dilated temporal convolutions over 2D joint points, and outputting an initial three-dimensional key point sequence of the target object; wherein the initial three-dimensional key point sequence comprises the initial three-dimensional key point coordinates of the target object in each of the second images;
and for each second image, converting the initial three-dimensional key point coordinates corresponding to the second image into information among the initial key points.
Optionally, when the first obtaining module 301 is configured to determine the average body contour parameter of the target object based on the body contour information corresponding to each second image and the information between the initial key points, it is specifically configured to:
for each second image, inputting standard body contour parameters and information between the initial key points corresponding to the second image into the human body parameterized model, and outputting a second human body three-dimensional model of the target object corresponding to the second image;
for each second image, projecting the second three-dimensional human body model corresponding to the second image onto the second image, correcting the standard body contour parameters through the body contour information in the second image, and when the second three-dimensional human body model corresponding to the second image is overlapped with the body contour of the target object in the second image, taking the corrected standard body contour parameters as the target body contour parameters corresponding to the target object on the second image;
obtaining an average body contour parameter of the target object based on the target body contour parameter corresponding to each second image; wherein the average body contour parameter is an average of the target body contour parameters.
Optionally, when the first obtaining module 301 is configured to obtain the standard relative rotation information sequence of the reference object, specifically:
acquiring the rotation amount of the reference object at each reference human body key point at each moment when the specified action is executed; wherein the rotation amounts are acquired by an inertial measurement device;
for each moment, calculating relative rotation information between every two connected reference human body key points at that moment according to the rotation amount of each reference human body key point at that moment (see the sketch after this block);
and combining the relative rotation information corresponding to each moment in time order to obtain the standard relative rotation information sequence of the reference object.
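As noted above, the per-moment relative rotation between two connected reference human body key points can be sketched with SciPy's rotation utilities; the parent-relative convention shown here is a common choice, not one fixed by the application.

```python
from scipy.spatial.transform import Rotation as R

def relative_rotation(q_parent, q_child):
    """q_*: IMU rotation amounts as quaternions in (x, y, z, w) order."""
    return R.from_quat(q_parent).inv() * R.from_quat(q_child)

# Combining the per-moment relative rotations in time order yields the
# standard relative rotation information sequence, e.g.:
# seq = [relative_rotation(qp, qc).as_quat() for qp, qc in frames]
```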
Optionally, when the first obtaining module 301 is configured to obtain a target three-dimensional human body model of a target object, specifically:
acquiring third images of the target object at all angles at the same moment; wherein the third image is captured by a multi-view camera;
inputting the third image into a three-dimensional reconstruction model, and outputting a third human body three-dimensional model for representing the contour and the texture of the target object;
for each third image, determining third two-dimensional keypoint coordinates of the target object in the third image; the third two-dimensional key point coordinate is used for representing the position of the target human body key point in the third image;
determining the three-dimensional coordinates of each target human body key point based on the third two-dimensional key point coordinates corresponding to each third image and the camera parameters of the multi-view camera; wherein the three-dimensional coordinates are used to represent the location of each of the target human body keypoints on the third human body three-dimensional model;
and constructing the target human body three-dimensional model of the target object based on the third human body three-dimensional model and the three-dimensional coordinates of the target human body key points.
For the specific implementation steps and principles, reference is made to the description of the first embodiment, which is not repeated herein.
Embodiment three:
Based on the same technical concept, an embodiment of the present application further provides an electronic device. Fig. 4 shows a schematic structural diagram of the electronic device provided in the embodiment of the present application. As shown in fig. 4, the electronic device 400 includes: a processor 401, a memory 402 and a bus 403, wherein the memory stores machine-readable instructions executable by the processor; when the electronic device operates, the processor 401 and the memory 402 communicate with each other through the bus 403, and the processor 401 executes the machine-readable instructions to perform the method steps described in the first embodiment. For the specific implementation steps and principles, reference is made to the description of the first embodiment, which is not repeated herein.
Embodiment four:
A fourth embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored; when the computer program is executed by a processor, the method steps described in the first embodiment are performed.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present application, and are used for illustrating the technical solutions of the present application, but not limiting the same, and the scope of the present application is not limited thereto, and although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive the technical solutions described in the foregoing embodiments or equivalent substitutes for some technical features within the technical scope disclosed in the present application; such modifications, changes or substitutions do not depart from the spirit and scope of the exemplary embodiments of the present application, and are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A motion video generation method, comprising:
acquiring a standard relative rotation information sequence of a reference object and a target human body three-dimensional model of a target object; the standard relative rotation information sequence is used for representing a relative rotation relation between target key point groups of the reference object when the reference object executes a specified action, wherein the target key point groups are two connected reference human body key points of the reference object;
adjusting each human body key point on the target human body three-dimensional model according to the standard relative rotation information sequence so as to enable the target human body three-dimensional model to execute the specified action;
and acquiring a target video of the target human body three-dimensional model executing the specified action.
2. The motion video generation method according to claim 1, wherein the obtaining of the standard relative rotation information sequence of the reference object includes:
acquiring a standard action video of the reference object executing the specified action;
for each frame of first image in the standard motion video, determining first two-dimensional key point coordinates of the reference object in the first image; wherein, the first two-dimensional key point coordinate is used for representing the position of the reference human body key point in the first image;
according to a first time sequence of the first images in the standard motion video, sorting the first two-dimensional key point coordinates corresponding to the first images in the first time sequence, inputting them into a fully convolutional model based on dilated temporal convolutions over 2D joint points, and outputting a standard three-dimensional key point sequence of the reference object; wherein the standard three-dimensional key point sequence comprises the standard three-dimensional key point coordinates of the reference object in each of the first images;
converting the standard three-dimensional key point sequence into an information sequence among standard key points; the information sequence between the standard key points comprises a reference skeleton distance information sequence between every two connected reference human body key points and the standard relative rotation information sequence.
3. The motion video generation method according to claim 1, wherein the obtaining of the target human body three-dimensional model of the target object comprises:
acquiring an initial action video of the target object;
for each frame of second image in the initial motion video, segmenting the target object and the background in the second image to obtain body contour information of the target object in the second image;
for each second image, acquiring information between initial key points corresponding to the second image; wherein the information between the initial key points comprises initial relative rotation information and target skeletal distance information between every two connected target human body key points of the target object, and the target skeletal distance information corresponding to each second image is the same;
determining an average body contour parameter of the target object based on the body contour information corresponding to each second image and the information between the initial key points;
for each second image, inputting the average body contour parameter and the information between the initial key points corresponding to the second image into a human body parameterized model, and outputting a first human body three-dimensional model corresponding to the second image;
for each second image, after the first human body three-dimensional model corresponding to the second image is projected onto the second image, acquiring texture information of the first human body three-dimensional model at the corresponding positions on the second image, so as to obtain the texture information at each position on the epidermis of the target object;
determining the target human body three-dimensional model of the target object based on the target skeletal distance information, the average body contour parameter, and the texture information at each position on the epidermis of the target object.
4. The motion video generation method according to claim 3, wherein the obtaining, for each of the second images, information between initial key points corresponding to the second image includes:
for each second image, determining second two-dimensional keypoint coordinates of the target object in the second image; the second two-dimensional key point coordinates are used for representing the position of the target human body key point in the second image;
according to a second time sequence of the second images in the initial motion video, sorting the second two-dimensional key point coordinates corresponding to the second images in the second time sequence, inputting them into the fully convolutional model based on dilated temporal convolutions over 2D joint points, and outputting an initial three-dimensional key point sequence of the target object; wherein the initial three-dimensional key point sequence comprises the initial three-dimensional key point coordinates of the target object in each of the second images;
and for each second image, converting the initial three-dimensional key point coordinates corresponding to the second image into information among the initial key points.
5. The motion video generation method according to claim 3, wherein the determining an average body contour parameter of the target object based on the body contour information corresponding to each of the second images and the information between the initial key points comprises:
for each second image, inputting standard body contour parameters and information between the initial key points corresponding to the second image into the human body parameterized model, and outputting a second human body three-dimensional model of the target object corresponding to the second image;
for each second image, projecting the second three-dimensional human body model corresponding to the second image onto the second image, correcting the standard body contour parameters through the body contour information in the second image, and when the second three-dimensional human body model corresponding to the second image is overlapped with the body contour of the target object in the second image, taking the corrected standard body contour parameters as the target body contour parameters corresponding to the target object on the second image;
obtaining an average body contour parameter of the target object based on the target body contour parameter corresponding to each second image; wherein the average body contour parameter is an average of the target body contour parameters.
6. The motion video generation method according to claim 1, wherein the obtaining of the standard relative rotation information sequence of the reference object includes:
acquiring the rotation amount of the reference object at each reference human body key point at each moment when the specified action is executed; wherein the rotation amount is acquired by an inertial measurement device;
for each moment, calculating relative rotation information between every two connected reference human body key points corresponding to the moment according to the rotation amount of each reference human body key point corresponding to the moment;
and combining the relative rotation information corresponding to each moment in time order to obtain the standard relative rotation information sequence of the reference object.
7. The motion video generation method according to claim 1, wherein the obtaining of the target human body three-dimensional model of the target object comprises:
acquiring third images of the target object at all angles at the same moment; wherein the third image is captured by a multi-view camera;
inputting the third image into a three-dimensional reconstruction model, and outputting a third human body three-dimensional model for representing the contour and the texture of the target object;
for each third image, determining third two-dimensional keypoint coordinates of the target object in the third image; the third two-dimensional key point coordinate is used for representing the position of the target human body key point in the third image;
determining the three-dimensional coordinates of each target human body key point based on the third two-dimensional key point coordinates corresponding to each third image and the camera parameters of the multi-view camera; wherein the three-dimensional coordinates are used to represent the location of each of the target human body keypoints on the third human body three-dimensional model;
and constructing the target human body three-dimensional model of the target object based on the third human body three-dimensional model and the three-dimensional coordinates of the target human body key points.
8. A motion video generating apparatus, comprising:
the first acquisition module is used for acquiring a standard relative rotation information sequence of a reference object and a target human body three-dimensional model of a target object; the standard relative rotation information sequence is used for representing a relative rotation relation between target key point groups of the reference object when the reference object executes a specified action, wherein the target key point groups are two connected reference human body key points of the reference object;
the adjusting module is used for adjusting each human body key point on the target human body three-dimensional model according to the standard relative rotation information sequence so as to enable the target human body three-dimensional model to execute the specified action;
and the second acquisition module is used for acquiring a target video of the target human body three-dimensional model executing the specified action.
9. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the motion video generation method according to any of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the motion video generation method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111176082.1A CN113902845A (en) 2021-10-09 2021-10-09 Motion video generation method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN113902845A true CN113902845A (en) 2022-01-07


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114549706A (en) * 2022-02-21 2022-05-27 成都工业学院 Animation generation method and animation generation device



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination