CN110225400B - Motion capture method and device, mobile terminal and storage medium

Info

Publication number: CN110225400B
Application number: CN201910611391.3A
Authority: CN
Inventors: 王光伟
Original assignee: Beijing ByteDance Network Technology Co Ltd
Current assignee: Beijing ByteDance Network Technology Co Ltd
Priority date: 2019-07-08
Filing date: 2019-07-08
Publication date: 2022-03-04
Anticipated expiration: 2039-07-08
Also published as: CN110225400A

Abstract

The embodiments of the present disclosure disclose a motion capture method and device, a mobile terminal and a storage medium. The method comprises: determining human body key points in a video frame of a live video of a target user; and determining the spatial positions of the human body key points in the video frame in a coordinate system of the video shooting device according to parameters of the video shooting device and the standard lengths of the human body key point connecting lines corresponding to the target user. Because the spatial positions of the key points are determined from the device parameters and the standard connecting-line lengths, the human body motion in the video frame can be accurately identified.

Description

Motion capture method and device, mobile terminal and storage medium

Technical Field

The embodiment of the disclosure relates to the technical field of image processing, and in particular, to a motion capture method and device, a mobile terminal and a storage medium.

Background

During a live broadcast, a live video anchor can choose to broadcast as a virtual character. The virtual character may be any of a variety of cartoon characters, and a virtual character role is generated according to the virtual character selected by the anchor. During the broadcast, the anchor's motion is obtained from each frame of video image, which realizes motion capture, and the virtual character role is then controlled to simulate that motion, yielding a virtual character video. The virtual character video is added to the live-action video to obtain a mixed-scene video, which is uploaded to a live broadcast platform for broadcast.

In the prior art, motion recognition is generally performed on each frame of video image to determine the motion type in the image, and the virtual character role is then controlled to simulate the anchor's motion according to that motion type. For example, if the motion type in the video image is determined to be raising the left hand, the virtual character is controlled to raise its left hand as well.

The disadvantage of the prior art is that determining only the motion type in the video image yields low motion capture accuracy.

Disclosure of Invention

The present disclosure provides a motion capture method, apparatus, mobile terminal and storage medium to achieve accurate recognition of human body motion in video images.

In a first aspect, an embodiment of the present disclosure provides a motion capture method, including:

determining human body key points in a video frame of a target user live video;

and determining the spatial position of the human key points in the video frame in the coordinate system of the video shooting equipment according to the parameters of the video shooting equipment and the standard length of the human key point connecting line corresponding to the target user.

In a second aspect, an embodiment of the present disclosure further provides a motion capture apparatus, including:

the key point determining module is used for determining human key points in a video frame of a live video of a target user;

and the spatial position determining module is used for determining the spatial position of the human key points in the video frame in the coordinate system of the video shooting equipment according to the parameters of the video shooting equipment and the standard length of the human key point connecting line corresponding to the target user.

In a third aspect, an embodiment of the present disclosure further provides a mobile terminal, including:

one or more processing devices;

storage means for storing one or more programs;

when the one or more programs are executed by the one or more processing devices, the one or more processing devices are caused to implement the motion capture method according to the embodiment of the present disclosure.

In a fourth aspect, the disclosed embodiments also provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the motion capture method according to the disclosed embodiments.

According to the present disclosure, human body key points are determined in a video frame of a live video of a target user, and the spatial positions of those key points in the coordinate system of the video shooting device are determined according to the parameters of the video shooting device and the standard lengths of the human body key point connecting lines corresponding to the target user, so that the human body motion in the video frame is accurately identified.

Drawings

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.

Fig. 1 is a flowchart of a motion capture method provided by an embodiment of the present disclosure;

Fig. 2 is a flowchart of a motion capture method provided by an embodiment of the present disclosure;

Fig. 3 is a flowchart of a motion capture method provided by an embodiment of the present disclosure;

Fig. 4 is a schematic structural diagram of a motion capture device according to an embodiment of the present disclosure;

Fig. 5 is a schematic structural diagram of a mobile terminal according to an embodiment of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.

It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.

It should be noted that the modifiers "a", "an", and "the" in this disclosure are intended to be illustrative rather than limiting, and those skilled in the art will understand them to mean "one or more" unless the context clearly dictates otherwise.

The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.

Fig. 1 is a flowchart of a motion capture method according to an embodiment of the present disclosure. The embodiment is applicable to the case of recognizing human body motion in video frames, and the method can be executed by a motion capture device, which can be implemented in software and/or hardware, and can be configured in a mobile terminal. As shown in fig. 1, the method may include the steps of:

Step 101, determining human body key points in a video frame of a live video of a target user.

The video frame is a video image in a target user live video. In the process of live broadcasting of a target user, the target user shoots through a video shooting device (for example, a camera) in the mobile terminal to obtain each frame of video image.

Optionally, determining the human body key points in the video frame of the live video of the target user may include: after a captured video frame is obtained, performing image recognition on the video frame to identify the human body key points it contains. The human body key points may include: head, neck, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist. An image coordinate system is established with the image center of the video frame as the origin, and the coordinate position of each human body key point in the image coordinate system is obtained and taken as the image position of that key point.
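By way of illustration only, the image coordinate system described above can be sketched as follows. The sketch assumes that a key point detector reports pixel coordinates with the origin at the top-left corner of the frame, which is a common convention but is not stated in this disclosure; the function name `to_image_position`, the key names and the sample pixel values are hypothetical.

```python
def to_image_position(px, py, width, height):
    """Shift a pixel coordinate so that the image center becomes the origin.

    Assumption: (px, py) is measured from the top-left corner of the frame.
    The Y axis is left pointing downward, as in the pixel grid.
    """
    cx = width / 2.0
    cy = height / 2.0
    return (px - cx, py - cy)


# Hypothetical detector output for a 1280 x 720 frame.
keypoints_px = {"head": (640, 200), "neck": (640, 300)}
image_positions = {
    name: to_image_position(x, y, 1280, 720)
    for name, (x, y) in keypoints_px.items()
}
```

A key point at the exact center of the frame maps to (0, 0), which matches the choice of the image center as the origin.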

Step 102, determining the spatial position of the human key points in the video frame in a coordinate system of the video shooting equipment according to the parameters of the video shooting equipment and the standard length of the human key point connecting line corresponding to the target user.

The standard length of the human body key point connecting line corresponding to the target user is the actual length of the human body key point connecting line of the target user, which is measured in advance. Optionally, the human body key point connecting line includes: a line between the head and the neck, a line between the neck and the left shoulder, a line between the neck and the right shoulder, a line between the left shoulder and the left elbow, a line between the right shoulder and the right elbow, a line between the left elbow and the left wrist, and a line between the right elbow and the right wrist.

In a specific embodiment, a reference video image of the target user is acquired. The action of the target user in the reference video image is a preset standard action, namely an action in which all human body key points are located on the same vertical plane. That is, every human body key point of the target user is at the same distance from the vertical plane where the video shooting device is located. The standard lengths of the human body key point connecting lines corresponding to the target user are then determined according to the reference video image and the parameters of the video shooting device.

Optionally, determining the spatial position of the human key point in the video frame in the coordinate system of the video capturing device according to the parameter of the video capturing device and the standard length of the human key point connection line corresponding to the target user, which may include: determining a ray which takes the position of the video shooting equipment as a starting point and corresponds to the image position of the human body key point according to the image position of the human body key point and the parameters of the video shooting equipment; and obtaining the spatial position of the human key points in the video frame in the coordinate system of the video shooting equipment according to the ray and the standard length of the human key point connecting line corresponding to the target user.

A coordinate system of the video shooting device is established according to the parameters of the video shooting device. It is a three-dimensional rectangular coordinate system whose origin is the optical center of the video shooting device, recorded as the position of the video shooting device, and whose Z axis is the optical axis of the video shooting device, perpendicular to the imaging plane. Its X and Y axes are parallel to the X and Y axes of the image coordinate system. The image coordinate system is a two-dimensional rectangular coordinate system lying on the imaging plane, with its origin at the intersection of the optical axis and the imaging plane. The distance between the origin of the video shooting device coordinate system and the origin of the image coordinate system is the focal length of the video shooting device.

In the coordinate system of the video shooting device, the image position of each human body key point is connected with the position of the video shooting device, giving a ray that starts from the position of the video shooting device and passes through the image position of that key point. This is the ray corresponding to the image position of the human body key point. The image position of each human body key point is the point where the line joining the actual key point of the target user and the position of the video shooting device crosses the imaging plane. Therefore, each human body key point of the target user must lie on the ray that starts from the position of the video shooting device and passes through the image position of the corresponding key point.
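The ray construction can be sketched as below. This is a minimal illustration under a plain pinhole model, assuming the focal length is expressed in the same units as the image position (for example pixels) and that lens distortion is ignored; the function name `keypoint_ray` is hypothetical.

```python
import math


def keypoint_ray(image_pos, focal_length):
    """Unit direction of the ray from the optical center through an image position.

    The image position (x, y) lies on the imaging plane z = focal_length of the
    video shooting device coordinate system, so the ray passes through
    (x, y, focal_length). The ray starts at the origin (the optical center).
    """
    x, y = image_pos
    norm = math.sqrt(x * x + y * y + focal_length * focal_length)
    return (x / norm, y / norm, focal_length / norm)
```

Every point on the ray can then be written as the unit direction multiplied by a non-negative distance from the optical center; the remaining task is to find that distance for each key point.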

The spatial position of each human body key point in the video frame in the coordinate system of the video shooting device is then obtained according to the rays and the standard lengths of the human body key point connecting lines corresponding to the target user. For example, the spatial positions of the neck, the left shoulder and the right shoulder are determined according to the three rays that start from the position of the video shooting device and correspond to the neck, the left shoulder and the right shoulder respectively, the standard length of the connecting line between the neck and the left shoulder, and the standard length of the connecting line between the neck and the right shoulder. The neck, the left shoulder and the right shoulder of a human body usually lie on the same straight line, so a straight line is determined from these three rays and two standard lengths as follows. The straight line intersects all three rays; the distance between its intersection with the neck ray and its intersection with the left shoulder ray equals the standard length of the connecting line between the neck and the left shoulder; and the distance between its intersection with the neck ray and its intersection with the right shoulder ray equals the standard length of the connecting line between the neck and the right shoulder. The intersection of the straight line with the ray corresponding to the neck is the neck of the target user, and its coordinate position is the spatial position of the neck in the coordinate system of the video shooting device. Likewise, the intersection with the ray corresponding to the left shoulder is the left shoulder of the target user, and the intersection with the ray corresponding to the right shoulder is the right shoulder of the target user; their coordinate positions are the spatial positions of the left shoulder and the right shoulder in the coordinate system of the video shooting device.
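One possible way to find such a straight line numerically is sketched below. It is not presented as the procedure of this disclosure: it assumes that the neck lies between the two shoulders on the line, that the three rays are approximately coplanar, and that the neck ray differs from the right shoulder ray. The ratio of the two shoulder distances is obtained by a small least-squares step, which is a choice made for this sketch. The function name `locate_neck_and_shoulders` and the helpers are hypothetical.

```python
import math


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _unit(v):
    n = math.sqrt(_dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)


def locate_neck_and_shoulders(ray_neck, ray_left, ray_right,
                              len_neck_left, len_neck_right):
    """Return (neck, left_shoulder, right_shoulder) in camera coordinates.

    Each ray is a direction from the optical center (the origin).
    """
    d_n, d_l, d_r = _unit(ray_neck), _unit(ray_left), _unit(ray_right)
    a, b = len_neck_left, len_neck_right
    # With the neck between the shoulders on one line:
    #   (a + b) * P_neck = b * P_left + a * P_right
    # so b*t_l*d_l + a*t_r*d_r must be parallel to d_n, i.e.
    #   b*t_l*(d_l x d_n) + a*t_r*(d_r x d_n) = 0.
    c_l = _cross(d_l, d_n)
    c_r = _cross(d_r, d_n)
    # Least-squares value of t_r / t_l from the equation above.
    ratio = -(b * _dot(c_l, c_r)) / (a * _dot(c_r, c_r))
    # Fix the scale so that the shoulder-to-shoulder distance is a + b.
    diff = tuple(d_l[i] - ratio * d_r[i] for i in range(3))
    t_l = (a + b) / math.sqrt(_dot(diff, diff))
    t_r = ratio * t_l
    p_l = tuple(t_l * c for c in d_l)
    p_r = tuple(t_r * c for c in d_r)
    p_n = tuple((b * p_l[i] + a * p_r[i]) / (a + b) for i in range(3))
    return p_n, p_l, p_r
```

When the three rays are exactly coplanar, the returned neck point lies on the neck ray and both standard lengths are reproduced exactly; with noisy key point detections the result is only an approximation.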

According to the technical solution of this embodiment, human body key points are determined in a video frame of a live video of a target user, and the spatial positions of those key points in the coordinate system of the video shooting device are determined according to the parameters of the video shooting device and the standard lengths of the human body key point connecting lines corresponding to the target user, so that the human body motion in the video frame is accurately identified.

Fig. 2 is a flowchart of a motion capture method according to an embodiment of the present disclosure. This embodiment may be combined with each optional solution in one or more of the above embodiments. In this embodiment, before determining the human body key points in the video frame of the live video of the target user, the method further includes: acquiring a reference video image of the target user, wherein the action of the target user in the reference video image is a preset standard action, namely an action in which all human body key points are located on the same vertical plane; and determining the standard lengths of the human body key point connecting lines corresponding to the target user according to the reference video image and the parameters of the video shooting device.

As shown in fig. 2, the method may include the steps of:

Step 201, acquiring a reference video image of a target user; the action of the target user in the reference video image is a preset standard action, and the preset standard action is an action that all human body key points are located on the same vertical plane.

The reference video image is a single frame of video image. The action of the target user in the reference video image is the preset standard action, that is, an action in which all human body key points are located on the same vertical plane. In other words, every human body key point of the target user is at the same distance from the vertical plane where the video shooting device is located.

For example, the human body key points include: head, neck, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist. The preset standard action is then an action in which the head, neck, left shoulder, right shoulder, left elbow, right elbow, left wrist and right wrist all lie in the same vertical plane.

Optionally, during video shooting, standard action prompt information is output to the target user to prompt the user to assume the preset standard action. The video shooting device in the mobile terminal then captures the reference video image.

Step 202, determining the standard length of the human body key point connecting line corresponding to the target user according to the reference video image and the parameters of the video shooting equipment.

Optionally, determining a standard length of a human body key point connecting line corresponding to the target user according to the reference video image and the parameter of the video shooting device may include: identifying human body key points in the reference video image, and determining the image positions of the human body key points; determining a ray which takes the position of the video shooting equipment as a starting point and corresponds to the image position of the human body key point according to the image position of the human body key point and the parameters of the video shooting equipment; identifying the reference video image to obtain a head image of a target user; determining the depth matched with the head image of the target user according to the preset corresponding relation between the head image and the depth; determining the spatial position of the key point of the human body in a coordinate system of the video shooting equipment according to the ray and the depth matched with the head image of the target user; and determining the standard length of the connecting line of the human key points corresponding to the target user according to the space position.

After the reference video image of the target user is obtained, image recognition is performed on the reference video image to identify the human body key points it contains. The human body key points may include: head, neck, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist. An image coordinate system is established with the image center of the reference video image as the origin, and the coordinate position of each human body key point in the image coordinate system is obtained and taken as the image position of that key point.

A coordinate system of the video shooting device is established according to the parameters of the video shooting device. In this coordinate system, the image position of each human body key point is connected with the position of the video shooting device, giving a ray that starts from the position of the video shooting device and passes through the image position of that key point, namely the ray corresponding to the image position of the human body key point. For example, eight such rays are obtained, each starting from the position of the video shooting device and passing through the image position of the head, the neck, the left shoulder, the right shoulder, the left elbow, the right elbow, the left wrist and the right wrist respectively.

The image position of each human body key point is the intersection point of the connecting line of each human body key point of the target user and the position of the video shooting equipment and the imaging plane. Therefore, it can be determined that each human body key point of the target user is located on a ray that starts from the position of the video photographing apparatus and passes through the image position of the corresponding human body key point.

Depth is the distance from an object to the vertical plane in which the video shooting device is located. Human head images are collected in advance at different depths, and a correspondence between head image and depth is established from them. Different depths correspond to different head images, and each depth is stored together with the head image that matches it.

The reference video image is recognized to obtain a head image of the target user. The stored head images are then searched, according to image features, for the head image that matches the head image of the target user. Optionally, the image features include the size of the head image and the facial features in the head image, where the facial features may be the distribution of the positions of the facial organs. The depth corresponding to the matching stored head image is determined as the depth matched with the head image of the target user.
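A highly simplified sketch of such a lookup is given below. It matches on head size only and omits facial-feature matching, which this disclosure also mentions. The table `HEAD_SIZE_TO_DEPTH`, its values and the function name `match_depth` are hypothetical placeholders chosen for illustration; they are not measured data.

```python
# Hypothetical preset correspondence: (head height in pixels, depth in meters).
# The numbers are placeholders, not calibration results.
HEAD_SIZE_TO_DEPTH = [
    (240.0, 0.5),
    (120.0, 1.0),
    (80.0, 1.5),
    (60.0, 2.0),
]


def match_depth(head_height_px, table=HEAD_SIZE_TO_DEPTH):
    """Return the depth stored with the reference head whose size is closest."""
    _, depth = min(table, key=lambda entry: abs(entry[0] - head_height_px))
    return depth
```

A practical implementation would need a correspondence collected with the actual video shooting device, since apparent head size depends on its focal length.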

The depth that matches the image of the target user's head is the distance from the target user's head to the vertical plane in which the video capture device is located. The preset standard action is the action that all key points of the human body are positioned on the same vertical plane. Distances from each human body key point of the target user to the vertical plane where the video shooting device is located are the same and are all equal to the distances from the head of the target user to the vertical plane where the video shooting device is located. Therefore, the distance from each human body key point to the vertical plane where the video shooting equipment is located can be determined.

On each ray that starts from the position of the video shooting device and passes through the image position of a human body key point, the three-dimensional coordinate point whose distance from the vertical plane where the video shooting device is located equals the depth matched with the head image of the target user is obtained. That three-dimensional coordinate point is the corresponding human body key point, and its coordinate position is the spatial position of the key point in the coordinate system of the video shooting device. In this way, the spatial position of each human body key point of the target user in the coordinate system of the video shooting device can be determined.

A human body key point connecting line is a line between two human body key points. The spatial positions of the two key points of a connecting line in the coordinate system of the video shooting device are connected, the length of the line is calculated from the two spatial positions, and the calculated length is determined as the standard length of that human body key point connecting line for the target user.
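Selecting the point on a ray at a given depth can be sketched as follows. The sketch assumes that the vertical plane where the video shooting device is located is the plane through the optical center perpendicular to the optical axis, so that depth equals the Z coordinate in the video shooting device coordinate system; the function name `point_at_depth` is hypothetical.

```python
def point_at_depth(ray, depth):
    """Return the point on a ray from the origin whose Z coordinate equals depth.

    `ray` is a direction (dx, dy, dz) from the optical center; dz must be
    positive, which holds for any key point in front of the device.
    """
    dx, dy, dz = ray
    if dz <= 0:
        raise ValueError("ray must point toward the scene (dz > 0)")
    scale = depth / dz
    return (dx * scale, dy * scale, depth)
```

Applying this to every key point ray with the same depth yields spatial positions that all lie in one plane parallel to the imaging plane, which is consistent with the preset standard action.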

In one particular example, the spatial location of the target user's head, neck, left shoulder, right shoulder, left elbow, right elbow, left wrist, and right wrist in the video capture device coordinate system is determined. The human body key point connecting line comprises: a line between the head and the neck, a line between the neck and the left shoulder, a line between the neck and the right shoulder, a line between the left shoulder and the left elbow, a line between the right shoulder and the right elbow, a line between the left elbow and the left wrist, and a line between the right elbow and the right wrist.

Connecting the spatial position of the head of the target user in the video shooting equipment coordinate system with the spatial position of the neck of the target user in the video shooting equipment coordinate system, calculating the length of the connecting line according to the spatial position of the head and the spatial position of the neck, and determining the calculated length of the connecting line as the standard length of the connecting line between the head and the neck corresponding to the target user.

Connecting the spatial position of the neck of the target user in the video shooting equipment coordinate system with the spatial position of the left shoulder of the target user in the video shooting equipment coordinate system, calculating the length of the connecting line according to the spatial position of the neck and the spatial position of the left shoulder, and determining the calculated length of the connecting line as the standard length of the connecting line between the neck and the left shoulder corresponding to the target user.

Connecting the spatial position of the neck of the target user in the video shooting equipment coordinate system with the spatial position of the right shoulder of the target user in the video shooting equipment coordinate system, calculating the length of the connecting line according to the spatial position of the neck and the spatial position of the right shoulder, and determining the calculated length of the connecting line as the standard length of the connecting line between the neck and the right shoulder corresponding to the target user.

Connecting the spatial position of the left shoulder of the target user in the video shooting device coordinate system with the spatial position of the left elbow of the target user in the video shooting device coordinate system, calculating the length of the connecting line according to the spatial position of the left shoulder and the spatial position of the left elbow, and determining the calculated length of the connecting line as the standard length of the connecting line between the left shoulder and the left elbow corresponding to the target user.

Connecting the spatial position of the right shoulder of the target user in the video shooting device coordinate system with the spatial position of the right elbow of the target user in the video shooting device coordinate system, calculating the length of the connecting line according to the spatial position of the right shoulder and the spatial position of the right elbow, and determining the calculated length of the connecting line as the standard length of the connecting line between the right shoulder and the right elbow corresponding to the target user.

Connecting the spatial position of the left elbow of the target user in the video shooting device coordinate system with the spatial position of the left wrist of the target user in the video shooting device coordinate system, calculating the length of the connecting line according to the spatial position of the left elbow and the spatial position of the left wrist, and determining the calculated length of the connecting line as the standard length of the connecting line between the left elbow and the left wrist corresponding to the target user.

Connecting the spatial position of the right elbow of the target user in the video shooting device coordinate system with the spatial position of the right wrist of the target user in the video shooting device coordinate system, calculating the length of the connecting line according to the spatial position of the right elbow and the spatial position of the right wrist, and determining the calculated length of the connecting line as the standard length of the connecting line between the right elbow and the right wrist corresponding to the target user.
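The length calculation described above amounts to a Euclidean distance between two three-dimensional positions. The following is a minimal sketch of that calculation, not the implementation of the disclosure; the key point names, the `BONES` list and the function `standard_lengths` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical key point names; the pairs follow the connecting lines
# listed in this description.
BONES = [
    ("head", "neck"),
    ("neck", "left_shoulder"),
    ("neck", "right_shoulder"),
    ("left_shoulder", "left_elbow"),
    ("right_shoulder", "right_elbow"),
    ("left_elbow", "left_wrist"),
    ("right_elbow", "right_wrist"),
]


def standard_lengths(positions):
    """Return {(a, b): length} for every connecting line whose two
    endpoints are present in `positions` (name -> (x, y, z))."""
    return {
        (a, b): math.dist(positions[a], positions[b])
        for a, b in BONES
        if a in positions and b in positions
    }
```

Connecting lines with a missing endpoint are skipped, so a partially detected reference pose still yields the lengths that can be measured.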

Step 203, determining human body key points in the video frame of the live video of the target user.

Step 204, determining the spatial position of the human body key points in the video frame in the coordinate system of the video shooting device according to the parameters of the video shooting device and the standard length of the human body key point connecting line corresponding to the target user.

According to the technical solution of this embodiment, a reference video image of the target user is acquired, in which the action of the target user is a preset standard action, namely an action in which all human body key points lie in the same vertical plane. The standard length of each human body key point connecting line corresponding to the target user is then determined according to the reference video image and the parameters of the video shooting device. In this way, the actual lengths of the human body key point connecting lines of the target user can be measured in advance from the reference video image.

Fig. 3 is a flowchart of a motion capture method according to an embodiment of the present disclosure. This embodiment may be combined with the alternatives in one or more of the above embodiments. In this embodiment, determining the human body key points in a video frame of the live video of the target user may include: identifying the human body key points in the video frame of the live video of the target user, and determining the image positions of the human body key points.

Determining the spatial position of the human body key points in the video frame in the coordinate system of the video shooting device according to the parameters of the video shooting device and the standard length of the human body key point connecting line corresponding to the target user may include: determining, according to the image position of each human body key point and the parameters of the video shooting device, a ray which takes the position of the video shooting device as a starting point and corresponds to the image position of the human body key point; and obtaining the spatial position of the human body key points in the video frame in the coordinate system of the video shooting device according to the rays and the standard length of the human body key point connecting line corresponding to the target user.

After determining the spatial position of the human body key points in the video frame in the coordinate system of the video shooting device, the method may further include: controlling a virtual character role corresponding to the target user to simulate the action of the target user according to the spatial position of the human body key points in the video frame in the coordinate system of the video shooting device, so as to obtain a real-time virtual character video image; adding the real-time virtual character video image into the video frame to obtain a mixed video frame; and uploading the mixed video frame to a live broadcast platform.

As shown in fig. 3, the method may include the steps of:

Step 301, identifying human body key points in a video frame of the live video of the target user, and determining the image positions of the human body key points.

After the captured video frame is obtained, image recognition is performed on the video frame to recognize the human body key points it contains. Optionally, the human body key points may include: head, neck, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist. An image coordinate system is established with the image center of the video frame as the origin, and the coordinate position of each human body key point in the image coordinate system is acquired. The coordinate position of each human body key point in the image coordinate system is determined as the image position of that human body key point.
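The center-origin image coordinate system described above can be sketched as a simple change of origin from pixel indices. This is only an illustration: the function name `to_image_position` and the choice of an upward-pointing y axis are assumptions, since the description fixes only the origin.

```python
def to_image_position(col, row, width, height):
    """Convert a pixel index (col, row), counted from the top-left corner,
    to coordinates whose origin is the image center.

    Assumption (not stated in the description): x points right, y points up.
    """
    x = col - width / 2.0
    y = height / 2.0 - row
    return (x, y)
```

For a 640 x 480 frame, the pixel at column 320 and row 240 maps to the origin.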

Step 302, according to the image position of the human body key point and the parameters of the video shooting device, determining a ray which takes the position of the video shooting device as a starting point and corresponds to the image position of the human body key point.
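One possible way to obtain such a ray is sketched below under the assumption of an ideal pinhole camera whose only relevant parameter is a focal length expressed in pixels. The function `keypoint_ray` and its parameter `focal_length_px` are hypothetical; the description does not specify which device parameters are used or how.

```python
import math


def keypoint_ray(image_pos, focal_length_px):
    """Return the unit direction of the ray that starts at the camera
    position (the origin of the camera coordinate system) and passes
    through a center-origin image position.

    Assumption: ideal pinhole model, optical axis along +z, no distortion.
    """
    x, y = image_pos
    norm = math.sqrt(x * x + y * y + focal_length_px * focal_length_px)
    return (x / norm, y / norm, focal_length_px / norm)
```

Every point on the ray can then be written as the direction multiplied by a non-negative depth parameter, which is what the following step solves for.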

Step 303, obtaining the spatial position of the human body key points in the video frame in the coordinate system of the video shooting device according to the rays and the standard length of the human body key point connecting line corresponding to the target user.

Optionally, the human body key point connecting line includes: a line between the head and the neck, a line between the neck and the left shoulder, a line between the neck and the right shoulder, a line between the left shoulder and the left elbow, a line between the right shoulder and the right elbow, a line between the left elbow and the left wrist, and a line between the right elbow and the right wrist.

Optionally, obtaining the spatial position of the human key point in the video frame in the coordinate system of the video shooting device according to the ray and the standard length of the human key point connection line corresponding to the target user may include: determining the spatial positions of the neck, the left shoulder and the right shoulder in a coordinate system of the video shooting device according to three rays which take the position of the video shooting device as a starting point and respectively correspond to the neck, the left shoulder and the right shoulder, the standard length of a connecting line between the neck and the left shoulder and the standard length of a connecting line between the neck and the right shoulder; determining the spatial position of the head in the coordinate system of the video shooting device according to the spatial position of the neck in the coordinate system of the video shooting device, the standard length of a connecting line between the head and the neck and a ray which takes the position of the video shooting device as a starting point and corresponds to the head; determining the spatial position of the left elbow in the coordinate system of the video shooting device according to the spatial position of the left shoulder in the coordinate system of the video shooting device, the standard length of a connecting line between the left shoulder and the left elbow and a ray which takes the position of the video shooting device as a starting point and corresponds to the left elbow; determining the spatial position of the right elbow in the coordinate system of the video shooting device according to the spatial position of the right shoulder in the coordinate system of the video shooting device, the standard length of a connecting line between the right shoulder and the right elbow and a ray which takes the position of the video shooting device as a starting point and corresponds to the right elbow; determining the spatial position of the left wrist in the coordinate system of the video shooting device according to the spatial position of the left elbow in the coordinate system of the video shooting device, the standard length of a connecting line between the left elbow and the left wrist and a ray which takes the position of the video shooting device as a starting point and corresponds to the left wrist; and determining the spatial position of the right wrist in the coordinate system of the video shooting device according to the spatial position of the right elbow in the coordinate system of the video shooting device, the standard length of a connecting line between the right elbow and the right wrist and a ray which takes the position of the video shooting device as a starting point and corresponds to the right wrist.

The neck, the left shoulder and the right shoulder of a human body are usually located on the same straight line. Optionally, determining the spatial positions of the neck, the left shoulder and the right shoulder in the coordinate system of the video shooting device according to three rays which take the position of the video shooting device as a starting point and respectively correspond to the neck, the left shoulder and the right shoulder, the standard length of a connecting line between the neck and the left shoulder, and the standard length of a connecting line between the neck and the right shoulder, may include: determining a straight line according to the three rays, the standard length of the connecting line between the neck and the left shoulder and the standard length of the connecting line between the neck and the right shoulder. The straight line intersects the three rays which take the position of the video shooting device as a starting point and respectively correspond to the neck, the left shoulder and the right shoulder. The distance between the intersection point of the straight line with the ray corresponding to the neck and the intersection point of the straight line with the ray corresponding to the left shoulder is equal to the standard length of the connecting line between the neck and the left shoulder, and the distance between the intersection point of the straight line with the ray corresponding to the neck and the intersection point of the straight line with the ray corresponding to the right shoulder is equal to the standard length of the connecting line between the neck and the right shoulder. The intersection point of the straight line and the ray corresponding to the neck is the neck of the target user, and the coordinate position of this intersection point is the spatial position of the neck of the target user in the coordinate system of the video shooting device. The intersection point of the straight line and the ray corresponding to the left shoulder is the left shoulder of the target user, and the coordinate position of this intersection point is the spatial position of the left shoulder of the target user in the coordinate system of the video shooting device. The intersection point of the straight line and the ray corresponding to the right shoulder is the right shoulder of the target user, and the coordinate position of this intersection point is the spatial position of the right shoulder of the target user in the coordinate system of the video shooting device.
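The straight-line construction above can be solved in closed form. The following sketch is one possible approach under stated assumptions, not the method of the disclosure: it assumes the neck lies between the two shoulders on the line, so that neck = (b * left + a * right) / (a + b), where a and b are the standard neck-to-left-shoulder and neck-to-right-shoulder lengths. The function `solve_neck_shoulders` is hypothetical, and a least-squares fit is used so that slightly non-coplanar rays from noisy detections still give an answer.

```python
import math


def _dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def solve_neck_shoulders(d_neck, d_left, d_right, len_left, len_right):
    """Return (neck, left_shoulder, right_shoulder) as points on the three
    rays t * d (camera at the origin) that are collinear and respect the
    two standard lengths. Assumes the neck lies between the shoulders."""
    total = len_left + len_right
    alpha = len_right / total  # weight of the left shoulder in the neck
    beta = len_left / total    # weight of the right shoulder in the neck
    # Least-squares fit of d_neck = x * d_left + y * d_right.
    ll, rr, lr = _dot(d_left, d_left), _dot(d_right, d_right), _dot(d_left, d_right)
    ln, rn = _dot(d_left, d_neck), _dot(d_right, d_neck)
    det = ll * rr - lr * lr
    if abs(det) < 1e-12:
        raise ValueError("shoulder rays are (nearly) parallel")
    x = (ln * rr - rn * lr) / det
    y = (rn * ll - ln * lr) / det
    # Depth parameters of the shoulders relative to a neck depth of 1.
    t_left, t_right = x / alpha, y / beta
    if t_left <= 0 or t_right <= 0:
        raise ValueError("no solution in front of the camera")
    unscaled_left = tuple(t_left * c for c in d_left)
    # Scale the whole configuration so the neck-to-left-shoulder distance
    # equals its standard length.
    s = len_left / math.dist(d_neck, unscaled_left)
    neck = tuple(s * c for c in d_neck)
    left = tuple(s * c for c in unscaled_left)
    right = tuple(s * t_right * c for c in d_right)
    return neck, left, right
```

The ray directions do not need to be normalized, because only the resulting points matter.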

Optionally, determining the spatial position of the head in the coordinate system of the video shooting device according to the spatial position of the neck in the coordinate system of the video shooting device, the standard length of the connecting line between the head and the neck, and the ray which takes the position of the video shooting device as a starting point and corresponds to the head, may include: determining, on the ray which takes the position of the video shooting device as a starting point and corresponds to the head, two candidate three-dimensional coordinate points whose distance from the spatial position of the neck in the coordinate system of the video shooting device is equal to the standard length of the connecting line between the head and the neck. One of the two candidate three-dimensional coordinate points corresponds to the head in a forward tilting state, and the other corresponds to the head in a backward tilting state. Image recognition is performed on the video frame to recognize whether the head of the target user in the video frame is in a forward tilting state or a backward tilting state. If the head of the target user in the video frame is in a forward tilting state, the candidate three-dimensional coordinate point corresponding to the head in the forward tilting state is determined as the head of the target user. If the head of the target user in the video frame is in a backward tilting state, the candidate three-dimensional coordinate point corresponding to the head in the backward tilting state is determined as the head of the target user. In either case, the coordinate position of the selected candidate three-dimensional coordinate point is the spatial position of the head of the target user in the coordinate system of the video shooting device.
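The two candidate points are the intersections of the head ray with a sphere centered on the neck whose radius is the standard head-to-neck length. The sketch below illustrates this with hypothetical functions `candidates_on_ray` and `head_position`. It assumes that the forward tilting state corresponds to the candidate nearer to the camera, which the description does not state explicitly, and it takes the tilt flag as an input rather than recognizing it from the image.

```python
import math


def candidates_on_ray(direction, joint, length):
    """Points on the ray t * direction (t >= 0, camera at the origin) whose
    distance from `joint` equals `length`, nearest to the camera first."""
    dd = sum(c * c for c in direction)
    dj = sum(p * q for p, q in zip(direction, joint))
    jj = sum(c * c for c in joint)
    # Solve dd * t^2 - 2 * dj * t + (jj - length^2) = 0 for t.
    disc = dj * dj - dd * (jj - length * length)
    if disc < 0:
        return []  # the ray misses the sphere
    root = math.sqrt(disc)
    ts = [(dj - root) / dd, (dj + root) / dd]
    return [tuple(t * c for c in direction) for t in ts if t >= 0]


def head_position(direction, neck, head_neck_length, tilted_forward):
    """Pick one candidate. Assumption: forward tilt means nearer to camera."""
    points = candidates_on_ray(direction, neck, head_neck_length)
    if not points:
        return None
    return points[0] if tilted_forward else points[-1]
```

When the ray only grazes the sphere, both candidates coincide and the flag has no effect.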

Optionally, determining the spatial position of the left elbow in the coordinate system of the video shooting device according to the spatial position of the left shoulder in the coordinate system of the video shooting device, the standard length of the connecting line between the left shoulder and the left elbow, and the ray which takes the position of the video shooting device as a starting point and corresponds to the left elbow, may include: determining, on the ray which takes the position of the video shooting device as a starting point and corresponds to the left elbow, two candidate three-dimensional coordinate points whose distance from the spatial position of the left shoulder in the coordinate system of the video shooting device is equal to the standard length of the connecting line between the left shoulder and the left elbow. One of the two candidate three-dimensional coordinate points corresponds to the left elbow in a forward tilting state, and the other corresponds to the left elbow in a backward tilting state. The candidate three-dimensional coordinate point corresponding to the left elbow in the forward tilting state is determined as the left elbow of the target user. The coordinate position of this candidate three-dimensional coordinate point is the spatial position of the left elbow of the target user in the coordinate system of the video shooting device.

Optionally, determining the spatial position of the right elbow in the coordinate system of the video capturing device according to the spatial position of the right shoulder in the coordinate system of the video capturing device, the standard length of the connection line between the right shoulder and the right elbow, and the ray which takes the position of the video capturing device as a starting point and corresponds to the right elbow, may include: and determining two alternative three-dimensional coordinate points, which are away from the spatial position of the right shoulder in the coordinate system of the video shooting device by the distance equal to the standard length of a connecting line between the right shoulder and the right elbow, on a ray which takes the position of the video shooting device as a starting point and corresponds to the right elbow. One candidate three-dimensional coordinate point of the two candidate three-dimensional coordinate points is a three-dimensional coordinate point corresponding to the right elbow in a forward tilting state, and the other candidate three-dimensional coordinate point is a three-dimensional coordinate point corresponding to the right elbow in a backward tilting state. And determining the candidate three-dimensional coordinate point corresponding to the right elbow in the forward tilting state as the right elbow of the target user. And the coordinate position of the candidate three-dimensional coordinate point is the spatial position of the right elbow of the target user in the coordinate system of the video shooting device.

Optionally, determining the spatial position of the left wrist in the coordinate system of the video capturing device according to the spatial position of the left elbow in the coordinate system of the video capturing device, the standard length of the connection line between the left elbow and the left wrist, and the ray which takes the position of the video capturing device as a starting point and corresponds to the left wrist, may include: and determining two alternative three-dimensional coordinate points, wherein the distance between the two alternative three-dimensional coordinate points and the spatial position of the left elbow in the coordinate system of the video shooting device is equal to the standard length of a connecting line between the left elbow and the left wrist, on a ray which takes the position of the video shooting device as a starting point and corresponds to the left wrist. One candidate three-dimensional coordinate point of the two candidate three-dimensional coordinate points is a three-dimensional coordinate point corresponding to the left wrist in a forward tilting state, and the other candidate three-dimensional coordinate point is a three-dimensional coordinate point corresponding to the left wrist in a backward tilting state. And determining the candidate three-dimensional coordinate point corresponding to the left wrist in the forward tilting state as the left wrist of the target user. And the coordinate position of the candidate three-dimensional coordinate point is the spatial position of the left wrist of the target user in the coordinate system of the video shooting equipment.

Optionally, determining the spatial position of the right wrist in the coordinate system of the video shooting device according to the spatial position of the right elbow in the coordinate system of the video shooting device, the standard length of the connecting line between the right elbow and the right wrist, and the ray which takes the position of the video shooting device as a starting point and corresponds to the right wrist, may include: determining, on the ray which takes the position of the video shooting device as a starting point and corresponds to the right wrist, two candidate three-dimensional coordinate points whose distance from the spatial position of the right elbow in the coordinate system of the video shooting device is equal to the standard length of the connecting line between the right elbow and the right wrist. One of the two candidate three-dimensional coordinate points corresponds to the right wrist in a forward tilting state, and the other corresponds to the right wrist in a backward tilting state. The candidate three-dimensional coordinate point corresponding to the right wrist in the forward tilting state is determined as the right wrist of the target user. The coordinate position of this candidate three-dimensional coordinate point is the spatial position of the right wrist of the target user in the coordinate system of the video shooting device.
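For the elbows and wrists the forward candidate is always chosen, so one arm can be resolved as a chain from shoulder to elbow to wrist. The sketch below shows this chaining with hypothetical functions `forward_point` and `solve_arm`. Two points are assumptions rather than part of the description: the forward tilting state is taken to be the candidate nearer to the camera, and when noisy detections leave no intersection the point on the ray closest to the parent joint is used as a fallback.

```python
import math


def forward_point(direction, parent, length):
    """Point on the ray t * direction at distance `length` from `parent`,
    choosing the candidate nearer to the camera (assumed 'forward')."""
    dd = sum(c * c for c in direction)
    dp = sum(p * q for p, q in zip(direction, parent))
    pp = sum(c * c for c in parent)
    disc = dp * dp - dd * (pp - length * length)
    if disc < 0:
        # Fallback (assumption, not in the description): closest point on
        # the ray to the parent joint.
        t = dp / dd
    else:
        root = math.sqrt(disc)
        t = (dp - root) / dd
        if t < 0:  # nearer candidate is behind the camera
            t = (dp + root) / dd
    return tuple(t * c for c in direction)


def solve_arm(shoulder, d_elbow, d_wrist, upper_arm_len, forearm_len):
    """Resolve elbow from the shoulder, then wrist from the elbow."""
    elbow = forward_point(d_elbow, shoulder, upper_arm_len)
    wrist = forward_point(d_wrist, elbow, forearm_len)
    return elbow, wrist
```

The same function can be called once for the left arm and once for the right arm, each with its own shoulder position, rays and standard lengths.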

Step 304, controlling the virtual character role corresponding to the target user to simulate the action of the target user according to the spatial position of the human body key points in the video frame in the coordinate system of the video shooting device, so as to obtain a real-time virtual character video image.

Optionally, the virtual character may be any of a variety of cartoon characters. A three-dimensional model of the virtual character role is established in advance. The three-dimensional model of the virtual character role has key points corresponding to the human body key points.

Optionally, controlling, according to a spatial position of a human key point in a video frame in a coordinate system of a video capturing device, an action of a virtual character role corresponding to a target user to simulate the target user includes: and determining the positions of all key points of the three-dimensional model of the virtual character role corresponding to the target user in the coordinate system of the video shooting equipment according to the spatial positions of the key points of the human body in the video frame in the coordinate system of the video shooting equipment, so as to obtain a real-time virtual character video image corresponding to the video frame.
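The description does not say how the model key points are derived from the human body key points. One common option, shown here purely as an assumption, is to keep the direction of each connecting line of the target user while using the model's own connecting-line lengths, starting from the neck. The names `CHAIN` and `retarget` are hypothetical.

```python
import math

# Parent-to-child order, so each parent is placed before its child.
CHAIN = [
    ("neck", "head"),
    ("neck", "left_shoulder"),
    ("neck", "right_shoulder"),
    ("left_shoulder", "left_elbow"),
    ("right_shoulder", "right_elbow"),
    ("left_elbow", "left_wrist"),
    ("right_elbow", "right_wrist"),
]


def retarget(user, model_lengths, root="neck"):
    """Place the model key points in the camera coordinate system.

    user: name -> (x, y, z) of the target user's key points.
    model_lengths: (parent, child) -> connecting-line length of the model.
    """
    model = {root: tuple(user[root])}
    for parent, child in CHAIN:
        offset = [c - p for p, c in zip(user[parent], user[child])]
        norm = math.sqrt(sum(v * v for v in offset))
        # Degenerate case: coincident key points, keep the child on its parent.
        scale = model_lengths[(parent, child)] / norm if norm > 0 else 0.0
        model[child] = tuple(m + scale * v for m, v in zip(model[parent], offset))
    return model
```

With model lengths equal to the user's standard lengths, the model key points coincide with the user's key points.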

Step 305, adding the real-time virtual character video image into the video frame to obtain a mixed video frame.

Optionally, the real-time virtual character video image is superimposed on the video frame to obtain a mixed video frame. The image of the target user in the mixed video frame is overlaid with the real-time avatar video image.
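The superimposition can be illustrated with per-pixel alpha compositing. This is a minimal sketch on plain nested lists, assuming the rendered virtual character image comes with an opacity mask; the mask, the function `overlay` and the pixel layout are assumptions, as the description only states that the virtual character image covers the image of the target user.

```python
def overlay(frame, avatar, alpha):
    """Composite `avatar` over `frame`.

    frame, avatar: rows of (r, g, b) tuples with identical dimensions.
    alpha: rows of opacity values in [0, 1]; 1 means the avatar pixel
    fully covers the frame pixel (assumed mask, not from the description).
    """
    return [
        [
            tuple(round(a * av + (1.0 - a) * fr) for av, fr in zip(avatar_px, frame_px))
            for frame_px, avatar_px, a in zip(frame_row, avatar_row, alpha_row)
        ]
        for frame_row, avatar_row, alpha_row in zip(frame, avatar, alpha)
    ]
```

A production pipeline would normally perform this blend on the GPU or with an image library; the nested-list form is used here only to keep the sketch dependency-free.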

Step 306, uploading the mixed video frame to a live broadcast platform.

The real-time virtual character video image of the virtual character role is added into the mixed video frame, and live broadcast with the virtual character role as the anchor is realized.

According to the technical solution of this embodiment, a ray which takes the position of the video shooting device as a starting point and corresponds to the image position of each human body key point is determined according to the image position of the human body key point and the parameters of the video shooting device, and the spatial position of the human body key points in the video frame in the coordinate system of the video shooting device is obtained according to the rays and the standard length of the human body key point connecting line corresponding to the target user. The virtual character role corresponding to the target user is then controlled to simulate the action of the target user according to these spatial positions to obtain a real-time virtual character video image, the real-time virtual character video image is added into the video frame to obtain a mixed video frame, and the mixed video frame is uploaded to a live broadcast platform. In this way, the spatial position of the human body key points in the video frame can be obtained from their image positions, the parameters of the video shooting device and the standard length of the human body key point connecting line corresponding to the target user, the virtual character role can be controlled to simulate the action of the target user, and live broadcast with the virtual character role as the anchor is realized.

Fig. 4 is a schematic structural diagram of a motion capture device according to an embodiment of the present disclosure. The embodiment can be applied to the situation of recognizing the human body action in the video frame. The apparatus can be implemented in software and/or hardware, and the apparatus can be configured in a mobile terminal. As shown in fig. 4, the apparatus may include: a keypoint determination module 401 and a spatial location determination module 402.

The key point determining module 401 is configured to determine a human key point in a video frame of a live video of a target user; and a spatial position determining module 402, configured to determine spatial positions of the human key points in the video frame in a coordinate system of the video capturing device according to the parameters of the video capturing device and a standard length of a human key point connection line corresponding to the target user.

Optionally, on the basis of the above technical solution, the method may further include: the image acquisition module is used for acquiring a reference video image of a target user; the action of a target user in the reference video image is a preset standard action, and the preset standard action is an action that all human body key points are located on the same vertical plane; and the standard length determining module is used for determining the standard length of the human body key point connecting line corresponding to the target user according to the reference video image and the parameters of the video shooting equipment.

Optionally, on the basis of the foregoing technical solution, the standard length determining module may include: the first position determining unit is used for identifying the human key points in the reference video image and determining the image positions of the human key points; the first ray determining unit is used for determining rays which take the position of the video shooting equipment as a starting point and correspond to the image position of the human body key point according to the image position of the human body key point and the parameters of the video shooting equipment; the head image acquisition unit is used for identifying the reference video image and acquiring a head image of a target user; the depth determining unit is used for determining the depth matched with the head image of the target user according to the preset corresponding relation between the head image and the depth; the standard length determining unit is used for determining the spatial position of the human body key point in a coordinate system of the video shooting equipment according to the ray and the depth matched with the head image of the target user; and determining the standard length of the connecting line of the human key points corresponding to the target user according to the space position.

Optionally, on the basis of the foregoing technical solution, the key point determining module 401 may include: the second position determining unit is used for identifying human key points in a video frame of a live video of a target user and determining the image positions of the human key points; the spatial location determination module 402 may include: the second ray determining unit is used for determining a ray which takes the position of the video shooting equipment as a starting point and corresponds to the image position of the human body key point according to the image position of the human body key point and the parameters of the video shooting equipment; and the spatial position determining unit is used for obtaining the spatial position of the human key points in the video frame in the coordinate system of the video shooting equipment according to the ray and the standard length of the connecting line of the human key points corresponding to the target user.

Optionally, on the basis of the above technical solution, the human body key points may include: head, neck, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist; the human body key point connecting line may include: a line between the head and the neck, a line between the neck and the left shoulder, a line between the neck and the right shoulder, a line between the left shoulder and the left elbow, a line between the right shoulder and the right elbow, a line between the left elbow and the left wrist, and a line between the right elbow and the right wrist.

Optionally, on the basis of the foregoing technical solution, the spatial position determining unit may include: the first determining subunit is used for determining the spatial positions of the neck, the left shoulder and the right shoulder in a coordinate system of the video shooting device according to three rays which respectively correspond to the neck, the left shoulder and the right shoulder and take the position of the video shooting device as a starting point, the standard length of a connecting line between the neck and the left shoulder and the standard length of a connecting line between the neck and the right shoulder; the second determining subunit is used for determining the spatial position of the head in the coordinate system of the video shooting device according to the spatial position of the neck in the coordinate system of the video shooting device, the standard length of a connecting line between the head and the neck and a ray which takes the position of the video shooting device as a starting point and corresponds to the head; the third determining subunit is used for determining the spatial position of the left elbow in the coordinate system of the video shooting device according to the spatial position of the left shoulder in the coordinate system of the video shooting device, the standard length of a connecting line between the left shoulder and the left elbow and a ray which takes the position of the video shooting device as a starting point and corresponds to the left elbow; the fourth determining subunit is used for determining the spatial position of the right elbow in the coordinate system of the video shooting device according to the spatial position of the right shoulder in the coordinate system of the video shooting device, the standard length of a connecting line between the right shoulder and the right elbow and a ray which takes the position of the video shooting device as a starting point and corresponds to the right elbow; the fifth determining subunit is used for determining the spatial position of the left wrist in the coordinate system of the video shooting device according to the spatial position of the left elbow in the coordinate system of the video shooting device, the standard length of a connecting line between the left elbow and the left wrist and a ray which takes the position of the video shooting device as a starting point and corresponds to the left wrist; and the sixth determining subunit is used for determining the spatial position of the right wrist in the coordinate system of the video shooting device according to the spatial position of the right elbow in the coordinate system of the video shooting device, the standard length of a connecting line between the right elbow and the right wrist and a ray which takes the position of the video shooting device as a starting point and corresponds to the right wrist.

Optionally, on the basis of the above technical solution, the method may further include: the action simulation module is used for controlling a virtual character role corresponding to the target user to simulate the action of the target user according to the spatial position of the human body key point in the video frame in the coordinate system of the video shooting device so as to obtain a real-time virtual character video image; the image adding module is used for adding the real-time virtual character video image into the video frame to obtain a mixed video frame; and the video frame uploading module is used for uploading the mixed video frame to the live broadcast platform.

The motion capture device provided by the embodiment of the disclosure can execute the motion capture method provided by the embodiment of the disclosure, and has corresponding functional modules and beneficial effects of the execution method.

Referring now to fig. 5, a block diagram of a mobile terminal 500 suitable for use in implementing embodiments of the present disclosure is shown. The mobile terminal in the embodiments of the present disclosure may include, but is not limited to, devices such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like. The mobile terminal shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 5, mobile terminal 500 may include a processing device (e.g., central processing unit, graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage device 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the mobile terminal 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.

Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication device 509 may allow the mobile terminal 500 to perform wireless or wired communication with other devices to exchange data. While fig. 5 illustrates a mobile terminal 500 having various devices, it is to be understood that not all illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 509, or installed from the storage device 508, or installed from the ROM 502. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing device 501.

It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected by digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.

The computer readable medium may be embodied in the mobile terminal; or may exist separately and not be incorporated into the mobile terminal.

The computer readable medium carries one or more programs which, when executed by the mobile terminal, cause the mobile terminal to: determine human body key points in a video frame of a target user live video; and determine the spatial position of the human body key points in the video frame in the coordinate system of the video shooting device according to the parameters of the video shooting device and the standard length of the human body key point connecting line corresponding to the target user.

Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods, apparatus, mobile terminals, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules, units and sub-units described in the embodiments of the present disclosure may be implemented by software, or may be implemented by hardware. For example, the image acquisition module may be further described as a "module for acquiring a reference video image of a target user", the first position determination unit may be further described as a "unit for identifying a key point of a human body in the reference video image and determining an image position of the key point of the human body", and the first determination subunit may be further described as a "subunit for determining a spatial position of the neck, the left shoulder, and the right shoulder in a coordinate system of the video photographing apparatus based on three rays respectively corresponding to the neck, the left shoulder, and the right shoulder and starting from the position of the video photographing apparatus, a standard length of a connecting line between the neck and the left shoulder, and a standard length of a connecting line between the neck and the right shoulder".

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

In accordance with one or more embodiments of the present disclosure, example one provides a motion capture method, including:

determining human body key points in a video frame of a target user live video;

and determining the spatial position of the human key points in the video frame in a video shooting device coordinate system according to the parameters of the video shooting device and the standard length of the human key point connecting line corresponding to the target user.

In accordance with one or more embodiments of the present disclosure, example two provides a motion capture method, on the basis of the motion capture method of example one, before determining a human body key point in a video frame of a target user live video, further comprising:

acquiring a reference video image of a target user; the action of the target user in the reference video image is a preset standard action, and the preset standard action is an action in which all human body key points lie in the same vertical plane;

and determining the standard length of a human body key point connecting line corresponding to the target user according to the reference video image and the parameters of the video shooting equipment.

According to one or more embodiments of the present disclosure, example three provides a motion capture method, and on the basis of the motion capture method of example two, the determining a standard length of a human body key point connecting line corresponding to a target user according to the reference video image and parameters of a video shooting device includes:

identifying human key points in the reference video image, and determining the image positions of the human key points;

determining a ray which takes the position of the video shooting equipment as a starting point and corresponds to the image position of the human body key point according to the image position of the human body key point and the parameters of the video shooting equipment;

identifying the reference video image to acquire a head image of the target user;

determining the depth matched with the head image of the target user according to the preset corresponding relation between the head image and the depth;

according to the ray and the depth matched with the head image of the target user, determining the spatial position of the human key point in a coordinate system of video shooting equipment;

and determining the standard length of the human body key point connecting line corresponding to the target user according to the space position.
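
The last three steps above can be illustrated with a minimal sketch. It is not the implementation of the disclosure; it rests on several assumptions: the "preset corresponding relation between the head image and the depth" is modelled as a hypothetical table of (head height in pixels, depth in metres) pairs, depth is taken as the Z coordinate along the optical axis, and the vertical plane of the standard action is taken to be perpendicular to the optical axis so that every key point shares the head depth. All names and numbers are illustrative.

```python
import math

def depth_from_head_height(head_px, table):
    """Interpolate depth from a table of (head height in px, depth) pairs.

    The table is a hypothetical stand-in for the preset correspondence
    between head image and depth; values outside the table are clamped.
    """
    pts = sorted(table)
    if head_px <= pts[0][0]:
        return pts[0][1]
    if head_px >= pts[-1][0]:
        return pts[-1][1]
    for (h0, d0), (h1, d1) in zip(pts, pts[1:]):
        if h0 <= head_px <= h1:
            t = (head_px - h0) / (h1 - h0)
            return d0 + t * (d1 - d0)

def point_at_depth(ray, depth):
    """Scale a ray from the camera origin so that its Z coordinate equals depth."""
    dx, dy, dz = ray
    if dz <= 0.0:
        raise ValueError("ray must point in front of the camera")
    s = depth / dz
    return (dx * s, dy * s, depth)

def standard_lengths(positions, lines):
    """Euclidean length of each key point connecting line."""
    return {(a, b): math.dist(positions[a], positions[b]) for a, b in lines}

# Hypothetical inputs: a depth table and three rays (direction only).
TABLE = [(50.0, 4.0), (100.0, 2.0), (200.0, 1.0)]
rays = {
    "neck": (0.0, 0.0, 1.0),
    "left_shoulder": (0.1, 0.0, 1.0),
    "left_elbow": (0.25, 0.0, 1.0),
}
depth = depth_from_head_height(100.0, TABLE)
positions = {name: point_at_depth(ray, depth) for name, ray in rays.items()}
lengths = standard_lengths(
    positions, [("neck", "left_shoulder"), ("left_shoulder", "left_elbow")]
)
```

Because the lengths are measured once from the reference image, they can be reused for every later video frame of the same target user.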

In accordance with one or more embodiments of the present disclosure, example four provides a motion capture method, and on the basis of the motion capture method of example one, the determining human body key points in video frames of a target user live video includes:

identifying human key points in a video frame of a live video of a target user, and determining image positions of the human key points;

determining the spatial position of the human key points in the video frame in the coordinate system of the video shooting device according to the parameters of the video shooting device and the standard length of the human key point connecting line corresponding to the target user, including:

determining a ray which takes the position of the video shooting equipment as a starting point and corresponds to the image position of the human body key point according to the image position of the human body key point and the parameters of the video shooting equipment;

and obtaining the spatial position of the human key points in the video frame in the coordinate system of the video shooting equipment according to the ray and the standard length of the human key point connecting line corresponding to the target user.
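
How a ray may be derived from an image position and the parameters of the video shooting device can be sketched as follows. The sketch assumes an ideal pinhole camera without lens distortion whose parameters are the focal lengths fx, fy and the principal point (cx, cy), all in pixels; these parameter names and the sample values are assumptions, since the text only refers to "parameters of the video shooting device".

```python
import math

def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Back-project pixel (u, v) to a unit direction in camera coordinates.

    The ray starts at the camera origin; the Z axis is the optical axis.
    Assumes a pinhole model with focal lengths fx, fy and principal
    point (cx, cy), all expressed in pixels.
    """
    x = (u - cx) / fx
    y = (v - cy) / fy
    norm = math.sqrt(x * x + y * y + 1.0)
    return (x / norm, y / norm, 1.0 / norm)

# The principal point maps onto the optical axis itself.
center_ray = pixel_to_ray(320.0, 240.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)

# A pixel one focal length to the right of the centre gives a 45 degree ray.
side_ray = pixel_to_ray(820.0, 240.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

Every point on such a ray projects to the same pixel, which is why a further constraint, here the standard length of a connecting line, is needed to fix the position along the ray.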

Example five provides a motion capture method according to one or more embodiments of the present disclosure, and on the basis of the motion capture method of example four, the human body key points include: head, neck, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist;

the human body key point connecting line comprises: a line between the head and the neck, a line between the neck and the left shoulder, a line between the neck and the right shoulder, a line between the left shoulder and the left elbow, a line between the right shoulder and the right elbow, a line between the left elbow and the left wrist, and a line between the right elbow and the right wrist.
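
The eight key points and seven connecting lines listed above form a tree, which is what allows positions to be resolved joint by joint. A minimal sketch of that structure follows; the identifier names and the choice of the neck as the traversal root are illustrative assumptions.

```python
from collections import deque

KEYPOINTS = [
    "head", "neck", "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow", "left_wrist", "right_wrist",
]

# Connecting lines in the order given in the text.
LINES = [
    ("head", "neck"),
    ("neck", "left_shoulder"),
    ("neck", "right_shoulder"),
    ("left_shoulder", "left_elbow"),
    ("right_shoulder", "right_elbow"),
    ("left_elbow", "left_wrist"),
    ("right_elbow", "right_wrist"),
]

def traversal_order(lines, root):
    """Breadth-first order in which key points are reached from the root."""
    neighbours = {}
    for a, b in lines:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    order, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

order = traversal_order(LINES, "neck")
# A connected graph with one fewer edge than nodes is a tree.
is_tree = len(order) == len(KEYPOINTS) and len(LINES) == len(KEYPOINTS) - 1
```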

According to one or more embodiments of the present disclosure, example six provides a motion capture method, and on the basis of the motion capture method of example five, the obtaining, according to the ray and a standard length of a connecting line of human key points corresponding to a target user, a spatial position of a human key point in a video frame in a coordinate system of a video capturing device includes:

determining spatial positions of the neck, the left shoulder and the right shoulder in a video shooting device coordinate system according to three rays which take the position of the video shooting device as a starting point and respectively correspond to the neck, the left shoulder and the right shoulder, the standard length of a connecting line between the neck and the left shoulder and the standard length of a connecting line between the neck and the right shoulder;

determining the spatial position of the head in the coordinate system of the video shooting device according to the spatial position of the neck in the coordinate system of the video shooting device, the standard length of a connecting line between the head and the neck and a ray which takes the position of the video shooting device as a starting point and corresponds to the head;

determining the spatial position of the left elbow in the coordinate system of the video shooting device according to the spatial position of the left shoulder in the coordinate system of the video shooting device, the standard length of a connecting line between the left shoulder and the left elbow and a ray which takes the position of the video shooting device as a starting point and corresponds to the left elbow;

determining the spatial position of the right elbow in the coordinate system of the video shooting device according to the spatial position of the right shoulder in the coordinate system of the video shooting device, the standard length of a connecting line between the right shoulder and the right elbow and a ray which takes the position of the video shooting device as a starting point and corresponds to the right elbow;

determining the spatial position of the left wrist in the coordinate system of the video shooting device according to the spatial position of the left elbow in the coordinate system of the video shooting device, the standard length of a connecting line between the left elbow and the left wrist and a ray which takes the position of the video shooting device as a starting point and corresponds to the left wrist;

and determining the spatial position of the right wrist in the coordinate system of the video shooting equipment according to the spatial position of the right elbow in the coordinate system of the video shooting equipment, the standard length of a connecting line between the right elbow and the right wrist and a ray which takes the position of the video shooting equipment as a starting point and corresponds to the right wrist.
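
Each of the last five steps above places a key point on a known ray at a known distance from an already located neighbouring key point, which geometrically is the intersection of a ray with a sphere. The following sketch shows one way to compute it. The text does not say which of the two possible intersections is chosen or what happens when noise leaves no intersection, so the `prefer_near` flag and the fallback to the closest point on the ray are assumptions of this sketch, as are all names and sample coordinates.

```python
import math

def unit(v):
    """Normalise a 3D vector."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def solve_child(parent, ray, length, prefer_near=True):
    """Find the point t * ray (t > 0) at distance `length` from `parent`.

    `ray` is a unit direction from the camera origin. Solves
    t^2 - 2 t (ray . parent) + |parent|^2 - length^2 = 0.
    """
    b = sum(p * d for p, d in zip(parent, ray))
    c = sum(p * p for p in parent) - length * length
    disc = b * b - c
    if disc < 0.0:
        t = b  # assumption: no intersection, use the closest point on the ray
    else:
        root = math.sqrt(disc)
        t = b - root if prefer_near else b + root
        if t <= 0.0:
            t = b + root  # keep the point in front of the camera
    return tuple(t * d for d in ray)

# Hypothetical left arm: shoulder already located, elbow and wrist follow.
left_shoulder = (0.2, 0.0, 2.0)
elbow_ray = unit((0.5, 0.0, 2.0))
wrist_ray = unit((0.5, 0.25, 2.0))

elbow = solve_child(left_shoulder, elbow_ray, 0.3, prefer_near=False)
wrist = solve_child(elbow, wrist_ray, 0.25, prefer_near=False)
```

In this example the farther intersection reproduces the positions used to construct the rays; in practice some additional rule, for instance continuity with the previous frame, would be needed to select between the two candidates.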

According to one or more embodiments of the present disclosure, example seven provides a motion capture method, on the basis of the motion capture method of example one, after determining the spatial position of the human body key point in the video frame in the video capturing device coordinate system, the method further includes:

controlling a virtual character role corresponding to the target user to simulate the action of the target user according to the spatial position of the human body key point in the video frame in a video shooting device coordinate system to obtain a real-time virtual character video image;

adding the real-time virtual character video image into the video frame to obtain a mixed video frame;

and uploading the mixed video frame to a live broadcast platform.
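
The step of adding the virtual character image into the video frame can be pictured as image compositing. The text does not specify the blending method, so the sketch below assumes simple per-pixel alpha blending of an RGBA character image onto an RGB frame, with images represented as nested lists of tuples; the character control and the upload step are outside its scope.

```python
def blend_pixel(frame_px, avatar_px):
    """Alpha-blend one RGBA avatar pixel over one RGB frame pixel."""
    r, g, b, a = avatar_px
    alpha = a / 255.0
    return tuple(
        round(alpha * src + (1.0 - alpha) * dst)
        for src, dst in zip((r, g, b), frame_px)
    )

def composite(frame, avatar, top, left):
    """Return a copy of `frame` with `avatar` pasted at (top, left).

    Avatar pixels that fall outside the frame are ignored.
    """
    out = [list(row) for row in frame]
    for y, row in enumerate(avatar):
        for x, px in enumerate(row):
            fy, fx = top + y, left + x
            if 0 <= fy < len(out) and 0 <= fx < len(out[fy]):
                out[fy][fx] = blend_pixel(out[fy][fx], px)
    return out

# A 2x2 black frame and a 1x2 avatar: one opaque red pixel, one transparent.
frame = [[(0, 0, 0), (0, 0, 0)], [(0, 0, 0), (0, 0, 0)]]
avatar = [[(255, 0, 0, 255), (0, 255, 0, 0)]]
mixed = composite(frame, avatar, top=0, left=1)
```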

Example eight provides, in accordance with one or more embodiments of the present disclosure, a motion capture device, comprising:

the key point determining module is used for determining human body key points in a video frame of a target user live video;

and the spatial position determining module is used for determining the spatial position of the human key points in the video frame in a video shooting device coordinate system according to the parameters of the video shooting device and the standard length of the human key point connecting line corresponding to the target user.

Example nine provides, in accordance with one or more embodiments of the present disclosure, a mobile terminal, comprising:

one or more processing devices;

storage means for storing one or more programs;

the one or more programs, when executed by the one or more processing devices, cause the one or more processing devices to implement the motion capture method of any of examples one to seven.

Example ten provides, according to one or more embodiments of the present disclosure, a computer-readable storage medium having stored thereon a computer program that, when executed by a processor, implements a motion capture method as recited in any of examples one to seven.

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments in which any combination of the features described above or their equivalents does not depart from the spirit of the disclosure. For example, a technical solution may be formed by replacing the above features with (but not limited to) features disclosed in this disclosure that have similar functions.

Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims

1. A motion capture method, comprising:

determining human body key points in a video frame of a target user live video;

determining the spatial position of the human key points in the video frame in a video shooting device coordinate system according to the parameters of the video shooting device and the standard length of the human key point connecting line corresponding to the target user, wherein the video shooting device coordinate system is a three-dimensional rectangular coordinate system established by taking the focusing center of the video shooting device as the origin and taking the optical axis of the video shooting device as the Z axis, and the standard length of the human key point connecting line corresponding to the target user is the actual length of the human key point connecting line of the target user measured in advance;

the determining of the human body key points in the video frame of the live video of the target user comprises the following steps:

2. The method of claim 1, further comprising, prior to determining human keypoints in video frames of a target user live video:

3. The method according to claim 2, wherein the determining a standard length of a connecting line of the human key points corresponding to the target user according to the reference video image and parameters of a video shooting device comprises:

4. The method of claim 1, wherein the human keypoints comprise: head, neck, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist;

5. The method according to claim 4, wherein obtaining the spatial position of the human key points in the video frame in the coordinate system of the video capturing device according to the ray and the standard length of the connecting line of the human key points corresponding to the target user comprises:

6. The method of claim 1, after determining the spatial location of the human keypoints in the video frames in the video capture device coordinate system, further comprising:

and uploading the mixed video frame to a live broadcast platform.

7. A motion capture device, comprising:

the spatial position determining module is used for determining the spatial position of the human key points in the video frame in a video shooting device coordinate system according to the parameters of the video shooting device and the standard length of the human key point connecting line corresponding to the target user, wherein the video shooting device coordinate system is a three-dimensional rectangular coordinate system established by taking the focus center of the video shooting device as the origin and taking the optical axis of the video shooting device as the Z axis, and the standard length of the human key point connecting line corresponding to the target user is the actual length of the human key point connecting line of the target user measured in advance;

the key point determination module comprises: the second position determining unit is used for identifying human key points in a video frame of a live video of a target user and determining the image positions of the human key points;

the spatial position determination module includes: the second ray determining unit is used for determining a ray which takes the position of the video shooting equipment as a starting point and corresponds to the image position of the human body key point according to the image position of the human body key point and the parameters of the video shooting equipment; and the spatial position determining unit is used for obtaining the spatial position of the human key points in the video frame in the coordinate system of the video shooting equipment according to the ray and the standard length of the connecting line of the human key points corresponding to the target user.

8. A mobile terminal, characterized in that the mobile terminal comprises:

one or more processing devices;

storage means for storing one or more programs;

the one or more programs, when executed by the one or more processing devices, cause the one or more processing devices to implement the motion capture method of any of claims 1-6.

9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the motion capture method according to any one of claims 1-6.