CN114078279A - Motion capture method, motion capture device, electronic device and storage medium - Google Patents

Motion capture method, motion capture device, electronic device and storage medium Download PDF

Info

Publication number
CN114078279A
CN114078279A (Application No. CN202010799445.6A)
Authority
CN
China
Prior art keywords
motion information
wrist joint
joint
joints
handheld device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010799445.6A
Other languages
Chinese (zh)
Inventor
徐屹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202010799445.6A
Publication of CN114078279A
Legal status: Pending (Current)

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

The present disclosure relates to a motion capture method, apparatus, electronic device, and storage medium. The method includes: acquiring whole body motion video data of a target object through a monocular acquisition device; recognizing the whole body motion video data of the target object through a neural network model to obtain subject joint motion information, where the subject joints include the joints of the human body other than the wrist joints; receiving wrist joint motion information output by a handheld device on the target object, the handheld device being provided with an inertial sensor; and aligning the subject joint motion information and the wrist joint motion information to obtain target action information of the target object. According to the disclosed scheme, while the whole body motion video data of the target object is captured, the wrist joint motion information is collected by the handheld device provided with the inertial sensor, so that target action information containing accurate wrist joint motion can be obtained, realizing complete motion capture.

Description

Motion capture method, motion capture device, electronic device and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a motion capture method and apparatus, an electronic device, and a storage medium.
Background
Motion capture refers to techniques for estimating the body motion of a performer, and is widely used in industries such as CG film and virtual avatars. Real-time motion capture means that capture results are obtained while the performer is performing; it enables a range of applications in the avatar industry, such as live virtual idols, interaction between a virtual host and the audience, and body magic effects in short-video applications.
Motion capture can be achieved in several ways: optical motion capture, inertial motion capture, and markerless visual motion capture. Optical motion capture offers the highest capture quality among existing schemes and is a common means in three-dimensional animation and film production. Inertial motion capture is widely used in semi-professional content production such as virtual idols. Both optical and inertial motion capture require the performer to wear professional garments and use professional acquisition devices, which makes them costly. Markerless visual motion capture removes this cost barrier: a performer can capture motion with an ordinary mobile phone or a computer with an ordinary camera. In the related art, however, markerless visual capture has difficulty capturing a clear motion state of the performer's wrist, due to factors such as motion blur and the large distance between the hand and the camera when the whole body is in frame, so complete motion information is difficult to obtain.
Disclosure of Invention
The present disclosure provides a motion capture method, apparatus, electronic device, and storage medium, to at least solve the problem in the related art that complete motion information is difficult to obtain during motion capture. The technical scheme of the disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided a motion capture method, comprising:
acquiring whole body motion video data of a target object through monocular acquisition equipment;
recognizing whole body action video data of a target object through a neural network model to obtain main body joint motion information, wherein the main body joints comprise the joints of the human body other than the wrist joints;
receiving wrist joint motion information output by handheld equipment on a target object, wherein the handheld equipment is provided with an inertial sensor;
and aligning the main body joint motion information and the wrist joint motion information to obtain target action information of the target object.
In one embodiment, aligning the subject joint motion information and the wrist joint motion information to obtain target motion information of the target object includes:
acquiring main body joint motion information corresponding to each frame of video image in whole body motion video data;
for each frame of video image in the whole body action video data, obtaining wrist joint movement information which is closest to each frame of video image in terms of time, and taking the wrist joint movement information as wrist joint movement information corresponding to each frame of video image;
and generating a frame of target action information according to the main body joint motion information and the wrist joint motion information corresponding to each frame of video image.
In one embodiment, the subject joint movement information includes rotation angles of a plurality of joints, and the wrist joint movement information includes wrist joint directions; generating a frame of target action information according to the main body joint motion information and the wrist joint motion information corresponding to each frame of video image, wherein the method comprises the following steps:
acquiring a preconfigured ordering of a plurality of joints, wherein the plurality of joints comprises an elbow joint, and the elbow joint is ordered last among the plurality of joints;
sequentially rotating each joint according to the sequence of the joints and the rotation angle of each joint until the elbow joint is rotated to obtain the elbow joint direction as the arm direction;
acquiring a rotation angle between the direction of a wrist joint and the direction of an arm as the rotation angle of the wrist joint;
and taking the rotation angles of a plurality of joints and the rotation angle of a wrist joint corresponding to each frame of video image as one frame of target action information.
In one embodiment, the handheld devices include a left-handed handheld device and a right-handed handheld device; the method for aligning the main body joint motion information and the wrist joint motion information to obtain the target action information of the target object comprises the following steps:
and aligning the main body joint motion information, the wrist joint motion information corresponding to the left-hand handheld device and the wrist joint motion information corresponding to the right-hand handheld device to obtain target action information of the target object.
In one embodiment, before receiving the wrist joint movement information output by the handheld device on the target object, the method further comprises:
receiving a calibration request sent by a handheld device;
and responding to the calibration request, and performing zero calibration on the rotation direction of the handheld device under a world coordinate system, wherein the world coordinate system is the coordinate system of the monocular acquisition device.
In one embodiment, in response to a calibration request, zeroing calibration of the rotation direction of the handheld device in the world coordinate system comprises:
zeroing the rotational Euler angle of the handheld device in response to the calibration request, wherein,
when zeroing calibration is performed, the handheld device points vertically toward the monocular acquisition device, and the screen of the handheld device faces directly upward.
In one embodiment, the inertial sensor is a gyroscope.
According to a second aspect of the embodiments of the present disclosure, there is provided a motion capture apparatus, comprising:
a collection module configured to perform a whole body motion video data collection of a target object by a monocular collection device;
the recognition module is configured to recognize whole body action video data of the target object through the neural network model to obtain main body joint motion information, wherein the main body joints comprise the joints of the human body other than the wrist joints;
a receiving module configured to perform receiving wrist joint movement information output by a handheld device on a target object, the handheld device being provided with an inertial sensor;
and the motion information generating module is configured to perform alignment processing on the main body joint motion information and the wrist joint motion information to obtain target motion information of the target object.
In one embodiment, the action information generating module includes:
an acquisition module configured to acquire subject joint movement information corresponding to each frame of video image in the whole-body motion video data;
an alignment module configured to acquire, for each frame of video image in the whole body motion video data, the wrist joint movement information closest in time to that frame as the wrist joint movement information corresponding to that frame;
and the one-frame action information generation module is configured to execute generation of one-frame target action information according to the body joint movement information and the wrist joint movement information corresponding to each frame of video image.
In one embodiment, the subject joint movement information includes rotation angles of a plurality of joints, and the wrist joint movement information includes a wrist joint direction; the one-frame action information generation module includes:
an acquisition unit configured to acquire a preconfigured ordering of the plurality of joints, the plurality of joints including an elbow joint, the elbow joint being last in the ordering;
an arm direction generating unit configured to rotate each joint in turn, following the ordering of the plurality of joints and the rotation angle of each joint, until the elbow joint is rotated, obtaining the elbow joint direction as the arm direction;
a wrist joint rotation angle generation unit configured to acquire the rotation angle between the wrist joint direction and the arm direction as the rotation angle of the wrist joint;
and a one-frame motion information generating unit configured to take the rotation angles of the plurality of joints and the rotation angle of the wrist joint corresponding to each frame of video image as one frame of target action information.
In one embodiment, the handheld devices include a left-handed handheld device and a right-handed handheld device; and the action information generating module is configured to perform alignment processing on the main body joint motion information, the wrist joint motion information corresponding to the left-hand handheld device and the wrist joint motion information corresponding to the right-hand handheld device to obtain target action information of the target object.
In one embodiment, the receiving module is further configured to perform receiving a calibration request sent by the handheld device;
the apparatus also includes a calibration module configured to perform a zeroing calibration of a rotational direction of the handheld device in response to the calibration request in a world coordinate system of the monocular acquisition device.
In one embodiment, the calibration module is configured to perform zeroing of a rotational euler angle of the handheld device in response to a calibration request, wherein, in zeroing calibration, the handheld device is pointed vertically for monocular acquisition devices and the screen of the handheld device is facing directly above.
In one embodiment, the inertial sensor is a gyroscope.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement a motion capture method as described in any embodiment of the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium having instructions that, when executed by a processor of an electronic device, enable the electronic device to perform a motion capture method as described in any one of the embodiments of the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program stored in a readable storage medium, from which the at least one processor of the device reads and executes the computer program, causing the device to perform the motion capture method as described in any one of the embodiments of the first aspect.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
acquiring whole body motion video data of a target object through monocular acquisition equipment; recognizing whole body action video data through a neural network model to obtain main joint motion information; collecting wrist joint motion information through a handheld device provided with an inertial sensor; and aligning the main body joint motion information and the wrist joint motion information to obtain target action information. According to the scheme disclosed by the invention, on one hand, the real-time motion capture system based on multi-mode input is adopted to respectively acquire the main body joint motion information and the wrist joint motion information, and when a performer makes motions such as a hand (for example, salute and clap) with high precision requirement on the wrist orientation, and holds a prop (for example, fencing and cup lifting), the target motion information containing the accurate wrist joint motion information can be obtained, so that complete motion capture is realized. On the other hand, the technical scheme carries out motion capture based on a pure vision method, and realizes complete motion acquisition within the cost of civilization.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a diagram of an application environment illustrating a method of motion capture in accordance with an exemplary embodiment.
FIG. 2 is a flow diagram illustrating a method of motion capture in accordance with an exemplary embodiment.
FIG. 3 is a flowchart illustrating a step of generating a frame of target action information according to an example embodiment.
FIG. 4 is a flowchart illustrating one step of generating target action information in accordance with an exemplary embodiment.
FIG. 5 is a flowchart illustrating one calibration step according to an exemplary embodiment.
FIG. 6 is a flow diagram illustrating a method of motion capture in accordance with an exemplary embodiment.
FIG. 7 is a block diagram illustrating a motion capture device, according to an example embodiment.
Fig. 8 is an internal block diagram of an electronic device shown in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The motion capture method provided by the present disclosure can be applied to the application environment shown in fig. 1. The application environment includes a terminal 110, a monocular acquisition device 120, and a handheld device 130. The terminal 110 is connected to the monocular acquisition device 120, and is connected to the handheld device 130 through a network. The monocular acquisition device 120 may be built into the terminal 110 or may be a separate device, which is not limited here. The handheld device 130 is a device with a built-in inertial sensor, used to acquire wrist joint motion information of the target object. A vision-based motion capture system is deployed in the terminal 110; it processes the whole body motion video data acquired by the monocular acquisition device 120 in real time to obtain the corresponding subject joint motion information, receives the wrist joint motion information sent by the handheld device 130, and aligns the subject joint motion information with the wrist joint motion information to obtain the target action information of the target object. Further, the terminal 110 may drive a three-dimensional model with the aligned subject joint motion information and wrist joint motion information, obtaining the subject joint motions and wrist joint motions of the three-dimensional model. The terminal 110 further includes a display screen for playing the subject joint motions and wrist joint motions of the three-dimensional model, enabling the user to preview the complete captured motion.
Specifically, the terminal 110 acquires the whole body motion video data of the target object acquired by the monocular acquisition device 120; the terminal 110 identifies whole body motion video data through a neural network model to obtain main joint motion information; the terminal 110 receives the wrist joint movement information output by the handheld device 130 on the target object; and aligning the main body joint motion information and the wrist joint motion information to obtain target action information of the target object. The terminal 110 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices. Monocular capturing device 120 may be, but is not limited to, at least one of a video camera, a still camera, and a camera mounted on an electronic device. The handheld device 130 may be, but is not limited to, various smart phones, tablet computers, and portable wearable devices provided with inertial sensors.
Fig. 2 is a flowchart illustrating a motion capture method according to an exemplary embodiment, where the motion capture method is used in the terminal 110, as shown in fig. 2, and includes the following steps.
In step S210, whole body motion video data of the target subject is captured by the monocular capturing device.
Here, a monocular acquisition device means that a single video acquisition device is used for capture. The target object is the object performing the motions; in the following embodiments the target object is described as a user. Specifically, motion capture is performed on a set of whole-body motions to be performed. When motion capture is required, the user may launch the motion capture system deployed in the terminal, and the monocular acquisition device captures the whole-body motion video data. The terminal acquires the whole-body motion video data captured by the monocular acquisition device in real time.
In step S220, the whole body motion video data of the target object is recognized by the neural network model, and the subject joint motion information is obtained.
The subject joints include the joints of the human body other than the wrist joints, such as spinal joints, clavicle joints, shoulder joints, elbow joints, ankle joints, crotch joints, and the like. A trained neural network model is deployed in the terminal and used to recognize the whole body motion video data to obtain the subject joint motion information. The neural network model may be any model usable for body pose estimation, for example DensePose (a real-time body pose estimation model) or OpenPose (a keypoint detection model). Specifically, the terminal collects the whole body motion video data of the user in real time through the monocular acquisition device and inputs it into the trained neural network model, which recognizes the image frames in the whole body motion video data to obtain the subject joint motion information.
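The per-frame recognition loop may be sketched as follows. This is a minimal sketch: the `pose_model` object with a `predict()` method is a hypothetical stand-in for whatever pose-estimation network is used, and is not an API defined by this disclosure.

```python
# Sketch of step S220, assuming a hypothetical `pose_model` whose predict()
# returns per-joint rotations for one frame; the disclosure only requires
# some body-pose network (e.g. DensePose- or OpenPose-style), so this model
# interface is illustrative, not the actual API.
import cv2  # OpenCV, used here to read frames from the monocular device

def recognize_subject_joints(pose_model, camera_index=0):
    """Yield (first_timestamp, joint_rotations) for each captured frame."""
    cap = cv2.VideoCapture(camera_index)  # the monocular acquisition device
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        first_timestamp = cv2.getTickCount() / cv2.getTickFrequency()
        # joint_rotations: mapping joint name -> rotation, covering every
        # human joint except the wrists (the "subject joints")
        joint_rotations = pose_model.predict(frame)
        yield first_timestamp, joint_rotations
    cap.release()
```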
In step S230, wrist joint movement information output by a handheld device on a target object, the handheld device being provided with an inertial sensor, is received.
Here, the inertial sensor is a sensor for detecting and measuring motion information of the wrist joint, for example its acceleration, rotation, and orientation; the inertial sensor is not limited to a gyroscope. Specifically, during the performance the user's hand holds a handheld device with a built-in inertial sensor. As the wrist rotates during the performance, the handheld device acquires wrist joint motion information in real time and sends it to the terminal in real time, so that the terminal can further process the received wrist joint motion information.
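On the terminal side, receiving this stream might look like the following sketch. The UDP transport and the JSON packet fields `t` and `quat` are illustrative assumptions; the disclosure only requires that timestamped inertial readings reach the terminal in real time.

```python
# Sketch of step S230 under assumed transport and packet format: the
# handheld device pushes JSON packets with a timestamp "t" and a wrist
# orientation quaternion "quat" over UDP.
import json
import socket

def receive_wrist_motion(port=9000):
    """Yield (second_timestamp, quaternion) tuples from the handheld device."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    while True:
        data, _addr = sock.recvfrom(1024)
        packet = json.loads(data)
        # packet["quat"]: wrist orientation (w, x, y, z) reported by the
        # inertial sensor; packet["t"]: the second timestamp
        yield packet["t"], tuple(packet["quat"])
```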
In step S240, the subject joint movement information and the wrist joint movement information are aligned to obtain target motion information of the target object.
Specifically, after receiving the wrist joint motion information sent by the handheld device, the terminal performs time alignment between the wrist joint motion information and the subject joint motion information. That is, wrist joint motion information and subject joint motion information captured at the same time, or closest to each other in time, are combined as the complete target action information.
In this motion capture method, on the one hand, a real-time motion capture system with multimodal input acquires the subject joint motion information and the wrist joint motion information separately, so that when the performer makes gestures with high precision requirements on wrist orientation (such as saluting or clapping) or holds a prop (such as swinging a sword or raising a cup), target action information containing accurate wrist joint motion is obtained, realizing complete motion capture. On the other hand, the scheme performs motion capture primarily by a vision-based method and therefore achieves complete motion acquisition at consumer-level cost.
In an exemplary embodiment, the handheld devices include a left-handed handheld device and a right-handed handheld device; in step S240, the alignment processing is performed on the subject joint movement information and the wrist joint movement information to obtain target motion information of the target object, and the target motion information includes: and aligning the main body joint motion information, the wrist joint motion information corresponding to the left-hand handheld device and the wrist joint motion information corresponding to the right-hand handheld device to obtain target action information of the target object.
Specifically, when the wrist joint motion information of both hands needs to be collected, the user may hold one handheld device provided with an inertial sensor in each hand. The left-hand handheld device collects the motion information of the left wrist, and the right-hand handheld device collects that of the right wrist. The terminal receives, in real time, the wrist joint motion information output by the left-hand and right-hand handheld devices, and aligns the subject joint motion information, the left wrist joint motion information output by the left-hand handheld device, and the right wrist joint motion information output by the right-hand handheld device to obtain the target action information. In this embodiment, equipping the user's left and right hands with separate handheld devices allows the wrist joint motion information of both hands to be collected, yielding complete motion information and improving the applicability of the motion capture method.
In an exemplary embodiment, as shown in fig. 3, in step S240, the alignment processing is performed on the subject joint movement information and the wrist joint movement information to obtain the target motion information of the target object, which may be specifically implemented by the following steps:
in step S310, subject joint movement information corresponding to each frame of video image in the whole-body motion video data is acquired.
In step S320, for each frame of video image in the whole-body motion video data, wrist joint movement information that is temporally closest to each frame of video image is acquired as wrist joint movement information corresponding to each frame of video image.
In step S330, one frame of target motion information is generated based on the subject joint motion information and the wrist joint motion information corresponding to each frame of video image.
Specifically, the whole body motion video data acquired by the terminal through the monocular acquisition device carries a first timestamp, and the received wrist joint motion information output by the handheld device carries a second timestamp. The refresh rate of the handheld device's inertial sensor is usually much higher than the video frame rate of the monocular acquisition device: the inertial sensor may refresh at 100 fps (frames per second) or more, while the video frame rate of the monocular acquisition device is around 30 fps. Therefore, for each frame of video image in the whole body motion video data, the second timestamp nearest in time can be found from the first timestamp corresponding to that frame. The subject joint motion information corresponding to each video frame, together with the wrist joint motion information at the nearest second timestamp, forms one frame of complete target action information.
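A minimal sketch of this nearest-timestamp lookup, assuming the wrist samples are buffered in a list sorted by their second timestamps:

```python
# For each video frame (first timestamp), select the wrist sample whose
# second timestamp is closest in time. Buffering the wrist samples in a
# sorted list is an implementation choice, not mandated by the disclosure.
import bisect

def nearest_wrist_sample(frame_timestamp, wrist_samples):
    """wrist_samples: non-empty list of (timestamp, orientation), sorted by
    timestamp; returns the sample closest in time to frame_timestamp."""
    times = [t for t, _ in wrist_samples]
    i = bisect.bisect_left(times, frame_timestamp)
    candidates = wrist_samples[max(i - 1, 0):i + 1]  # the two neighbors
    return min(candidates, key=lambda s: abs(s[0] - frame_timestamp))
```

Because the inertial sensor refreshes at 100 fps or more against a roughly 30 fps video stream, the selected sample lies at most a few milliseconds from the frame.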
In this embodiment, the whole body movement video data and the wrist joint movement information are collected in real time based on multi-modal input, and the whole body movement video data and the wrist joint movement information are strictly time-synchronized, so that accurate and clear complete movement can be obtained.
In an exemplary embodiment, the subject joint movement information includes rotation angles of a plurality of joints, and the wrist joint movement information includes wrist joint directions; as shown in fig. 4, in step S330, generating one frame of target motion information from the subject joint motion information and the wrist joint motion information corresponding to each frame of video image includes:
in step S410, a pre-configured ordering of a plurality of joints is obtained, the plurality of joints including an elbow joint, the order of the elbow joint being a last one of the plurality of joints.
In step S420, according to the sequence of the plurality of joints, sequentially performing rotation processing on each joint according to the rotation angle of each joint until the elbow joint is subjected to rotation processing to obtain an elbow joint direction as an arm direction.
Specifically, after receiving the wrist joint direction output by the handheld device, the terminal time-aligns the rotation angles of the multiple joints with the wrist joint direction for each frame of video image in the whole body motion video data, obtaining the rotation angles of the multiple joints and the wrist joint direction corresponding to each frame. Then, following the preconfigured ordering of the joints, the rotations are applied in sequence along the kinematic chain until the elbow joint is reached, yielding the elbow joint direction. Illustratively, assume the plurality of joints includes a crotch joint, several spinal joints, a clavicle joint, a shoulder joint, and an elbow joint. The crotch joint serves as the parent node, and a chain structure runs from the crotch joint through the spinal joints, the clavicle joint, and the shoulder joint to the elbow joint. The rotation angle of each joint is represented by a matrix; multiplying these matrices in sequence, starting from the crotch joint, gives the elbow joint direction.
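The chained multiplication can be sketched as follows. The joint names in `JOINT_CHAIN` and the rest-pose arm axis are illustrative assumptions; the disclosure fixes only that the per-joint rotation matrices are multiplied in the preconfigured order, ending at the elbow joint.

```python
# Sketch of step S420: accumulate the per-joint rotation matrices along the
# preconfigured chain; the orientation after the elbow joint, applied to the
# rest-pose arm axis, gives the arm direction.
import numpy as np

JOINT_CHAIN = ["crotch", "spine_1", "spine_2", "clavicle", "shoulder", "elbow"]

def arm_direction(rotations, rest_axis=np.array([1.0, 0.0, 0.0])):
    """rotations: dict joint name -> 3x3 rotation matrix for one frame.
    Returns the elbow joint direction, used as the arm direction."""
    R = np.eye(3)
    for joint in JOINT_CHAIN:   # the elbow joint is deliberately last
        R = R @ rotations[joint]
    return R @ rest_axis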
In step S430, the rotation angle between the wrist joint direction and the arm direction is acquired as the rotation angle of the wrist joint.
In step S440, the rotation angles of the plurality of joints and the rotation angle of the wrist joint corresponding to each frame of the video image are used as one frame of target motion information.
Specifically, after the elbow joint direction and the corresponding wrist joint direction are acquired, the rotation angle between the wrist joint direction and the arm direction is calculated as the rotation angle of the wrist joint. For each frame of video image, the rotation angles of the plurality of joints and the corresponding rotation angle of the wrist joint are then taken as one frame of complete target action information.
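One way to express that relative rotation, sketched as an axis-angle pair (one representation among several; the disclosure does not prescribe a particular one):

```python
# Sketch of step S430: the rotation taking the arm direction (from the
# joint chain) onto the wrist direction (from the handheld device) is used
# as the wrist joint rotation.
import numpy as np

def wrist_rotation(arm_dir, wrist_dir):
    """Return (axis, angle) rotating arm_dir onto wrist_dir."""
    a = arm_dir / np.linalg.norm(arm_dir)
    w = wrist_dir / np.linalg.norm(wrist_dir)
    angle = np.arccos(np.clip(a @ w, -1.0, 1.0))  # rotation angle of the wrist
    axis = np.cross(a, w)                          # rotation axis
    n = np.linalg.norm(axis)
    return (axis / n if n > 0 else axis), angle
```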
In this embodiment, a frame of complete body motion can be obtained by obtaining the rotation angles of the plurality of joints and the rotation angle of the wrist joint in a frame, so that the motion capture is complete.
In an exemplary embodiment, as shown in fig. 5, before receiving the wrist joint movement information output by the handheld device on the target object, the method further comprises the following steps:
in step S510, a calibration request from a handheld device is received.
In step S520, in response to the calibration request, the rotation direction of the handheld device in the world coordinate system is subjected to the zeroing calibration, and the world coordinate system is the coordinate system of the monocular acquisition device.
Specifically, in order to fuse the subject joint motion information with the wrist joint motion information output by the handheld device, the orientation of the handheld device must be corrected into the coordinate system of the monocular acquisition device. Following the prompt displayed by the terminal or the handheld device, the user holds the handheld device and triggers a calibration request, which the handheld device sends to the terminal. In response to the received calibration request, the terminal performs a zeroing calibration of the rotation direction of the handheld device.
In the embodiment, before motion capture, the orientation of the handheld device is corrected to the coordinate system of the monocular acquisition device by performing zeroing calibration on the rotation direction of the handheld device, so that the main body joint motion information and the wrist joint motion information can be conveniently fused subsequently, and the accuracy of motion capture can be improved.
In an exemplary embodiment, in response to the calibration request, zero-calibrating the rotation direction of the handheld device in the world coordinate system includes: zeroing the rotational Euler angles of the handheld device in response to the calibration request, wherein, during zeroing calibration, the handheld device points vertically toward the monocular acquisition device and its screen faces directly upward.
Euler angles are the three independent angular parameters that determine the orientation of a rigid body rotating about a fixed point. In this embodiment, Euler angles characterize the rotation angle of the wrist joint in the world coordinate system. Specifically, in response to the calibration request, the terminal zeroes the Euler angles of the handheld device in the world coordinate system of the monocular acquisition device, thereby converting the coordinate system of the handheld device into that of the monocular acquisition device. The grip posture and orientation of the handheld device during zeroing calibration are as follows: the handheld device points vertically toward the monocular acquisition device, with its screen facing directly upward. Further, while the user's wrist joint motion information is collected, the long edge of the handheld device may be kept parallel to the bases of the four fingers other than the thumb, with the top of the device on the thumb side, the device held so that its screen faces away from the palm, and the device no longer moving relative to the palm.
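A minimal sketch of this calibration, assuming the handheld device reports its orientation as a quaternion: the orientation captured in the prescribed pose is stored as a reference, and later readings are taken relative to it, so that their rotational Euler angles (e.g. via `as_euler('xyz')`) start from zero in the coordinate system of the monocular acquisition device.

```python
# Sketch of the zeroing calibration: store the orientation reported while
# the device is held pointing vertically at the camera, screen facing
# straight up, then express subsequent readings relative to that reference.
from scipy.spatial.transform import Rotation

class WristCalibration:
    def __init__(self):
        self.reference = Rotation.identity()

    def zero(self, raw_quat):
        """Store the reference orientation; call during the calibration pose."""
        self.reference = Rotation.from_quat(raw_quat)

    def calibrated(self, raw_quat):
        """Rotation relative to the reference (identity at calibration pose)."""
        return self.reference.inv() * Rotation.from_quat(raw_quat)
```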
FIG. 6 is a flow diagram illustrating a motion capture method according to an exemplary embodiment. In this embodiment, cooperating applications may be deployed on the terminal and the handheld device, and the steps shown in fig. 6 may be implemented through, but not only through, these cooperating applications. As shown in fig. 6, the motion capture method includes the following steps.
In step S601, a calibration request from a handheld device is received.
In step S602, in response to the calibration request, a zeroing calibration is performed on the rotation direction of the handheld device in the world coordinate system, the world coordinate system being the coordinate system of the monocular acquisition device. Specifically, during zeroing calibration the user may hold the handheld device in the prescribed posture: pointing vertically toward the monocular acquisition device, with the screen facing directly upward.
In step S603, whole body motion video data of the target subject is captured by the monocular acquisition device. The whole body motion video data carries the first timestamp. The monocular acquisition device may be a webcam.
In step S604, a body posture capture algorithm is used to recognize the whole body motion video data, obtaining the rotation angles of the subject joints. The algorithm may run in real time, that is, it recognizes the whole body motion video data as the data is captured, producing the subject joint rotation angles on the fly.
In step S605, the wrist joint direction output by the handheld device is received. The wrist joint orientation carries a second time stamp. The handheld device can be, but is not limited to, various smart phones, tablet computers and portable wearable devices with built-in gyroscopes. The wrist joint direction can be acquired by a gyroscope in the handheld device.
In step S606, for each frame of video image in the whole-body motion video data, subject joint movement information corresponding to each frame of video image, and the wrist joint direction that is temporally closest to each frame of video image are acquired.
Specifically, the second timestamp with the closest time can be searched according to the first timestamp corresponding to each frame of video image. And taking the wrist joint direction corresponding to the second time stamp as the wrist joint direction corresponding to each frame of video image.
In step S607, a pre-configured ordering of a plurality of joints including an elbow joint in an order of a last one of the plurality of joints is obtained.
In step S608, the joints are rotated in sequence, following the ordering of the joints and the rotation angle of each joint, until the elbow joint is rotated, obtaining the elbow joint direction as the arm direction. The rotation processing over the plurality of joints may use the chain rule described with reference to the embodiment shown in fig. 4, and is not repeated here.
In step S609, the rotation angle between the wrist joint direction and the arm direction is acquired as the rotation angle of the wrist joint.
In step S610, the rotation angles of the plurality of joints and the rotation angle of the wrist joint corresponding to each frame of the video image are used as one frame of target motion information.
Further, after one frame of target action information is acquired, a predefined three-dimensional model can be driven in real time by that frame to obtain one frame of the model's complete motion, and the complete motion of the three-dimensional model is played on screen in real time. The body (excluding wrist) motion and the wrist motion of the three-dimensional model come from different devices, but after the calibration and alignment processing of this method, the user sees a three-dimensional model driven by the complete motion in real time. The three-dimensional model may be a three-dimensional avatar; the terminal's motion capture system may include various three-dimensional avatars, such as live virtual idols, virtual presenters, and body magic effects for short-video application scenarios. The user can select a suitable three-dimensional model according to actual requirements.
It should be understood that although the steps in the flowcharts of figs. 1-6 are shown sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in figs. 1-6 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and not necessarily sequentially; they may be performed in turn or in alternation with other steps, or with at least some of the sub-steps or stages of other steps.
FIG. 7 is a block diagram illustrating a motion capture device 700, according to an example embodiment. Referring to fig. 7, the apparatus 700 includes an acquisition module 701, a recognition module 702, a receiving module 703, and an action information generation module 704.
An acquisition module 701 configured to perform acquisition of whole body motion video data of a target object by a monocular acquisition device;
a recognition module 702 configured to perform recognition of whole body motion video data of the target object through a neural network model, resulting in subject joint motion information;
a receiving module 703 configured to perform receiving wrist joint movement information output by a handheld device on a target object, the handheld device being provided with an inertial sensor;
and the motion information generating module 704 is configured to perform alignment processing on the subject joint motion information and the wrist joint motion information to obtain target motion information of the target object.
In an exemplary embodiment, the action information generating module 704 includes: an acquisition module configured to acquire subject joint motion information corresponding to each frame of video image in the whole body motion video data; an alignment module configured to acquire, for each frame of video image in the whole body motion video data, the wrist joint motion information closest in time to that frame as the wrist joint motion information corresponding to that frame; and a one-frame action information generation module configured to generate one frame of target action information from the subject joint motion information and wrist joint motion information corresponding to each frame of video image.
In an exemplary embodiment, the subject joint motion information includes rotation angles of a plurality of joints, and the wrist joint motion information includes a wrist joint direction; the one-frame action information generation module includes: an acquisition unit configured to acquire a preconfigured ordering of the plurality of joints, the plurality of joints including an elbow joint, the elbow joint being last in the ordering; an arm direction generating unit configured to rotate each joint in turn, following the ordering of the plurality of joints and the rotation angle of each joint, until the elbow joint is rotated, obtaining the elbow joint direction as the arm direction; a wrist joint rotation angle generation unit configured to acquire the rotation angle between the wrist joint direction and the arm direction as the rotation angle of the wrist joint; and a one-frame motion information generating unit configured to take the rotation angles of the plurality of joints and the rotation angle of the wrist joint corresponding to each frame of video image as one frame of target action information.
In an exemplary embodiment, the handheld devices include a left-handed handheld device and a right-handed handheld device; and the action information generating module 704 is configured to perform alignment processing on the subject joint motion information, the wrist joint motion information corresponding to the left-hand handheld device, and the wrist joint motion information corresponding to the right-hand handheld device, so as to obtain target action information of the target object.
In an exemplary embodiment, the receiving module 703 is further configured to perform receiving a calibration request sent by the handheld device;
the apparatus 700 further includes a calibration module configured to perform a zeroing calibration of a rotational direction of the handheld device in response to the calibration request in a world coordinate system of the monocular acquisition device.
In an exemplary embodiment, the calibration module is configured to perform zeroing of a rotational euler angle of the handheld device in response to a calibration request, wherein, in zeroing calibration, the handheld device is pointed vertically for monocular acquisition devices with the screen of the handheld device facing directly above.
In an exemplary embodiment, the inertial sensor is a gyroscope.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 8 is a block diagram illustrating a device 800 for motion capture according to an example embodiment. For example, the device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a gaming console, a tablet device, a medical device, a fitness device, a personal digital assistant, and so forth.
Referring to fig. 8, device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communications component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile storage devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 806 provides power to the various components of the device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed state of the device 800 and the relative positioning of components, such as the display and keypad of the device 800. The sensor assembly 814 may also detect a change in the position of the device 800 or of one of its components, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in its temperature. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
Communications component 816 is configured to facilitate communications between device 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A motion capture method, comprising:
acquiring whole body motion video data of a target object through monocular acquisition equipment;
identifying whole body motion video data of the target object through a neural network model to obtain main body joint motion information, wherein the main body joints comprise the joints of the human body other than the wrist joints;
receiving wrist joint movement information output by handheld equipment on the target object, wherein the handheld equipment is provided with an inertial sensor;
and aligning the main body joint motion information and the wrist joint motion information to obtain target action information of the target object.
2. The motion capture method according to claim 1, wherein the aligning the subject joint motion information and the wrist joint motion information to obtain target motion information of the target object comprises:
acquiring main body joint motion information corresponding to each frame of video image in the whole body motion video data;
for each frame of video image in the whole body motion video data, obtaining wrist joint motion information which is closest to each frame of video image in terms of time, and using the wrist joint motion information as wrist joint motion information corresponding to each frame of video image;
and generating a frame of target action information according to the main body joint motion information and the wrist joint motion information corresponding to each frame of video image.
3. The motion capture method of claim 2, wherein the subject joint motion information comprises rotation angles of a plurality of joints, the wrist joint motion information comprising wrist joint directions; generating a frame of target action information according to the main body joint motion information and the wrist joint motion information corresponding to each frame of video image, wherein the method comprises the following steps:
obtaining a preconfigured ordering of the plurality of joints, the plurality of joints including an elbow joint, the elbow joint being last in the ordering;
sequentially rotating each joint according to the ordering of the joints and the rotation angle of each joint, until the elbow joint is rotated, to obtain an elbow joint direction as an arm direction;
acquiring a rotation angle between the direction of the wrist joint and the direction of the arm as the rotation angle of the wrist joint;
and taking the rotation angles of the plurality of joints and the rotation angle of the wrist joint corresponding to each frame of video image as the target action information of one frame.
4. The motion capture method of claim 1, wherein the handheld devices comprise a left-handed handheld device and a right-handed handheld device; the aligning the main body joint motion information and the wrist joint motion information to obtain the target action information of the target object includes:
and aligning the main body joint motion information, the wrist joint motion information corresponding to the left-hand handheld device and the wrist joint motion information corresponding to the right-hand handheld device to obtain target action information of the target object.
5. The motion capture method of claim 1, further comprising, prior to said receiving wrist joint motion information output by a handheld device on the target object:
receiving a calibration request sent by the handheld device;
and responding to the calibration request, and performing zero calibration on the rotation direction of the handheld device in a world coordinate system, wherein the world coordinate system is a coordinate system of the monocular acquisition device.
6. The motion capture method of claim 5, wherein the zero calibration of the rotational orientation of the handheld device in a world coordinate system in response to the calibration request comprises:
zeroing a rotational Euler angle of the handheld device in response to the calibration request, wherein,
when zeroing calibration is performed, the handheld device points vertically toward the monocular acquisition device, and the screen of the handheld device faces directly upward.
7. The motion capture method of any one of claims 1 to 6, wherein the inertial sensor is a gyroscope.
8. A motion capture device, comprising:
a collection module configured to perform a whole body motion video data collection of a target object by a monocular collection device;
the recognition module is configured to recognize the whole body action video data of the target object through a neural network model to obtain the main body joint motion information, wherein the main body joints comprise the joints of the human body other than the wrist joints;
a receiving module configured to perform receiving wrist joint movement information output by a handheld device on the target object, the handheld device being provided with an inertial sensor;
and the motion information generating module is configured to perform alignment processing on the main body joint motion information and the wrist joint motion information to obtain target motion information of the target object.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the motion capture method of any of claims 1 to 7.
10. A storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the motion capture method of any of claims 1-7.
CN202010799445.6A 2020-08-11 2020-08-11 Motion capture method, motion capture device, electronic device and storage medium Pending CN114078279A (en)

Priority Application (1)

CN202010799445.6A — priority date 2020-08-11, filing date 2020-08-11 — Motion capture method, motion capture device, electronic device and storage medium

Publication (1)

CN114078279A — published 2022-02-22

Family

ID=80279939

Country Status (1)

CN — CN114078279A

Cited By (1)

* Cited by examiner, † Cited by third party

CN114562993A * — Lenovo (Beijing) Co., Ltd. — priority date 2022-02-28, published 2022-05-31 — Track processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination