CN115311472A - Motion capture method and related equipment


Info

Publication number
CN115311472A
Authority
CN
China
Prior art keywords: dimensional, shooting, shot, videos, determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210600917.XA
Other languages
Chinese (zh)
Inventor
刘书颖
刘宏达
吴文斌
谷统伟
林悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd filed Critical Netease Hangzhou Network Co Ltd
Priority to CN202210600917.XA
Publication of CN115311472A


Classifications

    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/82 Image or video recognition or understanding using neural networks
    • G06V 20/48 Matching video sequences
    • G06V 20/64 Three-dimensional objects
    • G06V 40/23 Recognition of whole body movements, e.g. for sport training

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Length Measuring Devices By Optical Means (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the present application provides a motion capture method and related equipment, relating to the field of artificial intelligence. The method comprises the following steps: shooting a subject from multiple camera positions to obtain multiple captured videos, where the captured videos have different shooting angles; determining the two-dimensional coordinates of the subject's key points corresponding to each of the captured videos; constructing multiple three-dimensional poses of the subject from the two-dimensional key-point coordinates corresponding to each captured video, where the captured videos correspond to the three-dimensional poses one to one; determining relative position information among the camera positions from the three-dimensional poses; constructing a three-dimensional stereoscopic pose of the subject from the relative position information among the camera positions and the two-dimensional key-point coordinates corresponding to each captured video; and determining the motion state of the subject from the three-dimensional stereoscopic pose.

Description

Motion capture method and related equipment
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a motion capture method and related device.
Background
In recent years, with the rapid development of machine learning, and of deep learning in particular, image-based human pose estimation has made breakthrough progress. Human pose estimation locates the joints of a human body in images or videos so that human motion can subsequently be captured and recognized, and it is widely used in the animation pipelines of films and games.
In the existing human motion capture and recognition process, a monocular video is shot with a single camera: each frame of the monocular video is a picture taken by one device from a single viewing angle. Three-dimensional reconstruction is then performed from the two-dimensional pictures in the monocular video, and the motion pose of the subject is finally recognized in three-dimensional space.
In this method, because the images in the monocular video are all two-dimensional, the missing depth information keeps the accuracy and precision of the resulting three-dimensional pose below requirements. Capturing motion with dedicated capture equipment, on the other hand, suffers from expensive hardware, demanding site requirements, and strong constraints on the subject. How to capture human motion simply and efficiently has therefore become an urgent problem.
Disclosure of Invention
In view of the above, the present application provides a motion capture method that uses the different shooting angles of several monocular videos to determine the relative position information among multiple camera positions, and then constructs a three-dimensional stereoscopic pose of the subject from those relative positions and the two-dimensional information provided by the monocular videos. This improves the accuracy and precision of the three-dimensional pose, so that human motion can finally be captured efficiently and accurately.
A first aspect of an embodiment of the present application provides a motion capture method, including:
and shooting the shot object in multiple positions to obtain a plurality of shot videos, wherein the shooting visual angles of the plurality of shot videos are different.
Two-dimensional coordinates of key points of a photographic object corresponding to each of the plurality of photographic videos are determined.
And constructing a plurality of three-dimensional postures of the shot object according to the two-dimensional coordinates of the key points of the shot object corresponding to each shot video, wherein the plurality of shot videos correspond to the plurality of three-dimensional postures one to one.
And determining relative position information among the plurality of machine positions according to the plurality of three-dimensional postures.
And constructing a three-dimensional posture of the shot object according to the relative position information among the multiple machine positions and the two-dimensional coordinates of the key points of the shot object corresponding to each shot video.
And determining the action state of the shooting object according to the three-dimensional posture of the shooting object.
In an optional embodiment, determining the relative position information among the camera positions from the three-dimensional poses includes:
Determining the shooting angle of each of the three-dimensional poses.
Comparing the shooting angles of the three-dimensional poses, and determining the rotation change and displacement change between them.
Determining the relative position information among the camera positions from the rotation change and the displacement change.
In an optional embodiment, determining the shooting angle of each of the three-dimensional poses includes:
Matching the image frames of the captured videos according to the three-dimensional poses.
Aligning the image frames of the captured videos according to the matching result.
Determining the shooting angle of each of the three-dimensional poses from the aligned captured videos.
In an optional embodiment, aligning the image frames of the captured videos according to the matching result includes:
Determining the frame offsets between the captured videos from the matching result.
Aligning the image frames of the captured videos according to the frame offsets.
In an optional embodiment, before determining the two-dimensional key-point coordinates corresponding to each of the captured videos, the method further includes:
Performing validity detection on the captured videos.
Determining that each of the captured videos is valid.
In an optional embodiment, the motion capture method further includes:
Correcting the three-dimensional stereoscopic pose of the subject according to preset constraints.
Determining the motion state of the subject from the three-dimensional stereoscopic pose then includes:
Determining the motion state of the subject from the corrected three-dimensional stereoscopic pose.
In an optional embodiment, determining the motion state of the subject from the corrected three-dimensional stereoscopic pose includes:
Determining the position information of the subject's key points from the corrected three-dimensional stereoscopic pose.
Converting the key-point position information into skeletal motion data.
Correcting the skeletal motion data with a motion correction algorithm.
Determining the motion state of the subject from the corrected skeletal motion data.
In an optional embodiment, determining the two-dimensional key-point coordinates corresponding to each of the captured videos includes:
Inputting each of the captured videos into a two-dimensional pose estimation model, and determining the two-dimensional key-point coordinates corresponding to each captured video from the output of the two-dimensional pose estimation model.
In an optional embodiment, the motion capture method further includes:
Training the two-dimensional pose estimation model on a training sample set.
The training process of the two-dimensional pose estimation model includes:
Obtaining multiple groups of training samples from the training sample set, where each group is a multi-frame sequence of sample images carrying labels, and the labels are the correct two-dimensional key-point coordinates in each sample image of the sequence.
Inputting each group of training samples into the two-dimensional pose estimation model to obtain an output for each group.
Adjusting the model parameters of the two-dimensional pose estimation model according to the output and the labels of each group.
Ending the training process when a preset training condition is reached.
In an optional embodiment, the preset training condition includes: the number of training iterations reaches a preset number, or the output for each group of training samples reaches a preset precision.
A second aspect of the embodiments of the present application provides a motion capture apparatus, including:
An acquisition unit, configured to shoot a subject from multiple camera positions to obtain multiple captured videos, where the captured videos have different shooting angles.
A determining unit, configured to determine the two-dimensional coordinates of the subject's key points corresponding to each of the captured videos.
A processing unit, configured to construct multiple three-dimensional poses of the subject from the two-dimensional key-point coordinates corresponding to each captured video, where the captured videos correspond to the three-dimensional poses one to one.
The determining unit is further configured to determine relative position information among the camera positions from the three-dimensional poses.
The processing unit is further configured to construct a three-dimensional stereoscopic pose of the subject from the relative position information among the camera positions and the two-dimensional key-point coordinates corresponding to each captured video.
A recognition unit, configured to determine the motion state of the subject from the three-dimensional stereoscopic pose.
In an optional embodiment, the determining unit is specifically configured to determine the shooting angle of each of the three-dimensional poses, compare those shooting angles to determine the rotation change and displacement change between them, and determine the relative position information among the camera positions from the rotation change and the displacement change.
In an optional embodiment, the processing unit is further configured to match the image frames of the captured videos according to the three-dimensional poses and to align the image frames of the captured videos according to the matching result.
The determining unit is specifically configured to determine the shooting angle of each of the three-dimensional poses from the aligned captured videos.
In an optional embodiment, the processing unit is specifically configured to determine the frame offsets between the captured videos from the matching result, and to align the image frames of the captured videos according to the frame offsets.
In an optional embodiment, the motion capture apparatus further includes a detection unit.
The detection unit is configured to perform validity detection on the captured videos and to determine that each of the captured videos is valid.
In an optional embodiment, the processing unit is further configured to correct the three-dimensional stereoscopic pose of the subject according to preset constraints.
The recognition unit is specifically configured to determine the motion state of the subject from the corrected three-dimensional stereoscopic pose.
In an optional embodiment, the recognition unit is specifically configured to determine the position information of the subject's key points from the corrected three-dimensional stereoscopic pose, convert the key-point position information into skeletal motion data, correct the skeletal motion data with a motion correction algorithm, and determine the motion state of the subject from the corrected skeletal motion data.
In an optional embodiment, the determining unit is specifically configured to input each of the captured videos into a two-dimensional pose estimation model and to determine the two-dimensional key-point coordinates corresponding to each captured video from the model's output.
In an optional embodiment, the motion capture apparatus further includes a training unit.
The training unit is configured to train the two-dimensional pose estimation model on a training sample set.
The training process of the two-dimensional pose estimation model includes: obtaining multiple groups of training samples from the training sample set, where each group is a multi-frame sequence of sample images carrying labels, and the labels are the correct two-dimensional key-point coordinates in each sample image of the sequence; inputting each group into the model to obtain an output for each group; adjusting the model parameters according to the output and the labels of each group; and ending the training process when a preset training condition is reached.
In an optional embodiment, the preset training condition includes: the number of training iterations reaches a preset number, or the output for each group of training samples reaches a preset precision.
A third aspect of the embodiments of the present application provides an execution device, including: a memory and a processor, the memory and the processor being coupled.
The memory is configured to store one or more computer instructions.
The processor is configured to execute the one or more computer instructions to implement the motion capture method of the first aspect.
A fourth aspect of the embodiments of the present application provides a training device, including: a memory and a processor, the memory and the processor being coupled.
The memory is configured to store one or more computer instructions.
The processor is configured to execute the one or more computer instructions to implement the motion capture method of the first aspect.
A fifth aspect of the embodiments of the present application provides a computer-readable storage medium, on which one or more computer instructions are stored, where the instructions are executed by a processor to implement the method according to any one of the above-mentioned technical solutions.
According to the above technical solution, the subject is shot from different viewing angles with multiple camera positions, yielding multiple captured videos. Each captured video is then analyzed to obtain its two-dimensional information, and multiple three-dimensional poses are constructed from that information. The relative positions of the camera positions are next determined from the constructed three-dimensional poses. A final three-dimensional stereoscopic pose is constructed from the relative camera positions, the shooting parameters, and the two-dimensional information gathered from the multiple viewing angles, and motion is captured from that pose. Because the three-dimensional stereoscopic pose is built from two-dimensional information from several viewing angles, it contains more accurate and more complete state information; the accuracy and precision of the final pose are therefore greatly improved, and the subject's motion and posture can be restored more efficiently and accurately. This satisfies the high-precision motion capture requirements of most scenarios and greatly improves the motion recognition and motion restoration performance of the equipment.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in their description are briefly introduced below. The drawings described here show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart of a motion capture method provided in an embodiment of the present application;
FIG. 2 is a flow chart illustrating another method for motion capture according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a motion capture device according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of an execution device according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a training apparatus according to an embodiment of the present application.
Detailed Description
The present application provides a motion capture method and related equipment that use the different shooting angles of several monocular videos to determine the relative position information among multiple camera positions, and then construct a three-dimensional stereoscopic pose of the subject from those relative positions and the two-dimensional information provided by the monocular videos. This improves the accuracy and precision of the three-dimensional pose, so that human motion can finally be captured efficiently and accurately.
To help those skilled in the art better understand the technical solutions of the present application, the application is described clearly and completely below with reference to the drawings of its embodiments. The described embodiments are only a part of the embodiments of the present application, not all of them; all other embodiments that a person of ordinary skill in the art obtains from the embodiments provided herein without inventive effort fall within the scope of protection of the present application.
It should be noted that the terms "first", "second", "third", and the like in the claims, the description, and the drawings of the present application are used to distinguish similar elements and do not necessarily describe a particular sequence or chronological order. Data so labeled are interchangeable where appropriate, so that the embodiments described herein can be practiced in orders other than those illustrated. Furthermore, the terms "comprise" and "include" and any variations thereof are intended to cover a non-exclusive inclusion: a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements, but may include other steps or elements not expressly listed or inherent to it.
In recent years, with the rapid development of machine learning, and of deep learning in particular, image-based human pose estimation has made breakthrough progress. Human pose estimation locates the joints of a human body in images or videos so that human motion can subsequently be captured and recognized, and it is widely used in the animation pipelines of films and games.
Motion capture is the key technology for recording real human motion. Existing video-based motion capture schemes fall mainly into two classes. The first is based on a monocular camera: each frame of the monocular video is a picture taken by one device from a single viewing angle. Specifically, the subject is shot with an ordinary RGB camera or an RGBD camera to obtain a monocular video, three-dimensional reconstruction is performed from the information in its two-dimensional pictures, and a three-dimensional reconstruction algorithm determines the subject's motion pose in three-dimensional space. In this class, the ordinary-RGB-camera scheme lacks depth information, so its capture accuracy is limited and it is hard to apply in scenarios demanding high-precision motion capture. An RGBD camera can acquire depth information, but only a matched camera with known camera parameters can be used for the three-dimensional reconstruction, so capture flexibility is very poor; and because the depth estimated by an RGBD camera is itself of limited accuracy, the capture accuracy of this scheme is still not high.
The second class is motion capture based on multi-view RGB cameras. It usually requires customized capture rigs, the shooting parameters must be calibrated before recording, more cameras are typically needed, and the multiple recorded videos must be strictly frame-aligned. Because this class demands specific shooting equipment, expensive hardware, and professional studio support, its cost is very high and it is difficult to apply widely; moreover, current industry schemes based on multi-view RGB cameras are not fully automated, so they are very inconvenient in actual use.
Because the images in a monocular video are two-dimensional, when its two-dimensional information is used for motion capture and the result is used as motion assets in a film or game, the missing depth information severely limits the precision of the captured motion. Existing schemes that capture motion from multi-view video, in turn, suffer from high cost, poor shooting flexibility, strong constraints on the subject, and poor convenience in actual use. How to capture human motion simply and efficiently has therefore become an urgent problem.
In view of these problems, the embodiments of the present application provide a new motion capture method that uses the different shooting angles of several monocular videos to determine the relative position information among multiple camera positions, and then constructs a three-dimensional stereoscopic pose of the subject from those relative positions and the two-dimensional information provided by the monocular videos. This overcomes the shortcomings of monocular motion capture and captures motion with high precision at low shooting cost. In the embodiments of the present application, videos can be shot from multiple viewing angles with ordinary RGB cameras; no specific or expensive equipment is needed, the videos need not be recorded in strict synchronization, and high-precision motion poses are obtained at very low cost.
The method, apparatus, devices, and computer-readable storage medium of this application are described in further detail below with reference to specific embodiments and the accompanying drawings.
Fig. 1 is a flowchart illustrating a motion capture method according to an embodiment of the present disclosure. As shown in fig. 1, the motion capture method includes the following steps:
101. Shoot the subject from multiple camera positions to obtain multiple captured videos.
To capture the motion of a subject, the subject must be restored and its three-dimensional stereoscopic pose determined. A picture taken by a shooting device is generally two-dimensional and lacks depth information, so it is difficult to restore the subject's three-dimensional stereoscopic pose from a monocular video shot by a single device. Multiple camera positions are therefore used to shoot the subject, producing captured videos from different viewing angles and gathering more state information about the subject at different angles, so that its three-dimensional stereoscopic pose can be constructed better.
102. Determine the two-dimensional coordinates of the subject's key points corresponding to each of the captured videos.
A monocular video provides two-dimensional information about the subject at one viewing angle; several monocular videos provide two-dimensional information at several viewing angles, and combining it determines the subject's three-dimensional state more truthfully and accurately, so that more accurate motion poses are captured. For example, the two-dimensional information may be the two-dimensional coordinates of the subject's key points: the coordinates of the joint points describe the subject's two-dimensional pose intuitively and serve as the basis for constructing the three-dimensional pose later. Each monocular video is therefore analyzed separately, and the two-dimensional key-point coordinates of the subject in each are determined to obtain the subject's two-dimensional pose at each viewing angle.
103. Construct multiple three-dimensional poses of the subject from the two-dimensional key-point coordinates corresponding to each captured video.
Once the two-dimensional key-point coordinates of the subject in one monocular video are determined, a three-dimensional pose of the subject can be constructed from them. Each monocular video corresponds to one three-dimensional pose. Because a monocular video lacks depth information, a three-dimensional pose built from its two-dimensional information alone is not accurate; it is only an estimate of the subject's true pose. The three-dimensional poses from different monocular videos differ, and the differences are caused by the different viewing angles, so comparing the poses reveals the relative positions of the camera positions, that is, how the cameras were placed, which provides information for the subsequent construction of the three-dimensional stereoscopic pose.
104. Determine the relative position information among the camera positions from the three-dimensional poses.
For example, the shooting angle of one camera position can be determined from the first constructed three-dimensional pose, and the shooting angle of another camera position from the second. Once several shooting angles are known, they can be compared, the rotation change and displacement change between one shooting angle and another determined, and the relative position information among the camera positions finally derived from those changes.
105. Construct a three-dimensional stereoscopic pose of the subject from the relative position information among the camera positions and the two-dimensional key-point coordinates corresponding to each captured video.
Once the relative position information among the camera positions is known, the two-dimensional information from the several monocular videos can be combined, based on the shooting parameters of the camera positions, and reconstructed into the subject's three-dimensional stereoscopic pose. This pose incorporates two-dimensional information from several known shooting angles, and the subject's pose and motion can be restored from those angles, so its accuracy is very high. The three-dimensional stereoscopic pose restores the subject's real motion and posture with high precision and provides a high-precision three-dimensional pose model for the subsequent capture of the subject's motion.
The reconstructed three-dimensional stereoscopic pose should have several properties. First, pose accuracy: its projections at the viewing angles of the cameras should match the two-dimensional pose results as closely as possible. Second, inter-frame stability: the reconstructed pose should move smoothly, without jumps. Third, bone-length stability: the reconstructed pose sequence should keep the same bone lengths throughout.
106. Determine the motion state of the subject from the three-dimensional stereoscopic pose.
Once the subject's three-dimensional stereoscopic pose is obtained, motion solving and motion correction must be performed on it to capture the subject's real motion. For example, a skeletal motion file can be output from the pose. The motion-solving result determines how reasonable the final skeletal motion is, and the key-point position information in the pose must be converted into reasonable skeletal motion data. For example, motion solving can be performed with an inverse kinematics algorithm plus bone prior constraints.
The final skeletal motion must satisfy pose plausibility and foot stability. Pose plausibility means the skeletal motion must be normal and reasonable, without inverted joints or implausible distortions. Foot stability requires reasonable control of foot sliding in the motion. A motion correction algorithm can also be applied to the solving result to remove jitter and implausible poses from the skeletal motion data, so that the subject's motion is captured more reasonably and effectively.
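As an illustration of the bone prior constraints mentioned above, the minimal sketch below checks whether hinge joints such as elbows and knees stay within a plausible flexion range; it is an assumption-laden example, not the patent's implementation, and the joint names, key-point layout, and limits are all assumed.

```python
import numpy as np

# Hypothetical hinge-joint limits in degrees; the patent only states that
# bone priors constrain the motion solve, so names and ranges are assumed.
JOINT_LIMITS_DEG = {
    "left_elbow": (0.0, 180.0), "right_elbow": (0.0, 180.0),
    "left_knee": (0.0, 180.0), "right_knee": (0.0, 180.0),
}

def flexion_angle(parent, joint, child):
    """Angle at `joint` between its two adjacent bone segments, in degrees."""
    u, v = parent - joint, child - joint
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def violates_bone_prior(parent, joint, child, name):
    """True if the solved pose bends this hinge joint outside its range."""
    lo, hi = JOINT_LIMITS_DEG[name]
    return not lo <= flexion_angle(parent, joint, child) <= hi
```

A solver can call such a check after each iteration and penalize or re-project poses that violate the prior.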
In the embodiment above, the subject is shot from different viewing angles with multiple camera positions, yielding multiple captured videos. Each captured video is analyzed to obtain its two-dimensional information, and multiple three-dimensional poses are constructed from that information. The relative positions of the camera positions are then determined from the constructed poses. A final three-dimensional stereoscopic pose is constructed from the relative camera positions, the shooting parameters, and the two-dimensional information from the multiple viewing angles, and motion is captured from it. Because the pose is built from two-dimensional information from several viewing angles, it contains more accurate and complete state information, its accuracy and precision are greatly improved, and the subject's motion and posture can be restored more efficiently and accurately. This satisfies the high-precision motion capture requirements of most scenarios and greatly improves the motion recognition and restoration performance of the equipment.
Based on the description of the above embodiments, fig. 2 is a flowchart illustrating another motion capture method provided in an embodiment of the present application. As shown in fig. 2, the motion capture method includes the following steps:
201. Shoot the subject from multiple camera positions to obtain multiple captured videos.
A picture taken by a shooting device is generally two-dimensional and lacks depth information, so it is difficult to restore the subject's three-dimensional stereoscopic pose from a monocular video shot by a single device. Multiple camera positions are therefore used to shoot the subject and obtain captured videos from different angles, gathering more state information about the subject so that its three-dimensional stereoscopic pose can be constructed better.
For example, after a shooting site is chosen, several shooting devices are placed around it; the shooting angle of each device may be between 60 and 100 degrees, so that the subject's whole body can be fully captured. The subject then performs motions within the area the devices can cover, and the devices record the subject. Finally, the devices send the recorded monocular videos to the server, which processes them automatically for the subsequent motion capture procedure.
In this example, users do not need to purchase specific shooting devices; any shooting device can be used as needed. The devices do not need repeated intrinsic calibration; a single intrinsic calibration suffices. After the monocular videos are uploaded to the server, processing is fully automatic, which makes the method much more convenient to use.
202. Perform validity detection on the captured videos.
After the captured videos are obtained, their validity must be checked: for example, whether a video file is damaged and whether its pictures are complete. Once the monocular videos have been screened and confirmed valid, they are processed and the subsequent motion capture procedure begins.
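The patent does not specify how validity is checked; the sketch below is one plausible minimal check, assuming OpenCV is available: the file must open, report sane metadata, and decode its first few frames without error.

```python
import cv2  # assumes OpenCV is available for decoding

def video_is_valid(path, min_frames=2):
    """Rough validity check: the file opens, reports sane metadata, and its
    first few frames decode without error."""
    cap = cv2.VideoCapture(path)
    if not cap.isOpened():
        return False
    n = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    ok = n >= min_frames and w > 0 and h > 0
    for _ in range(min(max(n, 0), 5)):   # probe a few frames for decode errors
        ret, frame = cap.read()
        ok = ok and ret and frame is not None
    cap.release()
    return ok
```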
203. Determine the two-dimensional coordinates of the subject's key points corresponding to each of the captured videos.
A monocular video provides two-dimensional information about the subject at one viewing angle; several monocular videos provide it at several viewing angles, and combining that information determines the subject's three-dimensional state more truthfully and accurately, so that more accurate motion poses are captured.
For example, the two-dimensional information can be obtained with a two-dimensional pose estimation model. Its input is a sequence of consecutive frames of a monocular video containing the subject, and its output is the two-dimensional key-point coordinates of the subject in each frame. Specifically, the first captured video is input into the two-dimensional pose estimation model, and the first two-dimensional key-point coordinates of the subject are obtained from the model's output; the second captured video is then input, and the second two-dimensional coordinates are determined likewise. The two-dimensional poses (coordinates) output by the model serve as the basis for three-dimensional pose estimation and provide two-dimensional information at the different viewing angles for constructing the three-dimensional stereoscopic pose.
204. Construct multiple three-dimensional poses of the subject from the two-dimensional key-point coordinates corresponding to each captured video.
For example, the three-dimensional poses can be constructed with a three-dimensional pose estimation model, which may be implemented as a one-dimensional convolutional neural network: its input is the two-dimensional coordinate sequence output by the two-dimensional pose estimation model, and its output is the three-dimensional coordinate sequence of the subject's key points, from which a three-dimensional pose is constructed. Each monocular video is processed independently: the first two-dimensional coordinates corresponding to the first captured video are input into the model to obtain the first three-dimensional pose, then the second two-dimensional coordinates corresponding to the second captured video are input to obtain the second three-dimensional pose, and so on.
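As a sketch of what such a one-dimensional convolutional lifting network might look like, in PyTorch; the depth, layer widths, and joint count are assumptions, not the patent's architecture:

```python
import torch.nn as nn

class Lifter1D(nn.Module):
    """Lifts a 2D key-point sequence (B, T, J, 2) to 3D (B, T, J, 3) with
    one-dimensional temporal convolutions. Depth and widths are assumed."""
    def __init__(self, num_joints=17, channels=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(num_joints * 2, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels), nn.ReLU(),
            nn.Conv1d(channels, num_joints * 3, kernel_size=1),
        )

    def forward(self, kp2d):
        b, t, j, _ = kp2d.shape
        x = kp2d.reshape(b, t, j * 2).transpose(1, 2)   # (B, J*2, T)
        y = self.net(x).transpose(1, 2)                 # (B, T, J*3)
        return y.reshape(b, t, j, 3)
```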
205. Match the image frames of the captured videos according to the three-dimensional poses.
Once the three-dimensional pose corresponding to each monocular video is obtained, the different shooting angles of the videos must be analyzed from those poses, and the relative positions of the camera positions then computed from the viewing-angle differences of the captured videos. Before that computation, the monocular videos must be frame-aligned so that the shooting angles are determined from pictures taken at the same moment. For example, after the three-dimensional poses at two viewing angles are obtained, the image frames of the first and second captured videos are matched by motion according to the poses, and the frames are then aligned according to the matching result.
206. Align the image frames of the captured videos according to the matching result.
The monocular videos show the same subject from different viewing angles, and the subject's approximate posture and motion can be read from the three-dimensional poses constructed from them, so frames can be matched by motion similarity and the successfully matched frames then aligned. For example, the server may analyze the three-dimensional poses, determine the frame offsets between the captured videos, and align their image frames according to those offsets.
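One plausible way to estimate such a frame offset, sketched below under the assumption that per-frame motion magnitude is a usable matching signal, is to correlate the motion profiles of the two monocular pose sequences:

```python
import numpy as np

def estimate_frame_offset(pose_a, pose_b, max_offset=120):
    """pose_a, pose_b: (T, J, 3) monocular 3D pose sequences from two views.
    Returns the offset (in frames) at which their per-frame motion profiles
    correlate best; a positive result means frame i of video B matches
    frame i + offset of video A."""
    def motion(p):
        v = np.linalg.norm(np.diff(p, axis=0), axis=-1).sum(axis=-1)
        return (v - v.mean()) / (v.std() + 1e-8)
    a, b = motion(pose_a), motion(pose_b)
    best, best_score = 0, -np.inf
    for off in range(-max_offset, max_offset + 1):
        if off >= 0:
            x, y = a[off:], b[:max(len(a) - off, 0)]
        else:
            x, y = a[:off], b[-off:]
        n = min(len(x), len(y))
        if n == 0:
            continue
        score = float(np.dot(x[:n], y[:n])) / n   # normalized correlation
        if score > best_score:
            best, best_score = off, score
    return best
```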
207. Determine the shooting angle of each of the three-dimensional poses from the aligned captured videos.
For example, the shooting angle of one camera position can be determined from the first constructed three-dimensional pose, and the second shooting angle of another camera position from the second constructed pose. Once several shooting angles are known, they can be compared, the rotation change and displacement change between one angle and another determined, and the relative position information among the camera positions finally derived from those changes.
208. Determine the relative position information among the camera positions from the shooting angles of the three-dimensional poses.
Determining the relative position information among the camera positions amounts to calibrating the shooting devices. Based on the monocular three-dimensional poses at two viewing angles, the relative rotation R is obtained through an alignment transformation; taking one viewing angle as the reference, the relative displacement T is obtained through least-squares optimization; and the result is then refined, the optimization target being that the two-dimensional key points obtained by projecting the R,T-based three-dimensional key points onto the pixel plane agree with the two-dimensional coordinates predicted by the model in the previous step. Once the relative position information among the camera positions is determined, the final three-dimensional stereoscopic pose is obtained by three-dimensional reconstruction from the calibration result.
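A minimal sketch of the alignment step, assuming the two monocular poses give corresponding 3D key points: the relative rotation comes from a Kabsch (SVD) alignment and the translation from its least-squares closed form. The patent additionally refines R and T against the two-dimensional reprojection target, which is omitted here.

```python
import numpy as np

def relative_pose(pose_a, pose_b):
    """Kabsch alignment: estimate R, t with R @ x_a + t ~ x_b for the
    corresponding 3D key points of two single-view pose estimates."""
    A = pose_a.reshape(-1, 3)
    B = pose_b.reshape(-1, 3)
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)                 # cross-covariance of centered sets
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # proper rotation, det = +1
    t = cb - R @ ca                           # least-squares translation given R
    return R, t
```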
209. Construct a three-dimensional stereoscopic pose of the subject from the relative position information among the camera positions and the two-dimensional key-point coordinates corresponding to each captured video.
For example, a three-dimensional reconstruction algorithm can fuse the two-dimensional information from the several viewing angles into the subject's three-dimensional stereoscopic pose. This pose incorporates two-dimensional information from several known shooting angles, and the subject's pose and motion can be restored from those angles, so its accuracy is very high. The three-dimensional stereoscopic pose restores the subject's real motion and posture with high precision and provides a high-precision three-dimensional pose model for the subsequent capture of the subject's motion.
Here, the three-dimensional key points are reconstructed from R, T and the two-dimensional key-point coordinates of the multiple viewing angles. The reconstruction can be optimized by least squares, the target being that the groups of two-dimensional key points obtained by projecting the reconstructed three-dimensional key points onto the several pixel planes agree as closely as possible with the earlier two-dimensional coordinates. To obtain a stable three-dimensional stereoscopic pose, various constraints must also be considered during reconstruction to correct the three-dimensional coordinates.
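The least-squares reconstruction of one key point can be sketched as a standard direct linear transform (DLT) triangulation; the projection-matrix layout is an assumption, and the patent's additional constraint terms are omitted:

```python
import numpy as np

def triangulate_point(projections, points_2d):
    """Linear least-squares (DLT) triangulation of one 3D key point.
    projections: list of 3x4 camera matrices (intrinsics times [R|T]);
    points_2d: matching list of (u, v) pixel coordinates."""
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        rows.append(u * P[2] - P[0])   # each view contributes two
        rows.append(v * P[2] - P[1])   # linear constraints on X
    A = np.stack(rows)
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                         # null-space solution
    return X[:3] / X[3]                # homogeneous to Euclidean
```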
210. Correct the three-dimensional stereoscopic pose of the subject according to preset constraints.
For example, a preset constraint may concern bone length, requiring stable bone lengths; or inter-frame smoothness, requiring that the reconstructed pose move smoothly without jumps; or consistency of the poses across views, requiring that the constructed stereoscopic pose be accurate. The preset constraints can be adjusted as needed; their purpose is to make the stereoscopic pose restore the subject's posture more truthfully, and they are not specifically limited. An iterative optimization method can also be used, computing stable R, T and three-dimensional key-point coordinates over several passes to refine the construction result.
211. Determine the motion state of the subject from the corrected three-dimensional stereoscopic pose.
After the stereoscopic pose has been corrected to high accuracy, the subject's motion is captured from it. For example, a motion-solving algorithm converts the key-point position information in the pose into reasonable skeletal motion data; the solving uses an inverse kinematics algorithm plus bone prior constraints (e.g., elbows and knees have one degree of freedom and a flexion range of 0 to 180 degrees). A motion correction algorithm then corrects the skeletal motion data, which tends to contain jitter and implausible poses after the preceding steps. Foot-contact control may be considered: deciding, for each motion, whether a foot is in contact with the ground, and fixing a contacting foot at its foothold so it stays in place. Pose plausibility control may also be considered, correcting tilted postures and implausible ones such as a foot penetrating the ground. The subject's motion state is thus determined from the corrected skeletal motion data with improved accuracy. The skeletal motion data can also be retargeted onto the skeleton of a target model to obtain reasonably stable retargeted motion.
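A minimal sketch of the foot-contact control described above, assuming a y-up coordinate system and hand-picked thresholds in scene units: frames where a foot is low and nearly static are treated as contacts and pinned to their foothold.

```python
import numpy as np

def lock_foot_contacts(foot_pos, vel_thresh=0.005, height_thresh=0.05):
    """foot_pos: (T, 3) trajectory of one foot joint, y-up assumed.
    Frames where the foot is low and nearly static are treated as ground
    contacts and pinned to a fixed foothold to suppress sliding."""
    out = foot_pos.copy()
    speed = np.linalg.norm(np.diff(foot_pos, axis=0), axis=-1)
    speed = np.append(speed, speed[-1])
    contact = (speed < vel_thresh) & (foot_pos[:, 1] < height_thresh)
    anchor = None
    for i in range(len(out)):
        if contact[i]:
            if anchor is None:
                anchor = out[i].copy()   # start of a contact phase
            out[i] = anchor              # hold the foot at its foothold
        else:
            anchor = None
    return out
```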
In this embodiment, likewise, the subject is shot from different viewing angles with multiple camera positions, yielding multiple captured videos. Each captured video is analyzed to obtain its two-dimensional information, and multiple three-dimensional poses are constructed from that information. The relative positions of the camera positions are then determined from the constructed poses. A final three-dimensional stereoscopic pose is constructed from the relative camera positions, the shooting parameters, and the two-dimensional information from the multiple viewing angles, and motion is captured from it. Because the pose is built from two-dimensional information from several viewing angles, it contains more accurate and complete state information, its accuracy and precision are greatly improved, and the subject's motion and posture can be restored more efficiently and accurately. This satisfies the high-precision motion capture requirements of most scenarios and greatly improves the motion recognition and restoration performance of the equipment.
The two-dimensional pose estimation model mentioned in the above embodiments is described in detail below.
Technically, the two-dimensional pose estimation model can be implemented as a three-dimensional convolutional neural network. With sufficient training data, a convolutional network can fully fit the nonlinear relation between pictures and two-dimensional poses, achieving robust two-dimensional pose estimation. Because two-dimensional poses vary greatly, influenced by body shape, motion, and other factors, hand-designed algorithms are hard to apply, so a convolutional neural network is a reasonable choice.
To obtain enough training samples, a large amount of sample data can be constructed from three-dimensional models. During training, the image is cropped around the subject according to its bounding box, discarding most of the picture outside the subject, scaled, and fed to the convolutional network, which outputs the two-dimensional pose estimate in the form of heat maps. Because information from neighboring frames is very important for predicting key points occluded in the current frame, the two-dimensional pose estimation model in the embodiments of the present application is a three-dimensional convolutional network with multi-frame input; compared with a two-dimensional convolutional network, the key points it predicts are more stable across frames.
Specifically, the two-dimensional pose estimation model M takes an image frame sequence Xs as input and outputs heat maps Hs of the human-body key points, i.e., Hs = M(Xs), where Xs contains N image frames fed to the model M in a single pass. In a well-formed heat map, the value at the spatial position of a key point is close to 1 and the values elsewhere are close to 0; the two-dimensional coordinates of the subject's key points are computed by weighting the coordinates of the highest-valued positions on the heat map with a Gaussian weighting function. The training process of the two-dimensional pose estimation model is briefly introduced below.
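Illustratively, the Gaussian-weighted readout of a heat map might look as follows; the window radius and the exact weighting scheme are illustrative assumptions.

```python
import numpy as np

def heatmap_to_coord(hm, radius=2):
    """Gaussian-weighted average of positions around the heat-map peak.

    hm: (H, W) heat map of one keypoint, with values near 1 at the joint.
    Returns sub-pixel (x, y) coordinates of the keypoint.
    """
    py, px = np.unravel_index(np.argmax(hm), hm.shape)
    y0, y1 = max(py - radius, 0), min(py + radius + 1, hm.shape[0])
    x0, x1 = max(px - radius, 0), min(px + radius + 1, hm.shape[1])
    ys, xs = np.mgrid[y0:y1, x0:x1]
    # Gaussian weights centered on the peak, modulated by heat-map values
    w = hm[ys, xs] * np.exp(-((ys - py) ** 2 + (xs - px) ** 2)
                            / (2.0 * radius ** 2))
    return (xs * w).sum() / w.sum(), (ys * w).sum() / w.sum()
```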
First, a training sample set is determined. The set contains multiple groups of training samples, each group being consecutive multi-frame sample shot images carrying labeling information, where the labeling information is the correct two-dimensional coordinates of the key points in each frame of the group. Each group of training samples is then input into the two-dimensional pose estimation model to obtain an output result for that group. The output result of each group is compared with its labeling information, the comparison result is back-propagated, and the model parameters of the two-dimensional pose estimation model are adjusted accordingly, so that the model's output moves ever closer to the labeled information.
When a preset training condition is reached, the training process of the two-dimensional pose estimation model ends. For example, the preset training condition may be that the number of training iterations reaches a preset number, or that the output result of each group of training samples reaches a preset precision. Finally, the trained two-dimensional pose estimation model is used to process each monocular video to obtain the two-dimensional information corresponding to that video.
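Illustratively, the training loop might be sketched as follows, assuming a heat-map model like the sketch above and a data loader yielding frame sequences paired with target heat maps (Gaussians rendered at the labeled key-point coordinates, a standard practice assumed here). The optimizer, learning rate, and loss function are illustrative choices.

```python
import torch
import torch.nn as nn

def train_pose_model(model, loader, epochs=10, lr=1e-3):
    """Minimal supervised loop: MSE between predicted and target heat maps."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):                    # "preset number of times"
        for frames, target_hm in loader:       # (B,3,N,H,W), (B,K,N,H,W)
            opt.zero_grad()
            loss = loss_fn(model(frames), target_hm)
            loss.backward()                    # back-propagate comparison
            opt.step()                         # adjust model parameters
    return model
```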
Fig. 3 is a schematic structural diagram of a motion capture device according to an embodiment of the present application, and as shown in fig. 3, the motion capture device includes:
The acquiring unit 301 is configured to perform multi-camera shooting on a photographic subject to obtain multiple captured videos, where the shooting viewing angles of the multiple captured videos are different.
The determining unit 302 is configured to determine the two-dimensional coordinates of the key points of the photographic subject corresponding to each of the multiple captured videos.
The processing unit 303 is configured to construct multiple three-dimensional postures of the photographic subject according to the two-dimensional coordinates of the key points corresponding to each captured video, where the multiple captured videos correspond to the multiple three-dimensional postures one to one.
The determining unit 302 is further configured to determine relative position information among the multiple camera positions according to the multiple three-dimensional postures.
The processing unit 303 is further configured to construct a three-dimensional stereo posture of the photographic subject according to the relative position information among the multiple camera positions and the two-dimensional coordinates of the key points corresponding to each captured video.
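The embodiments do not prescribe a particular reconstruction method for this step; a standard option is linear (DLT) triangulation, sketched below under the assumption that a (3, 4) projection matrix is available for each camera position.

```python
import numpy as np

def triangulate_point(proj_mats, points_2d):
    """Linear (DLT) triangulation of one keypoint from multiple views.

    proj_mats: list of (3, 4) projection matrices, one per camera position.
    points_2d: list of (x, y) pixel coordinates of the same keypoint.
    Returns the 3D point minimizing the algebraic reprojection error.
    """
    A = []
    for P, (x, y) in zip(proj_mats, points_2d):
        A.append(x * P[2] - P[0])       # each view contributes two rows
        A.append(y * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.asarray(A))
    X = vt[-1]                          # null-space direction
    return X[:3] / X[3]                 # de-homogenize
```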
And the recognition unit 304 is configured to determine the action state of the photographic subject according to the three-dimensional stereo posture of the photographic subject.
In an alternative embodiment, the determining unit 302 is specifically configured to: determine the shooting viewing angle of each of the multiple three-dimensional postures; compare the shooting viewing angles of the three-dimensional postures to determine the rotation angle changes and displacement changes between them; and determine the relative position information among the multiple camera positions according to the rotation angle changes and displacement changes.
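Illustratively, the rotation angle change and displacement change between two viewing angles can be recovered by rigidly aligning the three-dimensional postures of the same key points reconstructed from two camera positions. The Kabsch/SVD alignment below is one standard way to do this, not necessarily the exact procedure of the embodiments.

```python
import numpy as np

def rigid_align(src, dst):
    """Kabsch alignment: find R, t such that dst ~= R @ src + t.

    src, dst: (K, 3) arrays of the same keypoints as reconstructed from
    two camera positions; R and t give their relative placement.
    """
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(0) - R @ src.mean(0)
    return R, t
```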
In an alternative embodiment, the processing unit 303 is further configured to match image frames of the multiple captured videos according to the multiple three-dimensional postures, and to perform image frame alignment processing on the multiple captured videos according to the matching result.
The determining unit 302 is specifically configured to determine the shooting viewing angle of each of the multiple three-dimensional postures according to the multiple captured videos after the alignment processing.
In an alternative embodiment, the processing unit 303 is specifically configured to determine, according to the matching result, the frame-difference counts between the image frames of the multiple captured videos, and to perform image frame alignment processing on the multiple captured videos according to those counts.
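Illustratively, once per-video frame offsets have been found by pose matching, the alignment itself might be sketched as follows; the offset convention (frames to drop from the start of each video, 0 for the latest starter) is an illustrative assumption.

```python
def align_videos(videos, offsets):
    """Trim each video by its frame offset so all start in sync.

    videos:  list of frame sequences (e.g. lists of frames).
    offsets: per-video frame-difference counts from the matching result.
    """
    trimmed = [v[off:] for v, off in zip(videos, offsets)]
    n = min(len(v) for v in trimmed)       # clip to the common length
    return [v[:n] for v in trimmed]
```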
In an alternative embodiment, the motion capture device further comprises a detection unit 305.
A detection unit 305, configured to perform validity detection on the multiple captured videos and determine that each of the multiple captured videos is qualified.
In an optional embodiment, the processing unit 303 is further configured to correct the three-dimensional stereo posture of the photographic subject according to a preset constraint condition.
The recognition unit 304 is specifically configured to determine the action state of the photographic subject according to the corrected three-dimensional stereo posture of the photographic subject.
In an alternative embodiment, the recognition unit 304 is specifically configured to: determine the position information of the key points of the photographic subject according to the corrected three-dimensional stereo posture of the photographic subject; convert the position information of the key points into skeletal motion data; correct the skeletal motion data using a motion correction algorithm; and determine the action state of the photographic subject according to the corrected skeletal motion data.
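Illustratively, the jitter removal performed by the motion correction algorithm could be as simple as temporal smoothing of the skeletal motion data. The exponential moving average below is a minimal sketch of this idea, with the smoothing factor as an illustrative assumption.

```python
import numpy as np

def smooth_motion(data, alpha=0.8):
    """Exponential moving average over per-frame skeletal motion data.

    data: (T, D) array, e.g. flattened joint positions or rotations per
    frame. Larger alpha trusts history more and damps jitter harder.
    """
    out = data.astype(float).copy()
    for t in range(1, len(out)):
        out[t] = alpha * out[t - 1] + (1.0 - alpha) * out[t]
    return out
```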
In an alternative embodiment, the determining unit 302 is specifically configured to input each of the multiple captured videos into the two-dimensional pose estimation model, and determine the two-dimensional coordinates of the key points of the photographic subject corresponding to each captured video according to the output result of the two-dimensional pose estimation model.
In an alternative embodiment, the motion capture device further comprises a training unit 306.
A training unit 306, configured to train the two-dimensional pose estimation model according to the training sample set.
The training process of the two-dimensional pose estimation model includes: acquiring multiple groups of training samples from the training sample set, where each group of training samples is multi-frame sample shot images carrying labeling information, and the labeling information is the correct two-dimensional coordinates of the key points in each frame of sample shot image in the group; inputting each group of training samples into the two-dimensional pose estimation model to obtain an output result for each group; adjusting the model parameters of the two-dimensional pose estimation model according to the output result and the labeling information of each group; and ending the training process of the two-dimensional pose estimation model when a preset training condition is reached.
In an alternative embodiment, the preset training condition includes: the number of training iterations reaches a preset number, or the output result of each group of training samples reaches a preset precision.
It should be noted that, the contents of information interaction, execution process, and the like between the modules/units in the motion capture device are based on the same concept as the method embodiments corresponding to fig. 1 to fig. 2 in the present application, and specific contents may refer to the description in the foregoing method embodiments in the present application, and are not repeated herein.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an execution device provided in an embodiment of the present application. The execution device 800 may be embodied as a virtual reality (VR) device, a mobile phone, a tablet, a notebook computer, an intelligent wearable device, a monitoring data processing device, or a radar data processing device, which is not limited herein. The execution device 800 may be provided with the motion capture apparatus described in the embodiment corresponding to fig. 3, so as to implement the functions in the embodiments corresponding to fig. 1 to fig. 2. Specifically, the execution device 800 includes: a receiver 801, a transmitter 802, a processor 803, and a memory 804 (the number of processors 803 in the execution device 800 may be one or more, with one processor taken as an example in fig. 4), where the processor 803 may include an application processor 8031 and a communication processor 8032. In some embodiments of the present application, the receiver 801, the transmitter 802, the processor 803, and the memory 804 may be connected by a bus or other means.
The memory 804 may include read-only memory and random access memory, and provides instructions and data to the processor 803. A portion of the memory 804 may also include non-volatile random access memory (NVRAM). The memory 804 stores operating instructions, executable modules or data structures, or a subset or an extended set thereof, where the operating instructions may include various instructions for performing various operations.
The processor 803 controls the operation of the execution apparatus. In a particular application, the various components of the execution device are coupled together by a bus system that may include a power bus, a control bus, a status signal bus, etc., in addition to a data bus. For clarity of illustration, the various buses are referred to in the figures as a bus system.
The method disclosed in the embodiments of the present application may be applied to the processor 803 or implemented by the processor 803. The processor 803 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 803. The processor 803 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic devices, or discrete hardware components. The processor 803 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 804, and the processor 803 reads the information in the memory 804 and completes the steps of the method in combination with its hardware.
Receiver 801 may be used to receive input numeric or character information and generate signal inputs related to performing device related settings and function control. The transmitter 802 may be configured to output numeric or character information via a first interface; the transmitter 802 may also be configured to send instructions to the disk groups via the first interface to modify data in the disk groups; the transmitter 802 may also include a display device such as a display screen.
In the embodiment of the present application, the application processor 8031 in the processor 803 is configured to execute the motion capture method in the corresponding embodiment of fig. 1 to fig. 2. It should be noted that, the specific manner of executing each step by the application processor 8031 is based on the same concept as that of each method embodiment corresponding to fig. 1 to fig. 2 in the present application, and the technical effect brought by the method embodiment is the same as that of each method embodiment corresponding to fig. 1 to fig. 2 in the present application, and specific contents may refer to the description in the foregoing method embodiments in the present application, and are not described herein again.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a training device provided in an embodiment of the present application; the motion capture apparatus described in the embodiment corresponding to fig. 3 may be disposed on the training device 900. Specifically, the training device 900 is implemented as one or more servers and may vary widely in configuration or performance. It may include one or more central processing units (CPUs) 922 (e.g., one or more processors), memory 932, and one or more storage media 930 (e.g., one or more mass storage devices) storing applications 942 or data 944. The memory 932 and the storage media 930 may be transient storage or persistent storage. The program stored on a storage medium 930 may include one or more modules (not shown), each of which may include a sequence of instruction operations for the training device. Still further, the central processor 922 may be configured to communicate with the storage medium 930 to execute sequences of instruction operations in the storage medium 930 on the training device 900.
The training device 900 may also include one or more power supplies 926, one or more wired or wireless network interfaces 950, one or more input/output interfaces 958, and/or one or more operating systems 941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
In one embodiment of the present application, the central processor 922 is configured to execute the steps performed by the training device described in the embodiments of the present application. The specific manner in which the central processor 922 executes these steps is based on the same concept as the method embodiment corresponding to fig. 2 in the present application, with the same technical effects; for details, reference may be made to the descriptions in the foregoing method embodiments, which are not repeated herein.
A seventh embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium includes computer instructions, and the computer instructions, when executed by a processor, are used to implement any one of the technical solutions of the motion capture method in the embodiments of the present application.
Although the present application has been disclosed in terms of preferred embodiments, it is not intended to limit the present application, and one skilled in the art can make variations and modifications without departing from the spirit and scope of the present application, and therefore the scope of the present application should be limited by the scope of the claims that follow.

Claims (14)

1. A method of motion capture, the method comprising:
carrying out multi-camera shooting on a shot object to obtain a plurality of shot videos; shooting viewing angles of the plurality of shot videos are different;
determining two-dimensional coordinates of key points of the shot object corresponding to each shot video in the plurality of shot videos;
constructing a plurality of three-dimensional postures of the shot object according to the two-dimensional coordinates of the key points of the shot object corresponding to each shot video, wherein the plurality of shot videos correspond to the plurality of three-dimensional postures one by one;
determining relative position information among the multiple camera positions according to the plurality of three-dimensional postures;
constructing a three-dimensional stereo posture of the shot object according to the relative position information among the multiple camera positions and the two-dimensional coordinates of the key points of the shot object corresponding to each shot video;
and determining the action state of the shot object according to the three-dimensional stereo posture of the shot object.
2. The motion capture method of claim 1, wherein determining relative position information among the multiple camera positions according to the plurality of three-dimensional postures comprises:
determining a shooting viewing angle of each of the plurality of three-dimensional postures;
comparing the shooting viewing angles of the three-dimensional postures, and determining the rotation angle change and the displacement change between the shooting viewing angles;
and determining the relative position information among the multiple camera positions according to the rotation angle change and the displacement change.
3. The motion capture method of claim 2, wherein the determining a shooting viewing angle of each of the plurality of three-dimensional postures comprises:
matching image frames of the plurality of shot videos according to the plurality of three-dimensional postures;
performing image frame alignment processing on the plurality of shot videos according to the matching result;
and determining the shooting viewing angle of each of the plurality of three-dimensional postures according to the plurality of shot videos after the alignment processing.
4. The motion capture method according to claim 3, wherein the performing image frame alignment processing on the plurality of shot videos according to the matching result comprises:
determining the frame-difference counts between image frames of the plurality of shot videos according to the matching result;
and performing image frame alignment processing on the plurality of shot videos according to the frame-difference counts.
5. The motion capture method according to any one of claims 1 to 4, wherein before the determining two-dimensional coordinates of key points of the shot object corresponding to each shot video in the plurality of shot videos, the method further comprises:
performing validity detection on the plurality of shot videos;
and determining that each of the plurality of shot videos is qualified.
6. The motion capture method of any one of claims 1 to 5, further comprising:
correcting the three-dimensional stereo posture of the shot object according to a preset constraint condition;
wherein the determining the action state of the shot object according to the three-dimensional stereo posture of the shot object comprises:
determining the action state of the shot object according to the corrected three-dimensional stereo posture of the shot object.
7. The motion capture method according to claim 6, wherein the determining the action state of the shot object according to the corrected three-dimensional stereo posture of the shot object comprises:
determining the position information of the key points of the shot object according to the corrected three-dimensional stereo posture of the shot object;
converting the position information of the key points of the shot object into skeletal motion data;
correcting the skeletal motion data by using a motion correction algorithm;
and determining the action state of the shot object according to the corrected skeletal motion data.
8. The motion capture method of any one of claims 1 to 7, wherein the determining two-dimensional coordinates of key points of the shot object corresponding to each shot video in the plurality of shot videos comprises:
inputting each of the plurality of shot videos into a two-dimensional pose estimation model, and determining the two-dimensional coordinates of the key points of the shot object corresponding to each shot video according to an output result of the two-dimensional pose estimation model.
9. The motion capture method of claim 8, further comprising:
training the two-dimensional pose estimation model according to a training sample set;
wherein the training process of the two-dimensional pose estimation model comprises:
acquiring a plurality of groups of training samples in the training sample set; each group of training samples in the plurality of groups of training samples is multi-frame sample shot images carrying labeling information, and the labeling information is the correct two-dimensional coordinates of the key points in each frame of sample shot image in the group;
inputting each group of training samples into the two-dimensional pose estimation model to obtain an output result of each group of training samples;
adjusting model parameters of the two-dimensional pose estimation model according to the output result of each group of training samples and the labeling information of each group of training samples;
and ending the training process of the two-dimensional pose estimation model when a preset training condition is reached.
10. The motion capture method of claim 9, wherein the preset training condition comprises: the number of training iterations reaches a preset number, or the output result of each group of training samples reaches a preset precision.
11. A motion capture device, comprising:
an acquiring unit, configured to perform multi-camera shooting on a shot object to obtain a plurality of shot videos; shooting viewing angles of the plurality of shot videos are different;
a determining unit, configured to determine two-dimensional coordinates of key points of the shot object corresponding to each shot video in the plurality of shot videos;
a processing unit, configured to construct a plurality of three-dimensional postures of the shot object according to the two-dimensional coordinates of the key points of the shot object corresponding to each shot video, wherein the plurality of shot videos correspond to the plurality of three-dimensional postures one by one;
the determining unit being further configured to determine relative position information among the multiple camera positions according to the plurality of three-dimensional postures;
the processing unit being further configured to construct a three-dimensional stereo posture of the shot object according to the relative position information among the multiple camera positions and the two-dimensional coordinates of the key points of the shot object corresponding to each shot video;
and a recognition unit, configured to determine the action state of the shot object according to the three-dimensional stereo posture of the shot object.
12. An execution device, comprising: a memory and a processor, the memory and the processor coupled;
the memory is to store one or more computer instructions;
the processor is configured to execute the one or more computer instructions to implement the method according to any one of claims 1-8.
13. A training device, comprising: a memory and a processor, the memory and the processor coupled;
the memory is to store one or more computer instructions;
the processor is configured to execute the one or more computer instructions to implement the method of any one of claims 9-10.
14. A computer-readable storage medium having stored thereon one or more computer instructions for execution by a processor to perform the method of any one of claims 1-10.