CN118097027A - Multi-video 3D human motion capturing method and system based on evolutionary computation - Google Patents


Info

Publication number
CN118097027A
CN118097027A
Authority
CN
China
Prior art date
Legal status
Pending
Application number
CN202410472848.8A
Other languages
Chinese (zh)
Inventor
徐淑涵
王璐威
杨旭旨
Current Assignee
Hangzhou Xinhe Shengshi Technology Co ltd
Original Assignee
Hangzhou Xinhe Shengshi Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Xinhe Shengshi Technology Co ltd filed Critical Hangzhou Xinhe Shengshi Technology Co ltd
Priority to CN202410472848.8A priority Critical patent/CN118097027A/en
Publication of CN118097027A publication Critical patent/CN118097027A/en
Pending legal-status Critical Current


Abstract

The invention relates to the field of 3D motion capture and evolutionary computation, and in particular to a multi-video 3D human motion capture method and system based on evolutionary computation. When capturing 3D human motion from multiple videos, the invention first synchronizes the videos and then obtains the 3D human pose of each frame by jointly optimizing camera parameters and human pose parameters over the synchronized videos. The motions therefore need not all be collected from the same individual; the method fully exploits the multi-video, multi-view nature of the detection, effectively alleviates occlusion and depth ambiguity, and achieves a better 3D human motion capture result. At the same time, jointly optimizing the camera parameters and the human pose parameters strengthens the ability to escape local optima during optimization, yielding more accurate 3D human pose modeling and effectively improving 3D human motion capture performance.

Description

Multi-video 3D human motion capturing method and system based on evolutionary computation
Technical Field
The invention relates to the field of 3D motion capture and evolutionary computation, in particular to a multi-video 3D human motion capture method and system based on evolutionary computation.
Background
3D motion capture technology aims to recover the 3D motion pose of a target from one or more videos. It is widely applied in fields such as film special effects, game development, and sports analysis, has broad application prospects, is one of the important tasks in computer 3D vision, and has recently attracted great attention from researchers at home and abroad.
Evolutionary computation refers to the family of heuristic global optimization algorithms inspired by the process of biological evolution in nature. The particle swarm algorithm is one branch of evolutionary computation; it was proposed by Kennedy and Eberhart in 1995, inspired by the foraging behavior of bird flocks, to solve global optimization problems. Owing to its simplicity and rapid convergence, particle swarm optimization has become one of the mainstream global optimization techniques and is widely applied to many practical problems.
Compared with traditional multi-view 3D human motion capture, multi-video 3D human motion capture can extract target poses from several videos whose camera parameters are unknown and whose motions were not collected from the same individual at the same moment, greatly reducing cost. However, existing schemes use different optimization methods to separately estimate the camera parameters and the 3D human pose parameters, and easily fall into local optima, so the optimization result is poor.
Therefore, the prior art urgently needs a technical scheme for multi-video 3D human motion capture based on evolutionary computation that improves optimization performance.
Disclosure of Invention
Aiming at the above defects, the invention provides a multi-video 3D human motion capture method and system based on evolutionary computation, which jointly optimize the camera parameters and the 3D human pose parameters and improve optimization performance.
To achieve the above object, according to one aspect of the present invention, a multi-video 3D human motion capture method based on evolutionary computation includes the following steps:
S1: obtain M videos containing human motion, and record the number of frames L contained in each video;
S2: initialize the human pose contained in each frame of each video using a 3D human pose detection algorithm;
S3: compute the similarity of the 3D human poses obtained in S2 between any two video frames, and match video frames according to the similarity, thereby achieving multi-video synchronization;
S4: obtain the 3D human pose of each frame by jointly optimizing the camera parameters and the human pose parameters over the videos synchronized in S3.
Preferably, the video containing the human actions is acquired by self-shooting or by searching from social media.
Preferably, among the M acquired videos containing human motion, the number of frames contained in the i-th video is denoted L_i.
Preferably, in S2, the 3D human pose detection algorithm is the HMMR algorithm.
Preferably, the 3D human pose k_ij contained in the j-th frame of the i-th video is denoted as:

k_ij = W · M(θ_ij, β_i) + t_ij

where M(·) denotes the SMPL model, θ_ij is the human pose parameter, β_i is the human shape parameter, t_ij is the position of the modeled human body in the world coordinate system, and W is a linear joint-regression matrix.
Further, θ_ij ∈ R^{3(K+1)}, β_i ∈ R^{10}, t_ij ∈ R^{3}, W ∈ R^{K×N}, where K = 23 is the number of predefined joints of the 3D human body and N = 6890 is the number of vertices of the human body modeling mesh.
Preferably, S3 specifically comprises:
S3.1: align the 3D human poses in any two frames of any two videos using the Procrustes method;
where the 3D human poses in the two frames are denoted k_{i1 j1} and k_{i2 j2}, with i1 and i2 the indices of two arbitrarily selected videos and j1 and j2 the indices of two arbitrarily selected frames;
S3.2: compute the similarity of the two 3D human poses aligned in S3.1;
the similarity C(k̂_{i1 j1}, k̂_{i2 j2}) of the two aligned 3D human poses is computed as:

C(k̂_{i1 j1}, k̂_{i2 j2}) = −(1/K) Σ_{z=1}^{K} ‖k̂_{i1 j1}^z − k̂_{i2 j2}^z‖₂ ;
S3.3: achieve multi-video synchronization according to the similarity computed in S3.2;
specifically, S3.3 is: set the number of frames of the synchronized video as a hyperparameter F, generate a video synchronization matrix X, establish a multi-video synchronization objective function, and maximize this objective, thereby achieving multi-video synchronization;
preferably, the multi-video synchronization objective function S(X) is:

S(X) = Σ_{h=1}^{F} Σ_{i=1}^{M} Σ_{j=i+1}^{M} C(k̂_{i, x_i^h}, k̂_{j, x_j^h})

where h indexes the frames of the synchronized video, x_i^h is the frame index in the i-th video corresponding to the h-th frame of the synchronized video, and x_j^h is the frame index in the j-th video corresponding to the h-th frame of the synchronized video;
Preferably, S4 specifically comprises:
S4.1: establish an objective function for jointly optimizing the camera parameters and the human pose parameters in the videos;
where the objective function E is:

E = E_re + λ · E_t

where E_re is the reprojection error function, computed as:

E_re = Σ_i Σ_j Σ_z c_{ijz} · ρ( Π_{R_i, T_i}(K_{ijz}) − x_{ijz} )

where x_{ijz} denotes the coordinates of the z-th human joint in the 2D pose of the j-th frame of the i-th video, with detection confidence c_{ijz}; ρ(·) is the Geman-McClure robust error function used to suppress detection noise; Π_{R_i, T_i}(·) denotes perspective projection with the camera parameters; K_{ijz} denotes the predicted coordinates of the z-th 3D human joint; θ_ij is the human pose parameter in the j-th frame of the i-th video; R_i and T_i are the camera parameters of the i-th video; λ is a constant weight; and E_t is the temporal smoothing error function, computed as:

E_t = Σ_{i=1}^{M} [ max(rank(Θ_i) − ε, 0) + max(rank(Γ_i) − ε, 0) ]

where F is the number of frames of each video, Θ_i ∈ R^{F×3(K+1)} is the matrix of the human pose parameters of all frames of the i-th video, Γ_i ∈ R^{F×3} is the matrix of the human body positions of all frames of the i-th video, and ε is a manually set threshold;
S4.2: initialize the camera parameters and the human action pose parameters;
specifically, flatten the camera parameters and the human action pose parameters into a single one-dimensional vector, for subsequent joint optimization of both with the particle swarm algorithm;
S4.3: use the particle swarm algorithm to jointly optimize the camera and human pose parameters of every frame of the videos, and output the 3D human pose parameters;
preferably, S4.3 specifically is: take the one-dimensional parameter vector from S4.2 as the particle center, then randomly initialize n particles with positions p_1, …, p_n and velocities v_1, …, v_n; update the velocities and positions of the n particles with the particle swarm update formula and iterate; while the current iteration count is smaller than the maximum iteration count G, continue iterating; otherwise take gbest as the optimal solution of the optimization, restore it into camera parameters and human pose parameters, update the 3D human pose contained in each frame of each video accordingly, and output the updated 3D human pose parameters as the final result.
Preferably, the particle update formula by which the particle swarm algorithm updates the velocities and positions of the n particles is:

v_k^{d+1} = ω · v_k^d + c_1 · r_1 ⊙ (pbest_k − p_k^d) + c_2 · r_2 ⊙ (gbest − p_k^d)
p_k^{d+1} = p_k^d + v_k^{d+1}

where ω is the inertia weight, which measures the influence of a particle's velocity in the previous iteration on the current one; c_1 and c_2 are constant acceleration factors with c_1 + c_2 ≤ 4; r_1 and r_2 are D-dimensional random vectors; pbest_k denotes the historical optimal position of the k-th particle; gbest denotes the historical optimal position of the particle swarm; v_k^d denotes the velocity of the k-th particle; p_k^d denotes the position of the k-th particle; D is the dimension and d indexes the update.
According to another aspect of the present invention, there is provided a multi-video 3D human motion capture system based on evolutionary computation, the system adopting the multi-video 3D human motion capture method based on evolutionary computation, wherein the system comprises:
the video acquisition module is used for acquiring M videos containing human actions and recording the frame number L contained in each video;
the human pose initialization module, configured to initialize the human pose contained in each frame of each video using a 3D human pose detection algorithm;
the multi-video synchronization module, configured to compute the similarity of the initialized 3D human poses between any two video frames and to match video frames according to the similarity, thereby achieving multi-video synchronization;
and the human pose acquisition module, configured to obtain the 3D human pose of each frame by jointly optimizing the camera parameters and the human pose parameters over the synchronized videos.
Based on the technical scheme, the multi-video 3D human motion capturing method and system based on evolutionary computation provided by the application have the following technical effects:
according to the invention, when capturing 3D human motion from multiple videos, the videos are first synchronized, and the 3D human pose of each frame is then obtained by jointly optimizing camera parameters and human pose parameters over the synchronized videos; the motions therefore need not all be collected from the same individual, the multi-video, multi-view nature of the detection is fully exploited, occlusion and depth ambiguity are effectively alleviated, and a better 3D human motion capture result is obtained;
meanwhile, jointly optimizing the camera parameters and the human pose parameters strengthens the ability to escape local optima during optimization, yielding more accurate 3D human pose modeling and effectively improving 3D human motion capture performance.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate and explain the disclosure, and do not constitute a limitation on the disclosure.
In the drawings:
Fig. 1 is a flowchart of a multi-video 3D human motion capturing method based on evolutionary computation according to an embodiment of the present application.
Fig. 2 is a flowchart for implementing multi-video synchronization according to an embodiment of the present application.
Fig. 3 is a schematic diagram of a multi-video 3D human motion capture system based on evolutionary computation according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments herein more apparent, the technical solutions in the embodiments herein will be clearly and completely described below with reference to the accompanying drawings in the embodiments herein, and it is apparent that the described embodiments are some, but not all, embodiments herein. All other embodiments, based on the embodiments herein, which a person of ordinary skill in the art would obtain without undue burden, are within the scope of protection herein. It should be noted that, without conflict, the embodiments and features of the embodiments herein may be arbitrarily combined with each other.
Unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, it is the meaning of "including but not limited to".
Example 1
In order to achieve the above objective, as shown in fig. 1, a multi-video 3D human motion capturing method based on evolutionary computation includes the following steps:
s1: obtaining M videos containing human actions, and recording the number L of frames contained in each video;
In this embodiment, any action on the screen may be recorded and saved as a video file using a dedicated screen-recording tool such as SnagIt, or the video may be produced with a video editing tool (such as Premiere).
After the video file is obtained, it is read with a third-party tool such as OpenCV's VideoCapture class, and every frame is read in a loop until the video ends, so that the total frame count L of the video file is determined and recorded.
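As a minimal sketch of the frame-counting step above: the loop reads frames until the stream ends. The helper names are illustrative, not part of the patent; OpenCV (`cv2`) is assumed to be installed and is imported lazily so that only the file-opening helper depends on it.

```python
def iterate_frames(cap):
    """Yield frames from a capture object until read() reports end of stream."""
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        yield frame

def count_frames(cap) -> int:
    """Loop over every frame and return the total frame count L."""
    return sum(1 for _ in iterate_frames(cap))

def open_video(path: str):
    """Open a video file with OpenCV's VideoCapture (assumes opencv-python)."""
    import cv2  # lazy import: the counting helpers above work without OpenCV
    cap = cv2.VideoCapture(path)
    if not cap.isOpened():
        raise IOError(f"cannot open video: {path}")
    return cap

# typical usage: L = count_frames(open_video("clip.mp4"))
```

Counting by looping rather than reading container metadata is deliberate: metadata frame counts are often inaccurate for web-sourced videos.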
In another embodiment, the video including the human motion may be obtained by self-shooting, or may be obtained by searching from social media.
In addition, among the M acquired videos containing human motion, the number of frames contained in the i-th video is denoted L_i.
S2: initializing human body gestures contained in each frame in the video by using a 3D human body gesture detection algorithm;
In this embodiment, a 3D human pose detection algorithm suited to the application scene and data set may be selected; common open-source choices include OpenPose and deep-learning-based methods such as VIBE and SMPLify. After the detection algorithm is determined, the video is preprocessed to ensure its data is suitable for the algorithm. Then, for each frame of the video, the human body is located in the frame, the 2D keypoints of the human body are detected on that basis, and from them the algorithm estimates their positions in 3D space, thereby determining the 3D human pose of each frame. This 3D human pose may be initialized as a pose representation of the SMPL model.
In another embodiment, the 3D human pose detection algorithm is the HMMR algorithm.
In S2, the 3D human pose k_ij contained in the j-th frame of the i-th video is denoted as:

k_ij = W · M(θ_ij, β_i) + t_ij

where M(·) denotes the SMPL model, θ_ij is the human pose parameter, β_i is the human shape parameter, t_ij is the position of the modeled human body in the world coordinate system, and W is a linear joint-regression matrix.
The SMPL (Skinned Multi-Person Linear) model is a parameterized representation of the three-dimensional human body that accounts not only for the skeletal joints but also for the skin, i.e. the body surface. Its goal is to represent the shape and pose of the human body accurately while remaining simple; in this embodiment, the SMPL model is used to represent the 3D human pose accurately.
Further, θ_ij ∈ R^{3(K+1)}, β_i ∈ R^{10}, t_ij ∈ R^{3}, W ∈ R^{K×N}, where K = 23 is the number of predefined joints of the 3D human body and N = 6890 is the number of vertices of the human body modeling mesh.
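A minimal numpy sketch of the joint-regression step k = W · M(θ, β) + t described above, with random stand-in data in place of a real SMPL mesh; the toy regressor W (every joint set to the mesh mean) and all variable names are illustrative assumptions, not the trained SMPL regressor.

```python
import numpy as np

K, N = 23, 6890  # predefined joints and mesh vertices, as in the description

def regress_joints(vertices: np.ndarray, W: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Map an (N, 3) mesh to (K, 3) joint positions: k = W @ V + t."""
    assert vertices.shape == (N, 3) and W.shape == (K, N)
    return W @ vertices + t  # t is the (3,) world-coordinate translation

# stand-in data: a real pipeline would obtain V = M(theta, beta) from the SMPL model
rng = np.random.default_rng(0)
V = rng.standard_normal((N, 3))
W = np.full((K, N), 1.0 / N)      # toy regressor: every joint is the mesh mean
t = np.array([0.0, 0.0, 2.0])
joints = regress_joints(V, W, t)  # joints has shape (23, 3)
```

In the real model, W is a sparse learned matrix and V depends on the pose and shape parameters; only the linear-regression structure is shown here.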
S3: calculating the similarity of the 3D human body gestures acquired through the S2 in any two frames of video images, and matching video frames according to the similarity, so as to realize multi-video synchronization;
In this embodiment, features such as relative positions, angles, and distances between the joints may be extracted from the 3D human pose, and a similarity metric is chosen in advance to compare two poses. The metric can be computed from the extracted features, for example using Euclidean distance, cosine similarity, or a more complex measure.
After the similarity metric is determined, the pose similarity between each pair of video frames is computed by comparing the video frames pairwise.
Specifically, as shown in fig. 2, S3 comprises:
S3.1: align the 3D human poses in any two frames of any two videos using the Procrustes method;
where the 3D human poses in the two frames are denoted k_{i1 j1} and k_{i2 j2}, with i1 and i2 the indices of two arbitrarily selected videos and j1 and j2 the indices of two arbitrarily selected frames;
The Procrustes method (Procrustes analysis) is a commonly used shape analysis technique: it measures the similarity between shape data, iteratively finds a reference shape, and uses least squares to find the affine transform from each object shape to the reference shape. This process is also known as least-squares orthogonal mapping.
The Procrustes method matches corresponding points (coordinates) in two data sets by translating, rotating, and scaling the points of one set so that they match the corresponding points of the other set while minimizing the sum of squared deviations between the point coordinates. The deviations between corresponding point coordinates are called vector residuals; small vector residuals indicate higher consistency between the two data sets.
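The alignment step can be sketched with the standard SVD-based Procrustes solution (translation, rotation, and uniform scale). This is the generic least-squares orthogonal mapping just described, not code from the patent.

```python
import numpy as np

def procrustes_align(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Align pose A (K, 3) onto pose B (K, 3) by translation, rotation and
    uniform scale, minimizing the sum of squared residuals."""
    muA, muB = A.mean(axis=0), B.mean(axis=0)
    A0, B0 = A - muA, B - muB                    # move both centroids to the origin
    U, _, Vt = np.linalg.svd(A0.T @ B0)          # SVD of the cross-covariance
    R = U @ Vt                                   # optimal orthogonal map
    if np.linalg.det(R) < 0:                     # forbid reflections: force a rotation
        U[:, -1] *= -1
        R = U @ Vt
    s = np.sum((A0 @ R) * B0) / np.sum(A0 ** 2)  # optimal uniform scale given R
    return s * (A0 @ R) + muB
```

The reflection check matters for human poses: without it, a left/right mirrored pose could be "aligned" onto its mirror image.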
S3.2: compute the similarity of the two 3D human poses aligned in S3.1;
specifically, the similarity C(k̂_{i1 j1}, k̂_{i2 j2}) of the two aligned 3D human poses is computed as:

C(k̂_{i1 j1}, k̂_{i2 j2}) = −(1/K) Σ_{z=1}^{K} ‖k̂_{i1 j1}^z − k̂_{i2 j2}^z‖₂

where k̂_{i1 j1} is the aligned 3D human pose of the j1-th frame of the i1-th video, and k̂_{i2 j2} is the aligned 3D human pose of the j2-th frame of the i2-th video;
S3.3: achieve multi-video synchronization according to the similarity computed in S3.2;
specifically, S3.3 is: set the number of frames of the synchronized video as a hyperparameter F, generate a video synchronization matrix X, establish a multi-video synchronization objective function, and maximize this objective, thereby achieving multi-video synchronization;
further, the multi-video synchronization objective function S(X) is:

S(X) = Σ_{h=1}^{F} Σ_{i=1}^{M} Σ_{j=i+1}^{M} C(k̂_{i, x_i^h}, k̂_{j, x_j^h})

where h indexes the frames of the synchronized video, x_i^h is the frame index in the i-th video corresponding to the h-th frame of the synchronized video, and x_j^h is the frame index in the j-th video corresponding to the h-th frame of the synchronized video;
Through the above operations, the M videos containing human motion can be effectively synchronized, laying a good foundation for the subsequent 3D human motion capture.
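Assuming the pairwise pose similarities have already been computed, evaluating the synchronization objective for one candidate assignment X can be sketched as follows. The data layout of `sim` and all names are illustrative assumptions; searching for the maximizing X is a separate combinatorial step not shown here.

```python
import numpy as np

def sync_objective(X: np.ndarray, sim: dict) -> float:
    """Score a candidate synchronization X of shape (F, M): X[h, i] is the
    frame index x_i^h of video i matched to frame h of the synchronized video.
    sim[(i, j)][a, b] holds the pose similarity C between frame a of video i
    and frame b of video j (precomputed from the aligned 3D poses)."""
    F, M = X.shape
    total = 0.0
    for h in range(F):
        for i in range(M):
            for j in range(i + 1, M):  # each unordered video pair counted once
                total += sim[(i, j)][X[h, i], X[h, j]]
    return total
```

A synchronization that pairs genuinely similar poses scores higher, which is what maximizing the objective exploits.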
S4: obtain the 3D human pose of each frame by jointly optimizing the camera parameters and the human pose parameters over the videos synchronized in S3;
In this embodiment, the goal of the joint optimization is to find the optimal camera parameters and human pose parameters; the parameters can be adjusted iteratively with an optimization algorithm such as gradient descent or the Levenberg-Marquardt algorithm so as to minimize the defined loss function.
The optimization is repeated until a convergence condition is reached, for example the loss value falling below a threshold or the parameter change falling below a threshold. In each iteration, the optimization result of the previous frame may be used as the initial parameters of the current frame to accelerate convergence.
Specifically, S4 comprises:
S4.1: establish an objective function for jointly optimizing the camera parameters and the human poses in the videos;
where the objective function E is:

E = E_re + λ · E_t

where E_re is the reprojection error function, computed as:

E_re = Σ_i Σ_j Σ_z c_{ijz} · ρ( Π_{R_i, T_i}(K_{ijz}) − x_{ijz} )

where x_{ijz} denotes the coordinates of the z-th human joint in the 2D pose of the j-th frame of the i-th video, with detection confidence c_{ijz}; ρ(·) is the Geman-McClure robust error function used to suppress detection noise; Π_{R_i, T_i}(·) denotes perspective projection with the camera parameters; K_{ijz} denotes the predicted coordinates of the z-th 3D human joint; θ_ij is the human pose in the j-th frame of the i-th video; R_i and T_i are the camera parameters of the i-th video; λ is a constant weight; and E_t is the temporal smoothing error function, computed as:

E_t = Σ_{i=1}^{M} [ max(rank(Θ_i) − ε, 0) + max(rank(Γ_i) − ε, 0) ]

where F is the number of frames of each video, Θ_i ∈ R^{F×3(K+1)} is the matrix of the human pose parameters of all frames of the i-th video, Γ_i ∈ R^{F×3} is the matrix of the human body positions of all frames of the i-th video, ε is a manually set threshold, rank(Θ_i) is the rank of the pose-parameter matrix, and rank(Γ_i) is the rank of the position matrix;
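A hedged sketch of the confidence-weighted robust reprojection term: the Geman-McClure function and the pinhole projection below use their standard textbook forms, and `sigma`, the focal length `f`, and all function names are illustrative choices rather than the patent's actual constants.

```python
import numpy as np

def geman_mcclure(residual: np.ndarray, sigma: float = 100.0) -> np.ndarray:
    """Geman-McClure robust error rho(e) = |e|^2 / (sigma^2 + |e|^2); it grows
    like a squared error for small residuals but saturates at 1 for outliers."""
    sq = np.sum(residual ** 2, axis=-1)
    return sq / (sigma ** 2 + sq)

def reprojection_error(joints3d, joints2d, conf, project) -> float:
    """Confidence-weighted robust reprojection error for one frame.
    joints3d: (K, 3) predicted 3D joints; joints2d: (K, 2) detected 2D joints
    with per-joint confidences conf: (K,); project maps (K, 3) -> (K, 2)."""
    residual = project(joints3d) - joints2d
    return float(np.sum(conf * geman_mcclure(residual)))

def perspective(points3d: np.ndarray, f: float = 1000.0) -> np.ndarray:
    """Toy pinhole projection with focal length f and principal point at 0."""
    return f * points3d[:, :2] / points3d[:, 2:3]
```

The saturation of the robust error is what keeps a single badly mislocated 2D detection from dominating the joint optimization.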
S4.2: initialize the camera parameters and the human action pose parameters;
specifically, flatten the camera parameters and the human action pose parameters into a single one-dimensional vector, for subsequent joint optimization of both with the particle swarm algorithm;
S4.3: use the particle swarm algorithm to jointly optimize the camera and human pose parameters of every frame of the videos, and output the 3D human pose parameters;
specifically, S4.3 is: take the one-dimensional parameter vector from S4.2 as the particle center, then randomly initialize n particles with positions p_1, …, p_n and velocities v_1, …, v_n; update the velocities and positions of the n particles with the particle swarm update formula and iterate; while the current iteration count is smaller than the maximum iteration count G, continue iterating; otherwise take gbest as the optimal solution of the optimization, restore it into camera parameters and human pose parameters, update the 3D human pose contained in each frame of each video accordingly, and output the updated 3D human pose parameters as the final result.
Further, the particle update formula by which the particle swarm algorithm updates the velocities and positions of the n particles is:

v_k^{d+1} = ω · v_k^d + c_1 · r_1 ⊙ (pbest_k − p_k^d) + c_2 · r_2 ⊙ (gbest − p_k^d)
p_k^{d+1} = p_k^d + v_k^{d+1}

where ω is the inertia weight, which measures the influence of a particle's velocity in the previous iteration on the current one; c_1 and c_2 are constant acceleration factors with c_1 + c_2 ≤ 4; r_1 and r_2 are D-dimensional random vectors; pbest_k denotes the historical optimal position of the k-th particle; gbest denotes the historical optimal position of the particle swarm; v_k^d denotes the velocity of the k-th particle; p_k^d denotes the position of the k-th particle; D is the dimension and d indexes the update.
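The particle swarm update described above can be sketched as a generic minimizer of the joint loss. The hyperparameters (n, iters, w, c1, c2) are illustrative, chosen so that c1 + c2 ≤ 4 as required in the text; the loss function and parameter vector would be the flattened camera-plus-pose vector from S4.2.

```python
import numpy as np

def pso_minimize(f, x0, n=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over R^D with a basic particle swarm centered on x0.
    The update mirrors the formula in the text:
    v <- w*v + c1*r1*(pbest - p) + c2*r2*(gbest - p);  p <- p + v."""
    rng = np.random.default_rng(seed)
    D = x0.size
    p = x0 + rng.standard_normal((n, D))       # particle positions around the center
    v = rng.standard_normal((n, D))            # random initial velocities
    pbest = p.copy()                           # per-particle historical best position
    pbest_val = np.array([f(x) for x in p])
    gbest = pbest[pbest_val.argmin()].copy()   # swarm historical best position
    for _ in range(iters):
        r1 = rng.random((n, D))
        r2 = rng.random((n, D))
        v = w * v + c1 * r1 * (pbest - p) + c2 * r2 * (gbest - p)
        p = p + v
        vals = np.array([f(x) for x in p])
        improved = vals < pbest_val
        pbest[improved] = p[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest
```

Because the update only needs loss evaluations, not gradients, the same loop can drive the non-smooth joint camera/pose objective where gradient methods stall in local optima.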
Illustratively, to verify the effectiveness of the scheme provided by this embodiment, the method of this embodiment is run on a set of tennis-playing videos acquired from the Internet and compared against the classical HMMR (Kanazawa, A., Zhang, J.Y., Felsen, P., Malik, J., "Learning 3d human dynamics from video," in CVPR, 2019) and the more recent iMoCap (J. Dong, Q. Shuai, Y. Zhang, X. Liu, X. Zhou, H. Bao, "Motion capture from internet videos," ECCV, 2020).
MPJPE and P-MPJPE are evaluation metrics widely used for 3D human pose estimation. MPJPE (mean per-joint position error) is the average Euclidean distance between the predicted and ground-truth coordinates over all keypoints, with the prediction usually aligned to the ground truth at the root joint beforehand; the smaller the value, the better the 3D human pose estimation algorithm. P-MPJPE first aligns the prediction to the ground-truth coordinates by Procrustes analysis and then computes MPJPE. The comparison results of the three methods are shown in Table 1; compared with the prior art, the method provided by this embodiment captures motion more accurately.
Table 1 Experimental data for the comparison of the three algorithms

Method           MPJPE (mm)   P-MPJPE (mm)
HMMR             109.80       78.26
iMoCap           66.53        50.33
This embodiment  62.32        47.26
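The MPJPE metric reported in Table 1 can be sketched as below. Treating index 0 as the root joint is an assumption of this sketch; P-MPJPE would additionally apply Procrustes alignment to the prediction before the same distance computation.

```python
import numpy as np

def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-joint position error: mean Euclidean distance between predicted
    and ground-truth joints after aligning both poses at the root joint."""
    pred = pred - pred[0]  # root joint assumed at index 0 (an assumption here)
    gt = gt - gt[0]
    return float(np.linalg.norm(pred - gt, axis=-1).mean())
```

Note that a pure global translation of the prediction contributes nothing to this metric, which is why it isolates pose accuracy from localization accuracy.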
When capturing 3D human motion from multiple videos, this embodiment initializes the human pose of each frame of the original videos with a 3D human pose detection algorithm, computes the similarity of the human poses between any two video frames, and matches video frames by similarity, thereby achieving multi-video synchronization. It then obtains the 3D human pose of each frame by jointly optimizing the camera parameters and the human pose parameters over the synchronized videos. The motions therefore need not all be collected from the same individual; the method fully exploits the multi-video, multi-view nature of the detection, effectively alleviates occlusion and depth ambiguity, and achieves a better 3D human motion capture result.
Meanwhile, jointly optimizing the camera parameters and the human pose parameters strengthens the ability to escape local optima during optimization, yielding more accurate 3D human pose modeling and effectively improving 3D human motion capture performance.
In a second embodiment, as shown in fig. 3, the present embodiment includes a multi-video 3D human motion capturing system based on evolutionary computation, the system adopts the multi-video 3D human motion capturing method based on evolutionary computation of the first embodiment, where the system includes:
the video acquisition module is used for acquiring M videos containing human actions and recording the frame number L contained in each video;
the human pose initialization module, used to initialize the human pose contained in each frame of each video using a 3D human pose detection algorithm;
the multi-video synchronization module, used to compute the similarity of the initialized 3D human poses between any two video frames and to match video frames according to the similarity, thereby achieving multi-video synchronization;
and the human pose acquisition module, used to obtain the 3D human pose of each frame by jointly optimizing the camera parameters and the human pose parameters over the synchronized videos.
For specific steps, reference may be made to the method in embodiment one, and details are not described here.
In a third embodiment, the present embodiment includes a computer readable storage medium having a data processing program stored thereon, where the data processing program is executed by a processor to perform the multi-video 3D human motion capture method based on evolutionary computation of the first embodiment.
It will be apparent to one of ordinary skill in the art that embodiments herein may be provided as a method, apparatus (device), or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied therein. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, including, but not limited to RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
The description herein is with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices) and computer program products according to embodiments herein. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments herein have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all alterations and modifications as fall within the scope herein.
It will be apparent to those skilled in the art that various modifications and variations can be made herein without departing from the spirit and scope of the disclosure. Thus, given that such modifications and variations herein fall within the scope of the claims herein and their equivalents, such modifications and variations are intended to be included herein.

Claims (10)

1. A multi-video 3D human motion capture method based on evolutionary computation, characterized in that the method comprises the following steps:
S1: obtaining M videos containing human motion, and recording the number of frames L contained in each video;
S2: initializing the human body pose contained in each frame of the videos using a 3D human body pose detection algorithm;
S3: calculating the similarity of the 3D human body poses obtained in S2 between any two video frames, and matching video frames according to the similarity, thereby achieving multi-video synchronization;
S4: obtaining the 3D human body pose of each frame of the videos synchronized in S3 by jointly optimizing the camera parameters and the human body pose parameters.
2. The evolutionary computation-based multi-video 3D human motion capture method of claim 1, wherein the videos containing human motion are obtained by self-shooting or by retrieval from social media.
3. The evolutionary computation-based multi-video 3D human motion capture method of claim 2, wherein the number of frames contained in the i-th of the M acquired videos containing human motion is denoted L_i.
4. The evolutionary computation-based multi-video 3D human motion capture method of claim 1, wherein in S2 the 3D human body pose detection algorithm is the HMMR detection algorithm.
5. The evolutionary computation-based multi-video 3D human motion capture method of claim 1, wherein the human body pose k_ij contained in the j-th frame of the i-th video is noted as:
k_ij = W · M(θ_ij, β) + t_ij
where M(·, ·) represents the SMPL model, θ is the human body pose parameter, β is the human body shape parameter, t is the position of the human body model in the world coordinate system, and W represents a linear regression model.
6. The evolutionary computation-based multi-video 3D human motion capture method of claim 1, wherein S3 is specifically:
S3.1: performing an alignment operation, using the Procrustes method, on the 3D human body poses in any two frames of any two videos;
wherein the 3D human body poses in any two frames of any two videos are denoted k_ia and k_jb, where i and j denote the two arbitrarily selected videos and a and b denote the two arbitrarily selected frames;
S3.2: calculating the similarity C(k_ia, k_jb) of the two 3D human body poses aligned in S3.1;
S3.3: achieving multi-video synchronization according to the similarity calculated in S3.2.
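The alignment and similarity computation described in claim 6 can be sketched as below, assuming the "Pruck" alignment is Procrustes analysis and the similarity is the negative mean per-joint distance after alignment (both are illustrative assumptions, not the claimed formulas):

```python
import numpy as np

def procrustes_align(A, B):
    """Align pose B to pose A (both N x 3 joint arrays) with the similarity
    transform (translation, rotation, uniform scale) of Procrustes analysis."""
    muA, muB = A.mean(axis=0), B.mean(axis=0)
    A0, B0 = A - muA, B - muB
    # optimal rotation from the SVD of the cross-covariance matrix
    U, s, Vt = np.linalg.svd(B0.T @ A0)
    d = np.sign(np.linalg.det(U @ Vt))   # guard against a reflection
    D = np.array([1.0, 1.0, d])
    R = (U * D) @ Vt
    scale = (s * D).sum() / (B0 ** 2).sum()
    return scale * (B0 @ R) + muA

def pose_similarity(A, B):
    """Negative mean per-joint distance after alignment; larger values mean
    more similar poses (assumed form of the claimed similarity C)."""
    return -np.mean(np.linalg.norm(A - procrustes_align(A, B), axis=1))
```

Two poses that differ only by a rigid transform and a scale then score near zero, the maximum, which is what allows frames of the same motion seen from different cameras to be matched.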
7. The evolutionary computation-based multi-video 3D human motion capture method of claim 6, wherein S3.3 is specifically: setting the number of frames contained in the synchronized video as a hyperparameter H, generating a video synchronization matrix X, establishing a multi-video synchronization objective function, and maximizing the multi-video synchronization objective function, thereby achieving multi-video synchronization;
the multi-video synchronization objective function F(X) is:
F(X) = Σ_{h=1}^{H} Σ_{i=1}^{M} Σ_{j=i+1}^{M} C(k_{i, x_i^h}, k_{j, x_j^h})
where h is the index of a frame of the synchronized video, x_i^h is the frame of the i-th video corresponding to the h-th frame of the synchronized video, and x_j^h is the frame of the j-th video corresponding to the h-th frame of the synchronized video.
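The synchronization objective of claim 7 — summing, over each synchronized frame h and each pair of videos, the similarity of the frames matched to h — can be evaluated as in this sketch (the layout of X and the callback signature are assumptions for illustration):

```python
def sync_objective(X, similarity):
    """F(X): sum of pairwise pose similarities over matched frames.

    X: H x M integer table; X[h][i] is the frame of video i assigned to
       synchronized frame h (the video synchronization matrix).
    similarity(i, a, j, b): similarity of frame a of video i and frame b
       of video j (e.g. a Procrustes-aligned pose similarity)."""
    H, M = len(X), len(X[0])
    total = 0.0
    for h in range(H):
        for i in range(M):
            for j in range(i + 1, M):   # each unordered video pair once
                total += similarity(i, X[h][i], j, X[h][j])
    return total
```

Maximizing this value over valid assignments (e.g. frame indices increasing monotonically within each video) yields the multi-video synchronization.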
8. The evolutionary computation-based multi-video 3D human motion capture method of claim 1, wherein S4 is specifically:
S4.1: establishing an objective function for jointly optimizing the camera parameters and the human body pose parameters in the videos;
wherein the objective function E is:
E = E_reproj + λ · E_smooth
where E_reproj is the reprojection error function, calculated as:
E_reproj = Σ_i Σ_j Σ_Z σ_ijZ · ρ(Π_{K_i, R_i}(J_ijZ) − p_ijZ)
where p_ijZ denotes the coordinate of the Z-th human body joint in the 2D pose of the j-th frame of the i-th video, with detection confidence σ_ijZ; ρ denotes the Geman-McClure robust error function, which suppresses detection noise; Π_{K_i, R_i} denotes perspective projection through the camera parameters; J_ijZ denotes the predicted coordinate of the Z-th 3D human body joint; θ_ij is the human body pose parameter in the j-th frame of the i-th video; K_i and R_i are the camera parameters of the i-th video; λ is a constant weight; E_smooth is the temporal smoothing error function, computed from the difference between the human body pose matrices of consecutive frames of each video, where T denotes the number of frames of each video, Θ_i denotes the human body pose matrix of all frames of the i-th video, and a manually set threshold bounds the allowed inter-frame change;
S4.2: initializing the camera parameters K_i, R_i and the human body pose parameters θ;
unfolding the camera parameters K_i, R_i and the human body pose parameters θ into a one-dimensional vector, for subsequent joint optimization of the camera parameters and human body pose parameters with the particle swarm algorithm;
S4.3: jointly optimizing the human body pose parameters of each frame of the videos with the particle swarm algorithm, and outputting the 3D human body pose parameters.
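The joint objective of claim 8 combines a confidence-weighted, Geman-McClure-robustified reprojection error with a temporal smoothing term. A single-frame sketch, with a pinhole projection model and a squared-difference smoothing term standing in for the patent's exact formulas (assumptions, not the claimed implementation):

```python
import numpy as np

def geman_mcclure(r, sigma=100.0):
    # Geman-McClure robust error: grows like r^2 for small residuals,
    # saturates toward 1 for large ones, damping detector outliers
    r2 = np.sum(r ** 2, axis=-1)
    return r2 / (r2 + sigma ** 2)

def project(K, R, t, X):
    # perspective projection of 3D joints X (N x 3) with intrinsics K,
    # rotation R, and translation t (a standard pinhole camera model)
    Xc = X @ R.T + t
    x = Xc @ K.T
    return x[:, :2] / x[:, 2:3]

def objective(K, R, t, joints3d, joints2d, conf, lam, prev3d=None):
    """E = E_reproj + lam * E_smooth for one frame (illustrative sketch)."""
    resid = project(K, R, t, joints3d) - joints2d
    e_reproj = np.sum(conf * geman_mcclure(resid))
    e_smooth = 0.0 if prev3d is None else np.sum((joints3d - prev3d) ** 2)
    return e_reproj + lam * e_smooth
```

When the projected 3D joints coincide with the detected 2D joints and the pose matches the previous frame, the objective is zero; any misfit or temporal jitter increases it, which is what the joint optimization over camera and pose parameters drives down.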
9. The evolutionary computation-based multi-video 3D human motion capture method of claim 8, wherein S4.3 is specifically: taking the one-dimensional vector of camera parameters and human body pose parameters from S4.2 as the particle center, then randomly initializing n particles with positions p_1, …, p_n and velocities v_1, …, v_n; updating the velocities and positions of the n particles through the particle swarm algorithm, iterating according to the particle update formula, and judging whether the current iteration number is smaller than the maximum iteration number G: if so, continuing the iteration; if not, taking gbest as the optimal solution of the iterative optimization and restoring it to the camera parameters and human body pose parameters, so as to update the 3D human body pose contained in each frame of each video and output the updated 3D human body pose parameters as the final result.
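The particle-swarm step of claim 9 can be sketched as a standard PSO loop over the flattened camera-and-pose vector (the hyperparameter values and the initialization spread are illustrative; the velocity/position update rule shown is the conventional one, assumed here):

```python
import random

def pso(f, x0, n=20, iters=50, w=0.7, c1=1.5, c2=1.5, spread=0.5):
    """Minimize f over vectors near x0 with a basic particle swarm.

    Particles start around x0 (the particle center), each tracking its
    personal best (pbest) while the swarm tracks the global best (gbest)."""
    dim = len(x0)
    pos = [[x0[d] + random.uniform(-spread, spread) for d in range(dim)]
           for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n), key=lambda k: pbest_val[k])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):                    # stop at the iteration cap G
        for k in range(n):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # inertia + cognitive (pbest) + social (gbest) terms
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                pos[k][d] += vel[k][d]
            val = f(pos[k])
            if val < pbest_val[k]:
                pbest[k], pbest_val[k] = pos[k][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[k][:], val
    return gbest, gbest_val
```

In the claimed method, `f` would evaluate the joint objective of claim 8 after restoring the flat vector back into camera parameters and per-frame pose parameters; `gbest` is then unflattened to yield the final 3D human body poses.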
10. A multi-video 3D human motion capture system based on evolutionary computation, employing the evolutionary computation-based multi-video 3D human motion capture method of any one of claims 1-9, wherein the system comprises:
a video acquisition module, used for obtaining M videos containing human motion and recording the number of frames L contained in each video;
a human body pose initialization module, used for initializing the human body pose contained in each frame of the videos using a 3D human body pose detection algorithm;
a multi-video synchronization module, used for calculating the similarity of the 3D human body poses obtained in S2 between any two video frames and matching video frames according to the similarity, so as to achieve multi-video synchronization;
and a human body pose acquisition module, used for obtaining the 3D human body pose of each frame of the videos synchronized in S3 by jointly optimizing the camera parameters and the human body pose parameters.
CN202410472848.8A 2024-04-19 2024-04-19 Multi-video 3D human motion capturing method and system based on evolutionary computation Pending CN118097027A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410472848.8A CN118097027A (en) 2024-04-19 2024-04-19 Multi-video 3D human motion capturing method and system based on evolutionary computation

Publications (1)

Publication Number Publication Date
CN118097027A (en) 2024-05-28

Family

ID=91165575

Country Status (1)

Country Link
CN (1) CN118097027A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183184A (en) * 2020-08-13 2021-01-05 浙江大学 Motion capture method based on asynchronous video
CN113673494A (en) * 2021-10-25 2021-11-19 青岛根尖智能科技有限公司 Human body posture standard motion behavior matching method and system
CN114092863A (en) * 2021-11-26 2022-02-25 重庆大学 Human body motion evaluation method for multi-view video image
WO2022100262A1 (en) * 2020-11-12 2022-05-19 海信视像科技股份有限公司 Display device, human body posture detection method, and application
CN114926530A (en) * 2021-02-12 2022-08-19 格雷兹珀技术有限公司 Computer-implemented method, data processing apparatus and computer program for generating three-dimensional pose estimation data
CN115546360A (en) * 2021-06-29 2022-12-30 阿里巴巴新加坡控股有限公司 Action result identification method and device
CN117437771A (en) * 2022-07-15 2024-01-23 北京图森智途科技有限公司 Target state estimation method, device, electronic equipment and medium


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ZHONG L. et al.: "A new multi-view articulated human motion tracking algorithm with improved silhouette extraction and view adaptive fusion", 2013 IEEE International Symposium on Circuits and Systems (ISCAS), 1 August 2013 *
LI Jia: "Research on Multi-view 3D Human Motion Capture" (多视角三维人体运动捕捉的研究), China Master's Theses Full-text Database, 15 October 2013 *
LI Yi: "Research on Human Motion Analysis Techniques in Monocular Video" (单目视频人体运动分析技术研究), China Master's Theses Full-text Database, 15 August 2016 *

Similar Documents

Publication Publication Date Title
CN111795704B (en) Method and device for constructing visual point cloud map
Deng et al. Amodal detection of 3d objects: Inferring 3d bounding boxes from 2d ones in rgb-depth images
CN108369643B (en) Method and system for 3D hand skeleton tracking
Wang et al. 360sd-net: 360 stereo depth estimation with learnable cost volume
Arth et al. Wide area localization on mobile phones
KR101532864B1 (en) Planar mapping and tracking for mobile devices
CN102971768B (en) Posture state estimation unit and posture state method of estimation
US20140043329A1 (en) Method of augmented makeover with 3d face modeling and landmark alignment
Muratov et al. 3DCapture: 3D Reconstruction for a Smartphone
CN113689503B (en) Target object posture detection method, device, equipment and storage medium
CN111815768B (en) Three-dimensional face reconstruction method and device
CN113674400A (en) Spectrum three-dimensional reconstruction method and system based on repositioning technology and storage medium
US10791321B2 (en) Constructing a user's face model using particle filters
CN110188630A (en) A kind of face identification method and camera
Li et al. Sparse-to-local-dense matching for geometry-guided correspondence estimation
CN117726747A (en) Three-dimensional reconstruction method, device, storage medium and equipment for complementing weak texture scene
CN116894876A (en) 6-DOF positioning method based on real-time image
Proenca et al. SHREC’15 Track: Retrieval of objects captured with Kinect One camera
CN115797451A (en) Acupuncture point identification method, device and equipment and readable storage medium
CN118097027A (en) Multi-video 3D human motion capturing method and system based on evolutionary computation
Liu et al. Improved template matching based stereo vision sparse 3D reconstruction algorithm
Suo et al. Neural3d: Light-weight neural portrait scanning via context-aware correspondence learning
Kim et al. Ep2p-loc: End-to-end 3d point to 2d pixel localization for large-scale visual localization
Moliner et al. Better prior knowledge improves human-pose-based extrinsic camera calibration
Kamencay et al. A new approach for disparity map estimation from stereo image sequences using hybrid segmentation algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination