CN111291687B - 3D human body action standard identification method - Google Patents
3D human body action standard identification method
- Publication number
- Publication number: CN111291687B (application CN202010085665.2A)
- Authority
- CN
- China
- Prior art keywords
- equal
- human body
- joint point
- camera
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
Abstract
The invention discloses a 3D human body action standard identification method, which comprises the following steps: (1) acquiring images of a human body action with a plurality of multi-view cameras, traversing the image set, and finding the 2D human body joint point positions of the images under the same timestamp; (2) according to the 2D human body joint point position sets obtained under all the multi-view cameras, carrying out information fusion on the 2D joint points of the multi-view cameras to obtain fused 2D joint point positions under all the camera pixel coordinate systems; (3) according to the obtained fused 2D joint point positions under all the camera pixel coordinate systems, calculating the 3D human body joint point positions and the edge information formed by the joint points, namely the 3D human body posture; (4) performing action standard identification according to the finally calculated 3D human body posture information. The disclosed method can accurately judge and recognize complicated human body postures of target bodies at different viewing angles, and operates in real time.
Description
Technical Field
The invention relates to a 3D human body action standard identification method.
Background
At present, there are many techniques that extract human body joint points with a posture estimation algorithm in order to judge whether a human body action is standard. They include:
(1) 2D posture estimation algorithms based on a monocular camera, which are divided into bottom-up and top-down methods. A bottom-up method first detects the joint points and then decides whether the joint points belong to the same target body; it runs in real time and responds quickly to a single image, but sacrifices detection precision. A top-down method first detects the target body and then extracts the joint points from it; it has higher detection precision but insufficient real-time performance.
These methods share a common defect: joint points on the side of the target body are misjudged or connected wrongly, which causes errors in the action standard score; the extraction accuracy is limited by the angle of the human body, so the methods cannot be put into practical use.
(2) 3D posture estimation algorithms based on a monocular camera, which first construct a virtual relative 3D grid space and then infer the 3D joint points of the input 2D image in the 3D grid space coordinates. Although this remedies some deficiencies of 2D human body posture estimation, it still cannot meet practical requirements: it depends on the precision of the 2D posture, and the resulting human body action can only be judged in a relative coordinate system.
(3) 3D posture estimation algorithms based on multi-view cameras, which depend on the 2D posture estimation precision; their improvement of the 2D posture precision for complex and occluded postures is still limited.
Therefore, the existing methods for judging whether a human body action is standard from a human body posture estimate all have certain defects.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a 3D human body action standard identification method that accurately identifies complex human body postures of target bodies at different angles and runs in real time.
In order to achieve the purpose, the technical scheme of the invention is as follows:
A 3D human body action standard identification method comprises the following steps:
(1) Acquiring images of human body actions by adopting a plurality of multi-view cameras, traversing an image set, and finding 2D human body joint point positions of the images under the same timestamp;
(2) According to the obtained 2D human body joint point position set under all the multi-view cameras, carrying out information fusion on the 2D joint points of the multi-view cameras to obtain fused 2D joint point positions under all the camera pixel coordinate systems;
(3) According to the obtained fused 2D joint point positions under all the camera pixel coordinate systems, calculating the 3D human body joint point positions and the edge information formed by the joint points, namely the 3D human body posture;
(4) Performing action standard identification according to the finally calculated 3D human body posture information;
the step (1) is specifically as follows:
given theCamera set Camera = { C 1 ,C 2 ,...,C i ,...,C a I is more than or equal to 1 and less than or equal to a, a represents the number of the multi-view cameras, and a is more than or equal to 2; the image data collected by the multi-view camera is Ig = { I = { (I) } 1 ,I 2 ,...,I i ,...,I a };I i (x, y, C) are at the same time stamp, C i The method comprises the steps that image samples collected by a camera are obtained, wherein x is more than or equal to 0 and less than or equal to W-1, W represents the image width, y is more than or equal to 0 and less than or equal to H-1, H represents the image height, c is channel information of an input image, and c is more than or equal to 0 and less than or equal to 2; traversing the image set Ig, and finding the image I under the same time stamp i The 2D human joint point position of (x, y, c) comprises the following specific steps:
(i) According to image I i (x, y, c) executing a high-low resolution fusion network, and respectively solving a characteristic response matrix of the high-low resolution of the fusion network;
defining its high resolution sub-network characteristic response matrix asI' is more than or equal to 1 and less than or equal to N, wherein N represents the number of feature layers of the high-resolution subnetwork,for the characteristic response submatrix of the i' th layer of the high resolution subnetwork, x is more than or equal to 0 i′ W ' -1, W ' = W, W ' represents the high resolution sub-network feature matrix width, 0 ≦ y i′ H ' -1,H ' = H, H ' represents the length of the high-resolution sub-network feature matrix;channel information of the feature matrix;
defining a set of low resolution sub-network feature response matrices asl1 and l2 respectively represent two low-resolution sub-network structures, i '< N > is more than or equal to 3, i' < N > is more than or equal to 7,andrespectively are feature response submatrices of two low-resolution sub-networks;
0≤x i″ w ≦ W "= W/2-1, W" represents the first low resolution sub-network feature matrix width,
0≤y i″ h ≦ H "= H/2-1, H" represents the first low resolution sub-network feature matrix height,
0≤x i″′ w '≦ W' ″, W '= W/4-1, W' ″ for the second low resolution subnetwork feature matrix width,
0≤y i″′ less than or equal to H ', H ' = H/4-1, H ' indicates the second low resolution subnetwork feature matrix height;
when i ', i' is an even number, the two low resolution subnetworksAndfusing with a high-resolution sub-network through deconvolution operation, wherein a fusion formula is as follows:
is composed ofTo the channelThe transformation matrix to be deconvolved is performed,is composed ofTo the channelA transformation matrix for deconvolution;
the recursion formula for the feature response submatrix of the high resolution subnetwork is:
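As a rough illustration of this additive high/low-resolution fusion, the numpy sketch below upsamples one low-resolution feature response sub-matrix to the high-resolution size (nearest-neighbour upsampling standing in for the deconvolution) and maps its channels with a transformation matrix before adding it to the high-resolution response; the shapes, the upsampling choice and the names fuse_low_into_high and channel_map are assumptions made for the example, not elements of the patent.

```python
import numpy as np

def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map by an integer factor.
    Stands in for the deconvolution (transposed convolution) used in the fusion step."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_low_into_high(high, low, channel_map):
    """Fuse one low-resolution feature response sub-matrix into the high-resolution one.

    high        : (C_h, H, W)    high-resolution feature response sub-matrix
    low         : (C_l, H/f, W/f) low-resolution feature response sub-matrix
    channel_map : (C_h, C_l)     matrix mapping low-resolution channels to high-resolution channels
    """
    factor = high.shape[1] // low.shape[1]
    low_up = upsample_nearest(low, factor)               # (C_l, H, W)
    # map channels: out[c_h] = sum over c_l of channel_map[c_h, c_l] * low_up[c_l]
    mapped = np.tensordot(channel_map, low_up, axes=1)   # (C_h, H, W)
    return high + mapped                                  # additive fusion

# toy usage: an 8x8 high-resolution branch (32 channels) and a half-resolution branch (64 channels)
rng = np.random.default_rng(0)
high = rng.normal(size=(32, 8, 8))
low1 = rng.normal(size=(64, 4, 4))
theta1 = rng.normal(size=(32, 64)) * 0.01
print(fuse_low_into_high(high, low1, theta1).shape)      # (32, 8, 8)
```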
(ii) From the feature response sub-matrix of the high-resolution sub-network calculated according to equation (3), solve the output response matrix set HeatMap_i = {H_{i,1}, H_{i,2}, ..., H_{i,k}, ..., H_{i,K}}, 1 ≤ k ≤ K, where K = 17 is the number of human body joint points to be solved; then every pixel coordinate of image I_i(x, y, c) is evaluated to determine whether it is the location of the k-th joint point, and H_{i,k}(x, y) denotes the confidence matrix of the k-th joint point under the i-th camera, 1 ≤ k ≤ K:
the confidence matrix of the k-th joint position is obtained by applying the corresponding weight parameters and bias to the feature response sub-matrix of the channels of the N-th layer of the fusion network;
(iii) According to the H_{i,k}(x, y), 1 ≤ k ≤ K, calculated in (ii), solve the mean square error distance of the obtained output response matrix set HeatMap_i:
where (μ_{x,k}, μ_{y,k}) is the true pixel coordinate position of joint point k, and σ_{x,k}, σ_{y,k}, the variances of the target output, are both 1.5;
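For concreteness, the Gaussian target centred on the true pixel position (μ_{x,k}, μ_{y,k}) with variance 1.5 and the mean square error distance can be sketched in numpy as follows; the function names and the exact normalisation are illustrative assumptions rather than the patent's implementation.

```python
import numpy as np

def gaussian_target(width, height, mu_x, mu_y, sigma=1.5):
    """Target confidence map for one joint: a 2D Gaussian centred on the true pixel position."""
    xs = np.arange(width)
    ys = np.arange(height)
    gx = np.exp(-((xs - mu_x) ** 2) / (2 * sigma ** 2))
    gy = np.exp(-((ys - mu_y) ** 2) / (2 * sigma ** 2))
    return np.outer(gy, gx)            # shape (height, width)

def mse_loss(heatmaps, targets):
    """Mean square error distance between predicted heatmaps H_{i,k} and their Gaussian targets.
    heatmaps, targets: arrays of shape (K, height, width)."""
    return np.mean((heatmaps - targets) ** 2)

# toy usage with K = 17 joints on a 64 x 48 heatmap
K, height, width = 17, 64, 48
rng = np.random.default_rng(0)
preds = rng.random((K, height, width))
targets = np.stack([gaussian_target(width, height, mu_x=24, mu_y=32) for _ in range(K)])
print(mse_loss(preds, targets))
```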
(iv) Solve the gradient of the loss with respect to the weight parameters through H_{i,k}(x, y) and update them, where N denotes the number of feature layers of the high-resolution sub-network; the parameter update formula is:
where the learning rate τ is a small number, for example 0.1 or 0.01;
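The update in (iv) is an ordinary gradient step with learning rate τ; a minimal numpy sketch (the names are placeholders):

```python
import numpy as np

def sgd_update(weights, grad, tau=0.01):
    """Gradient-descent update of the heatmap weight parameters with learning rate tau."""
    return weights - tau * np.asarray(grad)

w = np.array([0.5, -0.2, 0.1])
print(sgd_update(w, grad=[0.05, -0.01, 0.2], tau=0.1))
```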
similarly, update the weight parameters of the feature layers of the high-resolution and low-resolution sub-networks;
(v) Repeat steps (i)–(iv) until the MSE loss converges or the maximum iteration number iter is reached, obtaining the final output response matrix set HeatMap_i = {H_{i,1}, H_{i,2}, ..., H_{i,k}, ..., H_{i,K}}, where H_{i,k} denotes the position confidence matrix of the k-th human body joint point;
(vi) According to HeatMap_i, obtain the 2D joint point positions of image I_i(x, y, c) under camera C_i, expressed as the set of pixel coordinates corresponding to the maximum value of each human joint point position confidence matrix H_{i,k};
the set of human body 2D joint point positions under all multi-view cameras is then represented as J = {J_1, J_2, ..., J_i, ..., J_a};
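Extracting the 2D joint point positions in step (vi) amounts to taking, for every joint, the pixel coordinate of the maximum of its confidence matrix; a short sketch under that reading:

```python
import numpy as np

def joints_from_heatmaps(heatmaps):
    """Return the (x, y) pixel coordinate of the maximum of each joint confidence matrix H_{i,k}.
    heatmaps: array of shape (K, height, width) -> list of K (x, y) tuples."""
    joints = []
    for hm in heatmaps:
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        joints.append((int(x), int(y)))
    return joints

# J_i for one camera; J = [joints_from_heatmaps(hm) for hm in all_heatmaps] over all a cameras
```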
The step (2) is specifically as follows:
(i) Perform the alignment operation on the multi-view cameras: convert the world coordinate system into each camera coordinate system and then convert the camera coordinate system into the pixel coordinate system, obtaining the conversion relations among the cameras of the multi-view camera set Camera = {C_1, C_2, ..., C_i, ..., C_a} and thereby realizing the alignment of the multi-view cameras;
(ii) Calculate the fused 2D joint point positions of the multi-view cameras: for the aligned camera C_i, the fused J_i is calculated from the fusion weight matrix θ and the J_j obtained in step (1), 1 ≤ j ≤ a, j ≠ i, where a denotes the number of multi-view cameras and θ satisfies a Gaussian distribution on the corresponding epipolar line, θ ~ N(0, 1); the fused joint point position is expressed as:
where the former denotes the position of the k-th joint point under multi-view camera C_i, θ denotes the fusion weight matrix of the k-th joint point position detected by multi-view camera C_j, and the latter is that detected position converted into pixel coordinates in the pixel coordinate system of camera C_i;
the fused 2D joint point positions under all camera pixel coordinate systems are updated accordingly:
in the above scheme, the step (3) is specifically as follows:
(i) Express the 3D human body posture information as the joint point positions J_3D and the edge vectors l_3D = [l_1, l_2, ..., l_{n'}, ..., l_{K-1}], where J_3D contains the world coordinate system position of the k-th human body joint point, 1 ≤ k ≤ K, K = 17 being the number of human body joint points, and l_{n'} denotes an edge vector formed by the joint points, 1 ≤ n' ≤ K-1, K-1 being the number of edges formed by the joint points. According to the principle of triangulation, obtain a 3D space V centred on the root node, with the edge length s of V initialized to 2000 mm, and discretize the volume of V into an N_g × N_g × N_g coarse grid:
the joint point positions all lie within the N_g^3 grid cells, with 1 ≤ t_0, t_1, t_2 ≤ N_g, where t_0 denotes the grid depth, t_1 the grid width, t_2 the grid height, and N_g takes an initial value of 16;
(ii) Compute the 3D posture estimate in the initial grid according to the graph model algorithm PSM: J_3D(1), l_3D(1) = [l_1, l_2, ..., l_{n'}, ..., l_{K-1}], s(1) = 2000 mm, where J_3D(1) and l_3D(1) are the results of the 3D posture estimation in the initial grid, namely the joint point position coordinates in the world coordinate system and the joint point edge vectors respectively;
(iii) Use an iterative algorithm: for each joint point k, 1 ≤ k ≤ K, discretize the grid cell surrounding the current joint point position into a 2 × 2 × 2 local grid, i.e. let N_g = 2, and repeat the PSM method of the previous step to obtain the updated 3D posture estimate J_3D(2), l_3D(2) = [l_1, l_2, ..., l_{n'}, ..., l_{K-1}]. As the iterations proceed, the 3D human body posture is refined and the edge length s of V is reduced, so the precision improves; the iteration number iter_3D is determined according to the sample complexity.
In the above scheme, the step (4) is specifically as follows:
according to the finally estimated 3D human body posture information:
l 3D (iter_3D)=[l 1 ,l 2 ,...,l n′ ...,l K-1 ]the action standard judgment is carried out, and the specific method comprises the following steps:
(i) The standard action angle set of the human body posture is as follows: representing any two edges l n′ ,l n″ Wherein the angle of the first and second guide rails,n is more than or equal to 1 and K-1 is more than or equal to n', according to the formula:solving each corner degree
(ii) Establishing a Gaussian mixture modelIs provided withIs shown inIs a mean Gaussian distribution with n' being less than or equal to 1 and K-1 being less than or equal to n ″, thenSubject to a standard normal distribution, with a confidence of 95% Y according to a standard normal distribution table 0.025 ,Y 0.0975 ObtainingThe value:
(iii) By Gaussian mixture modelJudging the gesture motion of the human body 3D gesture to be evaluated as a referenceScoring to obtain a final action standard score and total score set:specifically, judgingWhether or not to satisfy the distributionIf the standard distribution table meets the standard distribution table, the action is qualified, and the standard score of the decomposition action under the distribution is calculated according to the standard distribution table
Through the technical scheme, the 3D human body action standard identification method provided by the invention has the following beneficial effects:
(1) Accurately judging and recognizing complex human body postures of target bodies at different angles;
(2) The multi-view cameras are calibrated with respect to a world coordinate system, and the 2D posture precision can be improved by accurately fusing the features from different viewing angles in the aligned coordinate systems.
(3) The method runs in real time on common hardware resources.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below.
FIG. 1 is a diagram of a joint point of a human body according to an embodiment of the present invention;
FIG. 2 is a diagram of the interconversion between the world coordinate system and the camera coordinate system;
FIG. 3 is a diagram illustrating a transformation relationship between a camera coordinate system and a pixel coordinate system.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.
The invention provides a 3D human body action standard identification method, which comprises the following steps:
(1) Acquiring images of human body actions by adopting a plurality of multi-view cameras, traversing an image set, and finding 2D human body joint point positions of the images under the same timestamp;
given set of cameras Camera = { C 1 ,C 2 ,...,C i ,...,C a I is more than or equal to 1 and less than or equal to a, a represents the number of the multi-view cameras, and a is more than or equal to 2; the image data collected by the multi-view camera is Ig = { I = { (I) 1 ,I 2 ,...,I i ,...,I a };I i (x, y, C) are at the same time stamp, C i The method comprises the steps that image samples collected by a camera are obtained, wherein x is more than or equal to 0 and less than or equal to W-1, W represents the image width, y is more than or equal to 0 and less than or equal to H-1, H represents the image height, c is channel information of an input image, and c is more than or equal to 0 and less than or equal to 2; traversing the image set Ig, and finding the image I under the same time stamp i The 2D human joint point position of (x, y, c) comprises the following specific steps:
(i) According to image I i (x, y, c) executing a high-low resolution fusion network, and respectively solving a characteristic response matrix of the high-low resolution of the fusion network;
defining its high resolution sub-network characteristic response matrix asI' is less than or equal to 1 and less than or equal to N, wherein N represents the number of high-resolution sub-network feature layers,for the characteristic response submatrix of the i' th layer of the high resolution subnetwork, x is more than or equal to 0 i′ W '-1,W' = W ', W' represents the high resolution sub-network feature matrix width, y is greater than or equal to 0 i′ H ' -1,H ' = H, H ' represents the length of the high-resolution sub-network feature matrix;channel information that is a feature matrix;
defining a set of low resolution sub-network feature response matrices asl1 and l2 respectively represent two low-resolution sub-network structures, i '< N > is more than or equal to 3, i' < N > is more than or equal to 7,andrespectively a feature response submatrix of two low-resolution sub-networks;
0≤x i″ w ≦ W "= W/2-1, W" represents the first low resolution sub-network feature matrix width,
0≤y i″ h ≦ H ', H ' = H/2-1, H ' represents the first low resolution sub-network feature matrix height,
0≤x i″′ w '≦ W' ″, W '= W/4-1, W' ″ represents the second low resolution sub-network feature matrix width,
0≤y i″′ h "", H "" = H/4-1, H "", indicates the second low resolution subnetwork feature matrix height;
when i ', i' is an even number, the two low resolution subnetworksAndfusing with a high-resolution sub-network through deconvolution operation, wherein a fusion formula is as follows:
is composed ofTo the channelThe transformation matrix of the deconvolution is performed,is composed ofTo the channelA transformation matrix for deconvolution;
the recurrence formula of the feature response submatrix of the high resolution subnetwork is:
(ii) The characteristic response submatrix of the high resolution subnetwork calculated according to equation (3)Solving output response matrix set HeatMap i ={H i,1 ,H i,2 ,...,H i,k ,...,H i.K 1 ≦ K, K =17, representing the number of 17 joint points of the body to be sought, as shown in fig. 1, and then for image I i (x, y, c) each pixel coordinate location is evaluated to determine if it is the location of the kth joint point, H i,k (x, y) represents a confidence matrix of the kth joint point in the ith camera, wherein K is more than or equal to 1 and less than or equal to K:
to solve the problemThe weight parameters of the k joint position confidence matrices,in order to be an offset amount,layer N representing a converged networkA characteristic response submatrix of the channel;
(iii) According to H calculated in (ii) i,k (x, y), K is more than or equal to 1 and less than or equal to K, and the obtained output response matrix set HeatMap i Solving the mean square error distance:
wherein (mu) x,k ,μ y,k ) Is the true pixel coordinate position, σ, of the joint point k x,k ,σ y,k The variances of the target output are all 1.5;
(iv) By H i,k (x, y) pairsSolving the gradient to update the weight parameter, and (3) representing the number of the feature layers of the N-th layer high-resolution sub-network, wherein the parameter updating formula is as follows:
wherein τ is a small number, 0.1 or 0.01;
similarly, updating the weight parameters of the feature layer of the high-resolution and low-resolution sub-network;
(v) (iii) repeating steps (i) - (iv) until MSE loss Converging or satisfying the maximum iteration number iter to obtain the final output response matrix set HeatMap i ={H i,1 ,H i,2 ,...,H i,k ,...,H i,K },H i,k A position confidence coefficient matrix representing the kth joint point of the human body;
(vi) According to HeadMap i Get camera C i Lower image I i The position of the 2D joint point of (x, y, c) is expressed as:wherein,confidence matrix H for representing human joint point position i,k The pixel coordinate corresponding to the medium maximum value;
then the set of human body 2D joint point positions under all multi-view cameras is represented as J = { J = 1 ,J 2 ,...,J i ,...,J a }。
(2) According to the obtained 2D human body joint point position set under all the multi-view cameras, carrying out information fusion on the 2D joint points of the multi-view cameras to obtain fused 2D joint point positions under all the camera pixel coordinate systems;
(i) Performing an alignment operation on a plurality of multi-view cameras:
the alignment operation of the multi-view Camera is performed according to the paper a Flexible New Technique for Camera Calibration, as shown in fig. 2, R is an orthogonal identity matrix of 3 × 3, t is a translation vector, and R, t is an external parameter of the Camera, which is used to represent a distance between a world coordinate system and a Camera coordinate system.
As shown in FIG. 3, the plane π is referred to as the image plane of the camera, point O c Called the center of the camera (optical center), f is the focal length of the camera, and O c Making a ray perpendicular to the image plane for the end point, intersecting the image plane with point p, then ray O c p is called the optical axis (principal axis), the point p is the principal point of the camera, and there are,
Obtaining world coordinate system coordinate P according to FIG. 2 and FIG. 3 w (x w ,y w ,z w ) Conversion to pixel coordinate system coordinates P 1 The transformation process of (u, v) is as follows:
accordingly, the multi-view Camera Camera = { C 1 ,C 2 ,...,C i ,...,C a And (5) converting the relation between the cameras, thereby realizing the alignment operation of the multi-view camera.
(ii) Calculate the fused 2D joint point positions of the multi-view cameras: for the aligned camera C_i, the fused J_i is calculated from the fusion weight matrix θ and the J_j obtained in step (1), 1 ≤ j ≤ a, j ≠ i, where a denotes the number of multi-view cameras and θ satisfies a Gaussian distribution on the corresponding epipolar line, θ ~ N(0, 1); the fused joint point position is expressed as:
where the former denotes the position of the k-th joint point under multi-view camera C_i, θ denotes the fusion weight matrix of the k-th joint point position detected by multi-view camera C_j, and the latter is that detected position converted into pixel coordinates in the pixel coordinate system of camera C_i;
the fused 2D joint point positions under all camera pixel coordinate systems are updated according to formula (11).
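The sketch below only illustrates the general idea of the fusion — a weighted combination of camera C_i's own detection with the other cameras' detections already mapped into C_i's pixel coordinate system; the simple normalised weighting and the name fuse_joint are assumptions for illustration, not the exact patented formula.

```python
import numpy as np

def fuse_joint(own_xy, other_xy_in_own_frame, weights):
    """Simplified stand-in for the fused joint point position: a weighted combination of
    camera C_i's own detection with the detections of the other cameras C_j (j != i),
    the latter already converted into C_i's pixel coordinate system.

    own_xy                : (2,)   detection of joint k in camera C_i
    other_xy_in_own_frame : (m, 2) detections from the other cameras, mapped into C_i's frame
    weights               : (m,)   fusion weights for the other cameras
    """
    own_xy = np.asarray(own_xy, dtype=float)
    others = np.asarray(other_xy_in_own_frame, dtype=float)
    w = np.asarray(weights, dtype=float)
    w_own = max(1.0 - w.sum(), 0.0)                 # remaining weight stays on the own view
    fused = w_own * own_xy + (w[:, None] * others).sum(axis=0)
    return fused / (w_own + w.sum())

# toy usage: joint seen at (120, 80) by C_i, two other cameras re-projected nearby
print(fuse_joint((120, 80), [(122, 78), (119, 83)], weights=[0.2, 0.3]))
```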
(3) According to the obtained fused 2D joint point positions under all the camera pixel coordinate systems, calculating the 3D human body joint point positions and the edge information formed by the joint points, namely the 3D human body posture;
(i) Express the 3D human body posture information as the joint point positions J_3D and the edge vectors l_3D = [l_1, l_2, ..., l_{n'}, ..., l_{K-1}], where J_3D contains the world coordinate system position of the k-th human body joint point, 1 ≤ k ≤ K, K = 17 being the number of human body joint points, and l_{n'} denotes an edge vector formed by the joint points, 1 ≤ n' ≤ K-1, K-1 being the number of edges formed by the joint points. According to the principle of triangulation, obtain a 3D space V centred on the root node, with the edge length s of V initialized to 2000 mm, and discretize the volume of V into an N_g × N_g × N_g coarse grid:
the joint point positions all lie within the N_g^3 grid cells, with 1 ≤ t_0, t_1, t_2 ≤ N_g, where t_0 denotes the grid depth, t_1 the grid width, t_2 the grid height, and N_g takes an initial value of 16;
(ii) Compute the 3D posture estimate in the initial grid according to the graph model algorithm PSM: J_3D(1), l_3D(1) = [l_1, l_2, ..., l_{n'}, ..., l_{K-1}], s(1) = 2000 mm, where J_3D(1) and l_3D(1) are the results of the 3D posture estimation in the initial grid, namely the joint point position coordinates in the world coordinate system and the joint point edge vectors respectively;
(iii) Use an iterative algorithm: for each joint point k, 1 ≤ k ≤ K, discretize the grid cell surrounding the current joint point position into a 2 × 2 × 2 local grid, i.e. let N_g = 2, and repeat the PSM method of the previous step to obtain the updated 3D posture estimate J_3D(2), l_3D(2) = [l_1, l_2, ..., l_{n'}, ..., l_{K-1}]. As the iterations proceed, the 3D human body posture is refined and the edge length s of V is reduced, so the precision improves; the iteration number iter_3D is determined according to the sample complexity.
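As a rough illustration of this coarse-to-fine search, the sketch below discretizes a cube of edge length s around the current estimate (a 16^3 grid first, then 2 × 2 × 2 local grids of shrinking size) and keeps the best-scoring cell centre for a single joint; the per-position scoring callback — which in the method would come from the fused 2D joint evidence of all cameras — and the pairwise limb terms of the full PSM graph model are left out, so this is a simplification rather than the patented algorithm.

```python
import numpy as np

def grid_centers(center, side, n_g):
    """Discretize a cube of edge length `side` centred at `center` into n_g**3 cell centres."""
    offs = (np.arange(n_g) + 0.5) / n_g - 0.5           # cell-centre offsets in [-0.5, 0.5)
    dz, dy, dx = np.meshgrid(offs, offs, offs, indexing="ij")
    pts = np.stack([dx, dy, dz], axis=-1).reshape(-1, 3) * side
    return np.asarray(center, dtype=float) + pts         # (n_g**3, 3) world coordinates

def refine_joint(score_fn, center, side=2000.0, n_g_init=16, iters=3):
    """Coarse-to-fine search for one joint: a 16^3 grid first, then 2x2x2 grids of shrinking size.
    `score_fn(p)` returns the (unary) support for world position p; pairwise PSM terms are omitted."""
    best = np.asarray(center, dtype=float)
    n_g = n_g_init
    for _ in range(iters):
        cands = grid_centers(best, side, n_g)
        best = cands[np.argmax([score_fn(p) for p in cands])]
        side = side / n_g                                 # shrink the search volume
        n_g = 2                                           # 2 x 2 x 2 local grid afterwards
    return best

# toy usage: the score peaks at (100, -50, 900) mm
target = np.array([100.0, -50.0, 900.0])
print(refine_joint(lambda p: -np.linalg.norm(p - target), center=(0, 0, 1000)))
```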
(4) Performing action standard identification according to the finally calculated 3D human body posture information.
According to the finally estimated 3D human body posture information J_3D(iter_3D), l_3D(iter_3D) = [l_1, l_2, ..., l_{n'}, ..., l_{K-1}], the action standard judgment is carried out; the specific method is as follows:
(i) Build the standard action angle set of the human body posture, consisting of the angles between any two edges l_{n'}, l_{n''}, where 1 ≤ n', n'' ≤ K-1; each angle is solved according to the corresponding formula, i.e. the inverse cosine of the normalized dot product of the two edge vectors;
(ii) Establish a Gaussian mixture model: each angle is assumed to follow a Gaussian distribution whose mean is the corresponding standard action angle, 1 ≤ n', n'' ≤ K-1; the standardized angle then obeys the standard normal distribution, and with a confidence of 95% the bounds Y_{0.025}, Y_{0.975} are obtained from the standard normal distribution table, which give the acceptable value range;
(iii) Taking the Gaussian mixture model as the reference, judge and score the posture action of the human body 3D posture to be evaluated, obtaining the final action standard scores and the total score set: specifically, judge whether each measured angle satisfies the corresponding distribution; if it does, the action is qualified, and the standard score of the decomposed action under that distribution is calculated according to the standard normal distribution table.
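As an illustration of (i)–(iii), the sketch below computes the angle between two edge vectors and checks a measured angle against a Gaussian centred on the standard angle, using the 1.96 bound corresponding to Y_0.025 / Y_0.975 of the standard normal table; the tolerance sigma and the exact scoring rule are assumptions, since they are not fixed by the text above.

```python
import numpy as np

def edge_angle(l_a, l_b):
    """Angle (degrees) between two joint-point edge vectors l_{n'} and l_{n''}."""
    l_a, l_b = np.asarray(l_a, float), np.asarray(l_b, float)
    cos = np.dot(l_a, l_b) / (np.linalg.norm(l_a) * np.linalg.norm(l_b))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def score_angle(measured, standard, sigma=10.0):
    """Qualify and score one decomposed action angle against a Gaussian centred on the
    standard angle. 1.96 is the 95% bound of the standard normal table (Y_0.025 / Y_0.975);
    sigma (degrees) is an assumed tolerance, not a value given in the patent."""
    z = (measured - standard) / sigma
    qualified = abs(z) <= 1.96                                   # inside the 95% interval
    score = float(np.exp(-0.5 * z * z)) if qualified else 0.0    # 1.0 at the standard angle
    return qualified, score

# toy usage: upper-arm / forearm edge vectors, standard elbow angle of 90 degrees
angle = edge_angle([0.0, 1.0, 0.0], [1.0, 0.2, 0.0])
print(angle, score_angle(angle, standard=90.0))
```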
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (3)
1. A 3D human body action standard identification method, characterized by comprising the following steps:
(1) Acquiring images of human body actions by adopting a plurality of multi-view cameras, traversing an image set, and finding 2D human body joint point positions of the images under the same timestamp;
(2) According to the obtained 2D human body joint point position set under all the multi-view cameras, carrying out information fusion on the 2D joint points of the multi-view cameras to obtain fused 2D joint point positions under all the camera pixel coordinate systems;
(3) According to the obtained fused 2D joint point positions under all camera pixel coordinate systems, calculating the 3D human body joint point positions and the edge information formed by the joint points, namely the 3D human body posture;
(4) Performing action standard identification according to the finally calculated 3D human body posture information;
the step (1) is specifically as follows:
Given the camera set Camera = {C_1, C_2, ..., C_i, ..., C_a}, 1 ≤ i ≤ a, where a denotes the number of multi-view cameras and a ≥ 2, the image data collected by the multi-view cameras is Ig = {I_1, I_2, ..., I_i, ..., I_a}; I_i(x, y, c) is the image sample collected by camera C_i under the same timestamp, with 0 ≤ x ≤ W-1 (W denotes the image width), 0 ≤ y ≤ H-1 (H denotes the image height), and c the channel index of the input image, 0 ≤ c ≤ 2; traverse the image set Ig and find the 2D human body joint point positions of each image I_i(x, y, c) under the same timestamp; the specific steps are as follows:
(i) Feed the image I_i(x, y, c) through the high/low-resolution fusion network and solve the feature response matrices of the high-resolution and low-resolution branches of the fusion network respectively;
define the high-resolution sub-network feature response sub-matrices per layer i', 1 ≤ i' ≤ N, where N denotes the number of feature layers of the high-resolution sub-network; the feature response sub-matrix of the i'-th layer of the high-resolution sub-network has spatial indices 0 ≤ x_{i'} ≤ W'-1 with W' = W (the high-resolution sub-network feature matrix width) and 0 ≤ y_{i'} ≤ H'-1 with H' = H (the high-resolution sub-network feature matrix height), together with a channel index of the feature matrix;
define the set of low-resolution sub-network feature response matrices accordingly, where l1 and l2 denote the two low-resolution sub-network structures, with layer indices 3 ≤ i'' ≤ N and 7 ≤ i''' ≤ N, and the two feature response sub-matrices belong to the first and the second low-resolution sub-network respectively;
0 ≤ x_{i''} ≤ W''-1, W'' = W/2, where W'' denotes the first low-resolution sub-network feature matrix width,
0 ≤ y_{i''} ≤ H''-1, H'' = H/2, where H'' denotes the first low-resolution sub-network feature matrix height,
0 ≤ x_{i'''} ≤ W'''-1, W''' = W/4, where W''' denotes the second low-resolution sub-network feature matrix width,
0 ≤ y_{i'''} ≤ H'''-1, H''' = H/4, where H''' denotes the second low-resolution sub-network feature matrix height;
when i'' and i''' are even, the feature response sub-matrices of the two low-resolution sub-networks are fused with the high-resolution sub-network through a deconvolution operation; in the fusion formula, each low-resolution feature response sub-matrix is mapped to the channels of the high-resolution sub-network by its deconvolution transformation matrix and fused into the high-resolution feature response;
the recursion formula for the feature response sub-matrix of the high-resolution sub-network is formula (3);
(ii) From the feature response sub-matrix of the high-resolution sub-network calculated according to equation (3), solve the output response matrix set HeatMap_i = {H_{i,1}, H_{i,2}, ..., H_{i,k}, ..., H_{i,K}}, 1 ≤ k ≤ K, where K = 17 is the number of human body joint points to be solved; then every pixel coordinate of image I_i(x, y, c) is evaluated to determine whether it is the location of the k-th joint point, and H_{i,k}(x, y) denotes the confidence matrix of the k-th joint point under the i-th camera, 1 ≤ k ≤ K:
the confidence matrix of the k-th joint position is obtained by applying the corresponding weight parameters and bias to the feature response sub-matrix of the channels of the N-th layer of the fusion network;
(iii) According to the H_{i,k}(x, y), 1 ≤ k ≤ K, calculated in (ii), solve the mean square error distance of the obtained output response matrix set HeatMap_i:
where (μ_{x,k}, μ_{y,k}) is the true pixel coordinate position of joint point k, and σ_{x,k}, σ_{y,k}, the variances of the target output, are both 1.5;
(iv) Solve the gradient of the loss with respect to the weight parameters through H_{i,k}(x, y) and update them, where N denotes the number of feature layers of the high-resolution sub-network; the parameter update formula is:
where the learning rate τ is a small number, for example 0.1 or 0.01;
similarly, update the weight parameters of the feature layers of the high-resolution and low-resolution sub-networks;
(v) Repeat steps (i)–(iv) until the MSE loss converges or the maximum iteration number iter is reached, obtaining the final output response matrix set HeatMap_i = {H_{i,1}, H_{i,2}, ..., H_{i,k}, ..., H_{i,K}}, where H_{i,k} denotes the position confidence matrix of the k-th human body joint point;
(vi) According to HeatMap_i, obtain the 2D joint point positions of image I_i(x, y, c) under camera C_i, expressed as the set of pixel coordinates corresponding to the maximum value of each human joint point position confidence matrix H_{i,k};
the set of human body 2D joint point positions under all multi-view cameras is then represented as J = {J_1, J_2, ..., J_i, ..., J_a};
The step (2) is specifically as follows:
(i) Perform the alignment operation on the multi-view cameras: convert the world coordinate system into each camera coordinate system and then convert the camera coordinate system into the pixel coordinate system, obtaining the conversion relations among the cameras of the multi-view camera set Camera = {C_1, C_2, ..., C_i, ..., C_a} and thereby realizing the alignment of the multi-view cameras;
(ii) Calculate the fused 2D joint point positions of the multi-view cameras: for the aligned camera C_i, the fused J_i is calculated from the fusion weight matrix θ and the J_j obtained in step (1), 1 ≤ j ≤ a, j ≠ i, where a denotes the number of multi-view cameras and θ satisfies a Gaussian distribution on the corresponding epipolar line, θ ~ N(0, 1); the fused joint point position is expressed as:
where the former denotes the position of the k-th joint point under multi-view camera C_i, θ denotes the fusion weight matrix of the k-th joint point position detected by multi-view camera C_j, and the latter is that detected position converted into pixel coordinates in the pixel coordinate system of camera C_i;
the fused 2D joint point positions under all camera pixel coordinate systems are updated accordingly:
2. The method for 3D human body action standard identification according to claim 1, wherein the step (3) is specifically as follows:
(i) Express the 3D human body posture information as the joint point positions J_3D and the edge vectors l_3D = [l_1, l_2, ..., l_{n'}, ..., l_{K-1}], where J_3D contains the world coordinate system position of the k-th human body joint point, 1 ≤ k ≤ K, K = 17 being the number of human body joint points, and l_{n'} denotes an edge vector formed by the joint points, 1 ≤ n' ≤ K-1, K-1 being the number of edges formed by the joint points; according to the principle of triangulation, obtain a 3D space V centred on the root node, with the edge length s of V initialized to 2000 mm, and discretize the volume of V into an N_g × N_g × N_g coarse grid:
the joint point positions all lie within the N_g^3 grid cells, with 1 ≤ t_0, t_1, t_2 ≤ N_g, where t_0 denotes the grid depth, t_1 the grid width, t_2 the grid height, and N_g takes an initial value of 16;
(ii) Compute the 3D posture estimate in the initial grid according to the graph model algorithm PSM: J_3D(1), l_3D(1) = [l_1, l_2, ..., l_{n'}, ..., l_{K-1}], s(1) = 2000 mm, where J_3D(1) and l_3D(1) are the results of the 3D posture estimation in the initial grid, namely the joint point position coordinates in the world coordinate system and the joint point edge vectors respectively;
(iii) Use an iterative algorithm: for each joint point k, 1 ≤ k ≤ K, discretize the grid cell surrounding the current joint point position into a 2 × 2 × 2 local grid, i.e. let N_g = 2, and repeat the PSM method of the previous step to obtain the updated 3D posture estimate J_3D(2), l_3D(2) = [l_1, l_2, ..., l_{n'}, ..., l_{K-1}]; as the iterations proceed, the 3D human body posture is refined and the edge length s of V is reduced, so the precision improves; the iteration number iter_3D is determined according to the sample complexity.
3. The method for 3D human body action standard identification according to claim 2, wherein the step (4) is specifically as follows:
according to the finally estimated 3D human body posture information J_3D(iter_3D), l_3D(iter_3D) = [l_1, l_2, ..., l_{n'}, ..., l_{K-1}], the action standard judgment is carried out; the specific method is as follows:
(i) Build the standard action angle set of the human body posture, consisting of the angles between any two edges l_{n'}, l_{n''}, where 1 ≤ n', n'' ≤ K-1; each angle is solved according to the corresponding formula, i.e. the inverse cosine of the normalized dot product of the two edge vectors;
(ii) Establish a Gaussian mixture model: each angle is assumed to follow a Gaussian distribution whose mean is the corresponding standard action angle, 1 ≤ n', n'' ≤ K-1; the standardized angle then obeys the standard normal distribution, and with a confidence of 95% the bounds Y_{0.025}, Y_{0.975} are obtained from the standard normal distribution table, which give the acceptable value range;
(iii) Taking the Gaussian mixture model as the reference, judge and score the posture action of the human body 3D posture to be evaluated, obtaining the final action standard scores and the total score set: specifically, judge whether each measured angle satisfies the corresponding distribution; if it does, the action is qualified, and the standard score of the decomposed action under that distribution is calculated according to the standard normal distribution table, for 1 ≤ n', n'' ≤ K-1; usually ω_{ii} = 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010085665.2A CN111291687B (en) | 2020-02-11 | 2020-02-11 | 3D human body action standard identification method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010085665.2A CN111291687B (en) | 2020-02-11 | 2020-02-11 | 3D human body action standard identification method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111291687A CN111291687A (en) | 2020-06-16 |
CN111291687B true CN111291687B (en) | 2022-11-11 |
Family
ID=71025534
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010085665.2A Active CN111291687B (en) | 2020-02-11 | 2020-02-11 | 3D human body action standard identification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111291687B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112183506A (en) * | 2020-11-30 | 2021-01-05 | 成都市谛视科技有限公司 | Human body posture generation method and system |
CN112435731B (en) * | 2020-12-16 | 2024-03-19 | 成都翡铭科技有限公司 | Method for judging whether real-time gesture meets preset rules |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101561881A (en) * | 2009-05-19 | 2009-10-21 | 华中科技大学 | Emotion identification method for human non-programmed motion |
CN108427282A (en) * | 2018-03-30 | 2018-08-21 | 华中科技大学 | A kind of solution of Inverse Kinematics method based on learning from instruction |
CN108549856A (en) * | 2018-04-02 | 2018-09-18 | 上海理工大学 | A kind of human action and road conditions recognition methods |
CN109344821A (en) * | 2018-08-30 | 2019-02-15 | 西安电子科技大学 | Small target detecting method based on Fusion Features and deep learning |
CN110464349A (en) * | 2019-08-30 | 2019-11-19 | 南京邮电大学 | A kind of upper extremity exercise function score method based on hidden Semi-Markov Process |
CN110633005A (en) * | 2019-04-02 | 2019-12-31 | 北京理工大学 | Optical unmarked three-dimensional human body motion capture method |
- 2020-02-11 CN CN202010085665.2A patent/CN111291687B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101561881A (en) * | 2009-05-19 | 2009-10-21 | 华中科技大学 | Emotion identification method for human non-programmed motion |
CN108427282A (en) * | 2018-03-30 | 2018-08-21 | 华中科技大学 | A kind of solution of Inverse Kinematics method based on learning from instruction |
CN108549856A (en) * | 2018-04-02 | 2018-09-18 | 上海理工大学 | A kind of human action and road conditions recognition methods |
CN109344821A (en) * | 2018-08-30 | 2019-02-15 | 西安电子科技大学 | Small target detecting method based on Fusion Features and deep learning |
CN110633005A (en) * | 2019-04-02 | 2019-12-31 | 北京理工大学 | Optical unmarked three-dimensional human body motion capture method |
CN110464349A (en) * | 2019-08-30 | 2019-11-19 | 南京邮电大学 | A kind of upper extremity exercise function score method based on hidden Semi-Markov Process |
Non-Patent Citations (3)
Title |
---|
H. Jiang. 3D Human Pose Reconstruction Using Millions of Exemplars. 2010 20th International Conference on Pattern Recognition, 2010, pp. 1674-1677. *
H. Qiu et al. Cross View Fusion for 3D Human Pose Estimation. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 4341-4350. *
K. Sun et al. Deep High-Resolution Representation Learning for Human Pose Estimation. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 5686-5696. *
Also Published As
Publication number | Publication date |
---|---|
CN111291687A (en) | 2020-06-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||