CN111291687B - 3D human body action standard identification method - Google Patents


Info

Publication number
CN111291687B
CN111291687B (application CN202010085665.2A)
Authority
CN
China
Prior art keywords
equal
human body
joint point
camera
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010085665.2A
Other languages
Chinese (zh)
Other versions
CN111291687A (en)
Inventor
纪刚
周萌萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Lianhe Chuangzhi Technology Co ltd
Original Assignee
Qingdao Lianhe Chuangzhi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Lianhe Chuangzhi Technology Co ltd filed Critical Qingdao Lianhe Chuangzhi Technology Co ltd
Priority to CN202010085665.2A priority Critical patent/CN111291687B/en
Publication of CN111291687A publication Critical patent/CN111291687A/en
Application granted granted Critical
Publication of CN111291687B publication Critical patent/CN111291687B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/60: Type of objects
    • G06V20/64: Three-dimensional objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The invention discloses a 3D human body action standard identification method, which comprises the following steps: (1) acquire images of human body actions with several multi-view cameras, traverse the image set, and find the 2D human body joint point positions of the images under the same timestamp; (2) from the 2D human body joint point position sets obtained under all the multi-view cameras, fuse the 2D joint point information across cameras to obtain fused 2D joint point positions in every camera pixel coordinate system; (3) from the fused 2D joint point positions in all camera pixel coordinate systems, calculate the 3D human body joint point positions and the edge information formed by the joint points, i.e. the 3D human body posture; (4) perform action standard identification based on the finally calculated 3D human body posture information. The disclosed method accurately judges and recognizes complex human body postures of target bodies at different angles, and runs in real time.

Description

3D human body action standard identification method
Technical Field
The invention relates to a 3D human body action standard identification method.
Background
At present, there are many techniques that extract human body joint points with a pose estimation algorithm and then judge whether a human action is standard. They include:
(1) 2D pose estimation based on a monocular camera, divided into bottom-up and top-down methods. The bottom-up approach first detects joint points and then decides which joint points belong to the same target body; it is real-time and responds quickly to a single image, but sacrifices detection precision. The top-down approach first detects the target body and then extracts the joint points within it; it has higher detection precision but insufficient real-time performance.
Both approaches share a common defect: when joint points lie on the side of the target body, misjudgments and wrong joint connections occur, causing errors in the action standard score; the extraction accuracy is limited by the angle of the human body, so these methods cannot be put into practical use.
(2) 3D pose estimation based on a monocular camera, which first virtualizes a relative 3D grid space and then infers 3D joint points for the input 2D image in the grid coordinates. Although this alleviates the shortcomings of 2D human pose estimation, it still cannot meet practical requirements: it depends on the precision of the 2D pose, and the resulting human action can only be judged in a relative coordinate system.
(3) 3D pose estimation based on multi-view cameras, which also depends on the 2D pose estimation precision; for complex and occluded poses its improvement over the 2D pose precision is still limited.
Therefore, the existing methods that judge whether a human action is standard from human pose estimation all have certain defects.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a 3D human body action standard identification method that accurately identifies complex human body postures of target bodies at different angles and runs in real time.
In order to achieve the purpose, the technical scheme of the invention is as follows:
A 3D human body action standard identification method comprises the following steps:
(1) Acquiring images of human body actions by adopting a plurality of multi-view cameras, traversing an image set, and finding 2D human body joint point positions of the images under the same timestamp;
(2) According to the obtained 2D human body joint point position set under all the multi-view cameras, carrying out information fusion on the 2D joint points of the multi-view cameras to obtain fused 2D joint point positions under all the camera pixel coordinate systems;
(3) According to the obtained fused 2D joint point positions under all the camera pixel coordinate systems, calculating side information formed by the 3D human body joint point positions and the joint points, namely 3D human body postures;
(4) Performing action standard type identification according to the finally calculated 3D human body posture information;
the step (1) is specifically as follows:
Given a camera set Camera = {C_1, C_2, ..., C_i, ..., C_a}, 1 ≤ i ≤ a, where a is the number of multi-view cameras and a ≥ 2; the image data collected by the multi-view cameras is Ig = {I_1, I_2, ..., I_i, ..., I_a}, and I_i(x, y, c) is the image sample collected by camera C_i at the same timestamp, with 0 ≤ x ≤ W-1 (W is the image width), 0 ≤ y ≤ H-1 (H is the image height) and c the channel index of the input image, 0 ≤ c ≤ 2. Traverse the image set Ig and find the 2D human body joint point positions of the images I_i(x, y, c) under the same timestamp; the specific steps are as follows:
(i) Feed image I_i(x, y, c) through the high-low resolution fusion network and solve the feature response matrices of its high-resolution and low-resolution branches.
Define the high-resolution sub-network feature response matrices as F_hr = {F_hr^(1), ..., F_hr^(i'), ..., F_hr^(N)}, 1 ≤ i' ≤ N, where N is the number of feature layers of the high-resolution sub-network and F_hr^(i') is the feature response submatrix of its i'-th layer, with 0 ≤ x_i' ≤ W'-1 (W' = W, the high-resolution feature matrix width), 0 ≤ y_i' ≤ H'-1 (H' = H, the high-resolution feature matrix height) and c_hr^(i') the channel count of the feature matrix.
Define the low-resolution sub-network feature response matrices as F_l1 = {F_l1^(i'')} and F_l2 = {F_l2^(i''')}, where l1 and l2 denote the two low-resolution sub-network structures, 3 ≤ i'' ≤ N, 7 ≤ i''' ≤ N, and F_l1^(i''), F_l2^(i''') are the feature response submatrices of the two low-resolution sub-networks:
0 ≤ x_i'' ≤ W''-1, W'' = W/2, the first low-resolution sub-network feature matrix width;
0 ≤ y_i'' ≤ H''-1, H'' = H/2, the first low-resolution sub-network feature matrix height;
0 ≤ x_i''' ≤ W'''-1, W''' = W/4, the second low-resolution sub-network feature matrix width;
0 ≤ y_i''' ≤ H'''-1, H''' = H/4, the second low-resolution sub-network feature matrix height;
c_l1^(i'') and c_l2^(i''') are the channel counts of the two low-resolution sub-networks.
When i'' and i''' are even, the two low-resolution submatrices F_l1^(i'') and F_l2^(i''') are fused with the high-resolution sub-network through a deconvolution operation; in the fusion formulas (1) and (2), T_l1 is the transformation matrix for the deconvolution from channel c_l1^(i'') to channel c_hr^(i'), and T_l2 is the transformation matrix for the deconvolution from channel c_l2^(i''') to channel c_hr^(i').
The recursion formula (3) then updates the feature response submatrix of the high-resolution sub-network by adding the deconvolved low-resolution responses to it.
(ii) From the feature response submatrix F_hr^(N) of the high-resolution sub-network computed by formula (3), solve the output response matrix set HeatMap_i = {H_i,1, H_i,2, ..., H_i,k, ..., H_i,K}, 1 ≤ k ≤ K, with K = 17 the number of human joint points to be solved. Each pixel coordinate of image I_i(x, y, c) is then evaluated to decide whether it is the location of the k-th joint point; H_i,k(x, y) is the confidence matrix of the k-th joint point under the i-th camera, 1 ≤ k ≤ K. In formulas (4) and (5), W_k is the weight parameter for solving the k-th joint position confidence matrix, b_k is the offset, and F_hr^(N) is the feature response submatrix of the N-th layer of the fusion network for the corresponding channel.
(iii) From the H_i,k(x, y), 1 ≤ k ≤ K, computed in (ii), solve the mean square error distance MSE_loss over the obtained output response matrix set HeatMap_i (formulas (6) and (7)), where (μ_x,k, μ_y,k) is the true pixel coordinate position of joint point k and σ_x,k, σ_y,k, the variances of the target output, are both 1.5.
(iv) Use H_i,k(x, y) to solve the gradients of MSE_loss with respect to the weight parameter W_k and offset b_k of the N-th-layer high-resolution sub-network feature response and update them; the parameter update formulas take a step of size τ in the negative gradient direction, where τ is a small number, 0.1 or 0.01.
Similarly, update the weight parameters of the other feature layers of the high-resolution and low-resolution sub-networks.
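As an illustration of the kind of update described above, the following sketch performs one plain gradient step on a per-joint heatmap head. The linear form of the head, the array shapes and the name sgd_step are assumptions for the sketch, not the patent's exact formulas (which appear only as drawings).

    import numpy as np

    def sgd_step(W_k, b_k, feat, H_k, G_k, tau=0.01):
        """One gradient-descent step for joint k's heatmap head.

        Assumes a per-pixel linear head H_k = feat @ W_k + b_k, with
        feat of shape (H*W, C), W_k of shape (C,), and H_k, G_k the
        predicted and Gaussian ground-truth responses flattened to (H*W,).
        tau is the small step size (0.1 or 0.01 in the text).
        """
        err = H_k - G_k                          # per-pixel residual
        grad_W = 2.0 * feat.T @ err / err.size   # d(mean squared error)/dW_k
        grad_b = 2.0 * err.mean()                # d(mean squared error)/db_k
        return W_k - tau * grad_W, b_k - tau * grad_b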
(v) Repeat steps (i)-(iv) until MSE_loss converges or the maximum iteration number iter is reached, obtaining the final output response matrix set HeatMap_i = {H_i,1, H_i,2, ..., H_i,k, ..., H_i,K}, where H_i,k is the position confidence matrix of the k-th human joint point;
(vi) From HeatMap_i, obtain the 2D joint point positions of image I_i(x, y, c) under camera C_i, expressed as J_i = {(x_k*, y_k*), 1 ≤ k ≤ K}, where (x_k*, y_k*) is the pixel coordinate corresponding to the maximum value of the human joint point position confidence matrix H_i,k;
the set of human body 2D joint point positions under all the multi-view cameras is then expressed as J = {J_1, J_2, ..., J_i, ..., J_a};
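A small sketch of how the per-joint argmax in (vi) can be read off the confidence maps; the array shapes are assumptions.

    import numpy as np

    def joints_from_heatmaps(heatmaps):
        """Per-joint 2D positions as the argmax of each confidence map.

        `heatmaps` is assumed to have shape (K, H, W) with K = 17 joints;
        returns an integer array of (x, y) pixel coordinates, shape (K, 2).
        """
        K, H, W = heatmaps.shape
        joints = np.zeros((K, 2), dtype=np.int64)
        for k in range(K):
            y, x = np.unravel_index(np.argmax(heatmaps[k]), (H, W))
            joints[k] = (x, y)
        return joints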
The step (2) is specifically as follows:
(i) Perform an alignment operation on the multiple multi-view cameras: convert the world coordinate system to the camera coordinate system and then the camera coordinate system to the pixel coordinate system, obtaining the conversion relations between the cameras of the multi-view set Camera = {C_1, C_2, ..., C_i, ..., C_a} and thereby realizing the alignment of the multi-view cameras;
(ii) Calculate the fused 2D joint point positions of the multi-view cameras. For the aligned camera C_i, its detected joint point positions J_i are fused with the positions J_j obtained in step (1) from the other cameras through a fusion weight matrix θ, 1 ≤ j ≤ a, j ≠ i, where a is the number of multi-view cameras and θ satisfies a Gaussian distribution along the epipolar line, θ ~ N(0, 1). In the fused joint point expression (formula (10)), J_i,k denotes the position of the k-th joint point under multi-view camera C_i, θ denotes the fusion weight matrix for the k-th joint point position detected by multi-view camera C_j, and each J_j,k is first converted into pixel coordinates in camera C_i's pixel coordinate system;
the fused 2D joint point positions under all camera pixel coordinate systems are then updated accordingly (formula (11)).
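The fusion formula itself is given only as a drawing in the original, so the sketch below shows one plausible reading: the detection in camera C_i is combined with the other cameras' detections after they are mapped into C_i's pixel frame, weighted by a scalar standing in for the epipolar weight matrix θ. The normalization by the total weight and the helper reproject_to_i are illustrative assumptions.

    import numpy as np

    def fuse_joint(j_i, joints_other, reproject_to_i, theta=0.5):
        """Hedged sketch of the cross-view 2D fusion of step (2)(ii).

        j_i            : (2,) detected pixel position of one joint in camera C_i
        joints_other   : iterable of (2,) detections of the same joint in cameras C_j, j != i
        reproject_to_i : callable mapping a C_j detection into C_i's pixel frame
                         (stands in for the camera-alignment step; hypothetical helper)
        theta          : scalar weight standing in for the epipolar weight matrix
        """
        num = np.asarray(j_i, dtype=float)
        den = 1.0
        for j_other in joints_other:
            num = num + theta * np.asarray(reproject_to_i(j_other), dtype=float)
            den += theta
        return num / den   # weighted average keeps the result a pixel coordinate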
in the above scheme, the step (3) is specifically as follows:
(i) Express the 3D human body posture information as J_3D = {P_1, ..., P_k, ..., P_K} and l_3D = [l_1, l_2, ..., l_n', ..., l_K-1], where P_k denotes the world-coordinate-system position of the k-th human joint point, 1 ≤ k ≤ K, K = 17 (the 17 human joint points), l_n' denotes an edge vector formed by joint points, 1 ≤ n' ≤ K-1, and K-1 is the number of edges formed by the joint points. According to the principle of triangulation, obtain a 3D space V centred on the root node, take the side length s of V with an initial value of 2000 mm, and discretize the volume of V into an N_g × N_g × N_g coarse grid; each grid cell G(t_0, t_1, t_2) lies among the N_g^3 cells, 1 ≤ t_0, t_1, t_2 ≤ N_g, where t_0 indexes the grid depth, t_1 the grid width and t_2 the grid height, and N_g takes an initial value of 16;
(ii) Compute, according to the graph model algorithm PSM, the 3D pose estimate in the initial grid: J_3D(1), l_3D(1) = [l_1, l_2, ..., l_n', ..., l_K-1], s(1) = 2000 mm, where J_3D(1) and l_3D(1) are the results of 3D pose estimation in the initial grid (first iteration), namely the joint point position coordinates in the world coordinate system and the joint point edge vectors;
(iii) Use an iterative algorithm: for each joint point P_k, 1 ≤ k ≤ K, discretize the grid cell surrounding the joint point's current position into a 2 × 2 × 2 local grid, i.e. set N_g = 2, and repeat the PSM method of the previous step to obtain the updated 3D pose estimate J_3D(2), l_3D(2) = [l_1, l_2, ..., l_n', ..., l_K-1]. As the iterations proceed the 3D human posture is refined, the side length s of V is updated to a smaller value and the precision improves; the iteration number iter_3D is determined by the sample complexity.
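The coarse-to-fine volume used by the PSM step can be pictured with the sketch below: a cube of side s around the current estimate is discretized into N_g bins per axis, and each later pass uses a small local grid around each joint. The halving schedule and the helper name discretize_volume are assumptions; the patent's exact update of s appears only as a drawing.

    import numpy as np

    def discretize_volume(center, s, N_g):
        """Bin centres of an N_g x N_g x N_g grid over the cube of side s
        centred on `center` (a length-3 world coordinate); shape (N_g, N_g, N_g, 3)."""
        ticks = [np.linspace(c - s / 2.0, c + s / 2.0, N_g) for c in center]
        return np.stack(np.meshgrid(*ticks, indexing="ij"), axis=-1)

    # Coarse-to-fine schedule as described: start with s = 2000 mm and N_g = 16,
    # then refine each joint with a 2 x 2 x 2 local grid whose cell size shrinks
    # every iteration (halving here is an illustrative choice).
    s, N_g = 2000.0, 16
    grids = [discretize_volume((0.0, 0.0, 0.0), s, N_g)]
    for _ in range(3):                 # iter_3D is chosen from the sample complexity
        s, N_g = s / 2.0, 2
        grids.append(discretize_volume((0.0, 0.0, 0.0), s, N_g))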
In the above scheme, the step (4) is specifically as follows:
according to the finally estimated 3D human body posture information:
Figure GDA0003851265090000061
l 3D (iter_3D)=[l 1 ,l 2 ,...,l n′ ...,l K-1 ]the action standard judgment is carried out, and the specific method comprises the following steps:
(i) The standard action angle set of the human body posture is as follows:
Figure GDA0003851265090000062
Figure GDA0003851265090000063
representing any two edges l n′ ,l n″ Wherein the angle of the first and second guide rails,
Figure GDA0003851265090000064
n is more than or equal to 1 and K-1 is more than or equal to n', according to the formula:
Figure GDA0003851265090000065
solving each corner degree
Figure GDA0003851265090000066
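The angle between two joint edges follows from the usual dot-product relation; a minimal sketch:

    import numpy as np

    def edge_angle(l_a, l_b):
        """Angle in degrees between two joint-edge vectors l_a and l_b."""
        cos_ab = np.dot(l_a, l_b) / (np.linalg.norm(l_a) * np.linalg.norm(l_b))
        return np.degrees(np.arccos(np.clip(cos_ab, -1.0, 1.0)))

    # e.g. the angle formed by an upper-arm edge and a forearm edge
    # edge_angle(np.array([0.0, 1.0, 0.0]), np.array([1.0, 1.0, 0.0]))  # ~45 degrees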
(ii) Establish a Gaussian mixture model G over the standard action angles: let g_n',n'' denote the Gaussian distribution with mean α_n',n'', 1 ≤ n', n'' ≤ K-1; the standardized angle then obeys a standard normal distribution, and with 95% confidence the quantiles Y_0.025 and Y_0.975 of the standard normal distribution table are used to obtain the corresponding interval value for each angle.
(iii) Using the Gaussian mixture model G as the reference, judge the posture action of the human body 3D posture to be evaluated and score its measured angles α'_n',n'', obtaining the final action standard score and total score set.
Specifically, judge whether each measured angle α'_n',n'' satisfies the corresponding distribution g_n',n''; if it does, the action is qualified, and the decomposed action standard score under that distribution is then calculated according to the standard normal distribution table, 1 ≤ n', n'' ≤ K-1, with the weights ω usually taken as 1;
Score is the current action standard score, accumulated from the decomposed action standard scores.
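The scoring formula is given only as a drawing in the original, so the sketch below is an illustrative reading: an observed angle passes if it lies inside the 95% interval of the Gaussian fitted to the standard action, and the decomposed score measures how central it is. The mapping to [0, 1] and the function names are assumptions.

    import numpy as np
    from math import erf, sqrt

    def angle_score(angle, mu, sigma, z=1.96):
        """Check one decomposed action angle against its reference Gaussian
        N(mu, sigma^2); returns (qualified, score) where `qualified` is the
        95%-interval test and `score` falls from 1 at the mean towards 0."""
        d = abs(angle - mu) / sigma
        qualified = d <= z
        score = 1.0 - erf(d / sqrt(2.0))   # two-sided tail mass removed from 1
        return qualified, score

    def total_score(angles, mus, sigmas, weights=None):
        """Weighted sum of the decomposed scores; weights default to 1 as in the text."""
        weights = np.ones(len(angles)) if weights is None else np.asarray(weights)
        scores = np.array([angle_score(a, m, s)[1] for a, m, s in zip(angles, mus, sigmas)])
        return float(weights @ scores), scores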
Through the technical scheme, the 3D human body action standard identification method provided by the invention has the following beneficial effects:
(1) Accurately judging and recognizing complex human body postures of target bodies at different angles;
(2) The multi-view cameras are calibrated with respect to a common world coordinate system; accurately fusing the features from different viewing angles in the aligned coordinate systems improves the 2D pose precision.
(3) The method runs in real time and can run on ordinary hardware resources.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below.
FIG. 1 is a diagram of a joint point of a human body according to an embodiment of the present invention;
FIG. 2 is a diagram of the interconversion between the world coordinate system and the camera coordinate system;
FIG. 3 is a diagram illustrating a transformation relationship between a camera coordinate system and a pixel coordinate system.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.
The invention provides a 3D human body action standard identification method, which comprises the following steps:
(1) Acquiring images of human body actions by adopting a plurality of multi-view cameras, traversing an image set, and finding 2D human body joint point positions of the images under the same timestamp;
Given a camera set Camera = {C_1, C_2, ..., C_i, ..., C_a}, 1 ≤ i ≤ a, where a is the number of multi-view cameras and a ≥ 2; the image data collected by the multi-view cameras is Ig = {I_1, I_2, ..., I_i, ..., I_a}, and I_i(x, y, c) is the image sample collected by camera C_i at the same timestamp, with 0 ≤ x ≤ W-1 (W is the image width), 0 ≤ y ≤ H-1 (H is the image height) and c the channel index of the input image, 0 ≤ c ≤ 2. Traverse the image set Ig and find the 2D human body joint point positions of the images I_i(x, y, c) under the same timestamp; the specific steps are as follows:
(i) Feed image I_i(x, y, c) through the high-low resolution fusion network and solve the feature response matrices of its high-resolution and low-resolution branches.
Define the high-resolution sub-network feature response matrices as F_hr = {F_hr^(1), ..., F_hr^(i'), ..., F_hr^(N)}, 1 ≤ i' ≤ N, where N is the number of feature layers of the high-resolution sub-network and F_hr^(i') is the feature response submatrix of its i'-th layer, with 0 ≤ x_i' ≤ W'-1 (W' = W, the high-resolution feature matrix width), 0 ≤ y_i' ≤ H'-1 (H' = H, the high-resolution feature matrix height) and c_hr^(i') the channel count of the feature matrix.
Define the low-resolution sub-network feature response matrices as F_l1 = {F_l1^(i'')} and F_l2 = {F_l2^(i''')}, where l1 and l2 denote the two low-resolution sub-network structures, 3 ≤ i'' ≤ N, 7 ≤ i''' ≤ N, and F_l1^(i''), F_l2^(i''') are the feature response submatrices of the two low-resolution sub-networks:
0 ≤ x_i'' ≤ W''-1, W'' = W/2, the first low-resolution sub-network feature matrix width;
0 ≤ y_i'' ≤ H''-1, H'' = H/2, the first low-resolution sub-network feature matrix height;
0 ≤ x_i''' ≤ W'''-1, W''' = W/4, the second low-resolution sub-network feature matrix width;
0 ≤ y_i''' ≤ H'''-1, H''' = H/4, the second low-resolution sub-network feature matrix height;
c_l1^(i'') and c_l2^(i''') are the channel counts of the two low-resolution sub-networks.
When i'' and i''' are even, the two low-resolution submatrices F_l1^(i'') and F_l2^(i''') are fused with the high-resolution sub-network through a deconvolution operation; in the fusion formulas (1) and (2), T_l1 is the transformation matrix for the deconvolution from channel c_l1^(i'') to channel c_hr^(i'), and T_l2 is the transformation matrix for the deconvolution from channel c_l2^(i''') to channel c_hr^(i').
The recursion formula (3) then updates the feature response submatrix of the high-resolution sub-network by adding the deconvolved low-resolution responses to it.
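As an illustration of the multi-resolution fusion described above, the sketch below upsamples the 1/2- and 1/4-resolution feature maps with transposed convolutions and adds them to the high-resolution branch; the channel counts and module names are assumptions, not the patent's.

    import torch
    import torch.nn as nn

    class FuseToHighRes(nn.Module):
        """Deconvolution fusion of two low-resolution branches into the
        high-resolution branch (illustrative channel counts)."""
        def __init__(self, c_hr=32, c_half=64, c_quarter=128):
            super().__init__()
            # one 2x transposed convolution for the 1/2-resolution branch
            self.up_half = nn.ConvTranspose2d(c_half, c_hr, kernel_size=4, stride=2, padding=1)
            # two chained 2x transposed convolutions (4x in total) for the 1/4-resolution branch
            self.up_quarter = nn.Sequential(
                nn.ConvTranspose2d(c_quarter, c_hr, kernel_size=4, stride=2, padding=1),
                nn.ConvTranspose2d(c_hr, c_hr, kernel_size=4, stride=2, padding=1),
            )

        def forward(self, f_hr, f_half, f_quarter):
            # add the upsampled low-resolution responses to the high-resolution response
            return f_hr + self.up_half(f_half) + self.up_quarter(f_quarter)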
(ii) From the feature response submatrix F_hr^(N) of the high-resolution sub-network computed by formula (3), solve the output response matrix set HeatMap_i = {H_i,1, H_i,2, ..., H_i,k, ..., H_i,K}, 1 ≤ k ≤ K, with K = 17 the number of human joint points to be solved, as shown in FIG. 1. Each pixel coordinate of image I_i(x, y, c) is then evaluated to decide whether it is the location of the k-th joint point; H_i,k(x, y) is the confidence matrix of the k-th joint point under the i-th camera, 1 ≤ k ≤ K. In formulas (4) and (5), W_k is the weight parameter for solving the k-th joint position confidence matrix, b_k is the offset, and F_hr^(N) is the feature response submatrix of the N-th layer of the fusion network for the corresponding channel.
(iii) From the H_i,k(x, y), 1 ≤ k ≤ K, computed in (ii), solve the mean square error distance MSE_loss over the obtained output response matrix set HeatMap_i (formulas (6) and (7)), where (μ_x,k, μ_y,k) is the true pixel coordinate position of joint point k and σ_x,k, σ_y,k, the variances of the target output, are both 1.5.
(iv) Use H_i,k(x, y) to solve the gradients of MSE_loss with respect to the weight parameter W_k and offset b_k of the N-th-layer high-resolution sub-network feature response and update them; the parameter update formulas take a step of size τ in the negative gradient direction, where τ is a small number, 0.1 or 0.01.
Similarly, update the weight parameters of the other feature layers of the high-resolution and low-resolution sub-networks.
(v) Repeat steps (i)-(iv) until MSE_loss converges or the maximum iteration number iter is reached, obtaining the final output response matrix set HeatMap_i = {H_i,1, H_i,2, ..., H_i,k, ..., H_i,K}, where H_i,k is the position confidence matrix of the k-th human joint point;
(vi) From HeatMap_i, obtain the 2D joint point positions of image I_i(x, y, c) under camera C_i, expressed as J_i = {(x_k*, y_k*), 1 ≤ k ≤ K}, where (x_k*, y_k*) is the pixel coordinate corresponding to the maximum value of the human joint point position confidence matrix H_i,k;
the set of human body 2D joint point positions under all the multi-view cameras is then expressed as J = {J_1, J_2, ..., J_i, ..., J_a}.
(2) According to the obtained 2D human body joint point position set under all the multi-view cameras, carrying out information fusion on the 2D joint points of the multi-view cameras to obtain fused 2D joint point positions under all the camera pixel coordinate systems;
(i) Performing an alignment operation on a plurality of multi-view cameras:
the alignment operation of the multi-view Camera is performed according to the paper a Flexible New Technique for Camera Calibration, as shown in fig. 2, R is an orthogonal identity matrix of 3 × 3, t is a translation vector, and R, t is an external parameter of the Camera, which is used to represent a distance between a world coordinate system and a Camera coordinate system.
As shown in FIG. 3, the plane π is referred to as the image plane of the camera, point O c Called the center of the camera (optical center), f is the focal length of the camera, and O c Making a ray perpendicular to the image plane for the end point, intersecting the image plane with point p, then ray O c p is called the optical axis (principal axis), the point p is the principal point of the camera, and there are,
Figure GDA0003851265090000101
(x c ,y c ,z c ) Representing the camera coordinate system coordinates.
Obtaining world coordinate system coordinate P according to FIG. 2 and FIG. 3 w (x w ,y w ,z w ) Conversion to pixel coordinate system coordinates P 1 The transformation process of (u, v) is as follows:
Figure GDA0003851265090000102
Figure GDA0003851265090000104
accordingly, the multi-view Camera Camera = { C 1 ,C 2 ,...,C i ,...,C a And (5) converting the relation between the cameras, thereby realizing the alignment operation of the multi-view camera.
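The world-to-pixel chain described above is the standard pinhole projection from Zhang's calibration method; a minimal sketch (how the focal length and principal point enter is written out explicitly and is an assumption about the parameterization):

    import numpy as np

    def world_to_pixel(P_w, R, t, fx, fy, cx, cy):
        """Project a world point into pixel coordinates: camera coords
        P_c = R @ P_w + t, then perspective division and the intrinsics."""
        P_c = R @ np.asarray(P_w, dtype=float) + np.asarray(t, dtype=float)
        x, y = P_c[0] / P_c[2], P_c[1] / P_c[2]      # normalised image coordinates
        return np.array([fx * x + cx, fy * y + cy])  # pixel coordinates (u, v)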
(ii) Calculate the fused 2D joint point positions of the multi-view cameras. For the aligned camera C_i, its detected joint point positions J_i are fused with the positions J_j obtained in step (1) from the other cameras through a fusion weight matrix θ, 1 ≤ j ≤ a, j ≠ i, where a is the number of multi-view cameras and θ satisfies a Gaussian distribution along the epipolar line, θ ~ N(0, 1). In the fused joint point expression (formula (10)), J_i,k denotes the position of the k-th joint point under multi-view camera C_i, θ denotes the fusion weight matrix for the k-th joint point position detected by multi-view camera C_j, and each J_j,k is first converted into pixel coordinates in camera C_i's pixel coordinate system;
the fused 2D joint point positions under all camera pixel coordinate systems are then updated accordingly (formula (11)).
(3) According to the obtained fused 2D joint point positions under all the camera pixel coordinate systems, calculating side information formed by the 3D human body joint point positions and the joint points, namely 3D human body postures;
(i) Express the 3D human body posture information as J_3D = {P_1, ..., P_k, ..., P_K} and l_3D = [l_1, l_2, ..., l_n', ..., l_K-1], where P_k denotes the world-coordinate-system position of the k-th human joint point, 1 ≤ k ≤ K, K = 17 (the 17 human joint points), l_n' denotes an edge vector formed by joint points, 1 ≤ n' ≤ K-1, and K-1 is the number of edges formed by the joint points. According to the principle of triangulation, obtain a 3D space V centred on the root node, take the side length s of V with an initial value of 2000 mm, and discretize the volume of V into an N_g × N_g × N_g coarse grid; each grid cell G(t_0, t_1, t_2) lies among the N_g^3 cells, 1 ≤ t_0, t_1, t_2 ≤ N_g, where t_0 indexes the grid depth, t_1 the grid width and t_2 the grid height, and N_g takes an initial value of 16;
(ii) Compute, according to the graph model algorithm PSM, the 3D pose estimate in the initial grid: J_3D(1), l_3D(1) = [l_1, l_2, ..., l_n', ..., l_K-1], s(1) = 2000 mm, where J_3D(1) and l_3D(1) are the results of 3D pose estimation in the initial grid (first iteration), namely the joint point position coordinates in the world coordinate system and the joint point edge vectors;
(iii) Use an iterative algorithm: for each joint point P_k, 1 ≤ k ≤ K, discretize the grid cell surrounding the joint point's current position into a 2 × 2 × 2 local grid, i.e. set N_g = 2, and repeat the PSM method of the previous step to obtain the updated 3D pose estimate J_3D(2), l_3D(2) = [l_1, l_2, ..., l_n', ..., l_K-1]. As the iterations proceed the 3D human posture is refined, the side length s of V is updated to a smaller value and the precision improves; the iteration number iter_3D is determined by the sample complexity.
(4) And performing action standard type identification according to the finally calculated 3D human body posture information.
According to the finally estimated 3D human body posture information J_3D(iter_3D) and l_3D(iter_3D) = [l_1, l_2, ..., l_n', ..., l_K-1], the action standard judgment is carried out; the specific method is:
(i) The standard action angle set of the human body posture is A = {α_n',n''}, where α_n',n'' denotes the angle between any two edges l_n' and l_n'', 1 ≤ n', n'' ≤ K-1, n' ≠ n''; each angle α_n',n'' is solved according to the angle formula for the two edge vectors (the cosine of the angle is the normalized dot product of l_n' and l_n'').
(ii) Establish a Gaussian mixture model G over the standard action angles: let g_n',n'' denote the Gaussian distribution with mean α_n',n'', 1 ≤ n', n'' ≤ K-1; the standardized angle then obeys a standard normal distribution, and with 95% confidence the quantiles Y_0.025 and Y_0.975 of the standard normal distribution table are used to obtain the corresponding interval value for each angle.
(iii) Using the Gaussian mixture model G as the reference, judge the posture action of the human body 3D posture to be evaluated and score its measured angles α'_n',n'', obtaining the final action standard score and total score set.
Specifically, judge whether each measured angle α'_n',n'' satisfies the corresponding distribution g_n',n''; if it does, the action is qualified, and the decomposed action standard score under that distribution is then calculated according to the standard normal distribution table, 1 ≤ n', n'' ≤ K-1, with the weights ω usually taken as 1;
Score is the current action standard score, accumulated from the decomposed action standard scores.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (3)

1. A 3D human body action standard identification method, characterized by comprising the following steps:
(1) Acquiring images of human body actions by adopting a plurality of multi-view cameras, traversing an image set, and finding 2D human body joint point positions of the images under the same timestamp;
(2) According to the obtained 2D human body joint point position set under all the multi-view cameras, carrying out information fusion on the 2D joint points of the multi-view cameras to obtain fused 2D joint point positions under all the camera pixel coordinate systems;
(3) According to the obtained fused 2D joint point positions under all camera pixel coordinate systems, calculating 3D human body joint point positions and side information formed by joint points, namely 3D human body postures;
(4) Performing action standard type identification according to the finally calculated 3D human body posture information;
the step (1) is specifically as follows:
camera = { C) given set of cameras 1 ,C 2 ,...,C i ,...,C a I is more than or equal to 1 and less than or equal to a, a represents the number of the multi-view cameras, and a is more than or equal to 2; the image data collected by the multi-view camera is Ig = { I = { (I) } 1 ,I 2 ,...,I i ,...,I a };I i (x, y, C) are at the same time stamp, C i The method comprises the steps that image samples collected by a camera are obtained, wherein x is more than or equal to 0 and less than or equal to W-1, W represents the image width, y is more than or equal to 0 and less than or equal to H-1, H represents the image height, c is channel information of an input image, and c is more than or equal to 0 and less than or equal to 2; traversing the image set Ig, and finding the image I under the same time stamp i The 2D human body joint point positions of (x, y, c) comprise the following specific steps:
(i) Feed image I_i(x, y, c) through the high-low resolution fusion network and solve the feature response matrices of its high-resolution and low-resolution branches.
Define the high-resolution sub-network feature response matrices as F_hr = {F_hr^(1), ..., F_hr^(i'), ..., F_hr^(N)}, 1 ≤ i' ≤ N, where N is the number of feature layers of the high-resolution sub-network and F_hr^(i') is the feature response submatrix of its i'-th layer, with 0 ≤ x_i' ≤ W'-1 (W' = W, the high-resolution feature matrix width), 0 ≤ y_i' ≤ H'-1 (H' = H, the high-resolution feature matrix height) and c_hr^(i') the channel count of the feature matrix.
Define the low-resolution sub-network feature response matrices as F_l1 = {F_l1^(i'')} and F_l2 = {F_l2^(i''')}, where l1 and l2 denote the two low-resolution sub-network structures, 3 ≤ i'' ≤ N, 7 ≤ i''' ≤ N, and F_l1^(i''), F_l2^(i''') are the feature response submatrices of the two low-resolution sub-networks:
0 ≤ x_i'' ≤ W''-1, W'' = W/2, the first low-resolution sub-network feature matrix width;
0 ≤ y_i'' ≤ H''-1, H'' = H/2, the first low-resolution sub-network feature matrix height;
0 ≤ x_i''' ≤ W'''-1, W''' = W/4, the second low-resolution sub-network feature matrix width;
0 ≤ y_i''' ≤ H'''-1, H''' = H/4, the second low-resolution sub-network feature matrix height;
c_l1^(i'') and c_l2^(i''') are the channel counts of the two low-resolution sub-networks.
When i'' and i''' are even, the two low-resolution submatrices F_l1^(i'') and F_l2^(i''') are fused with the high-resolution sub-network through a deconvolution operation; in the fusion formulas (1) and (2), T_l1 is the transformation matrix for the deconvolution from channel c_l1^(i'') to channel c_hr^(i'), and T_l2 is the transformation matrix for the deconvolution from channel c_l2^(i''') to channel c_hr^(i').
The recursion formula (3) then updates the feature response submatrix of the high-resolution sub-network by adding the deconvolved low-resolution responses to it.
(ii) From the feature response submatrix F_hr^(N) of the high-resolution sub-network computed by formula (3), solve the output response matrix set HeatMap_i = {H_i,1, H_i,2, ..., H_i,k, ..., H_i,K}, 1 ≤ k ≤ K, with K = 17 the number of human joint points to be solved. Each pixel coordinate of image I_i(x, y, c) is then evaluated to decide whether it is the location of the k-th joint point; H_i,k(x, y) is the confidence matrix of the k-th joint point under the i-th camera, 1 ≤ k ≤ K. In formulas (4) and (5), W_k is the weight parameter for solving the k-th joint position confidence matrix, b_k is the offset, and F_hr^(N) is the feature response submatrix of the N-th layer of the fusion network for the corresponding channel.
(iii) From the H_i,k(x, y), 1 ≤ k ≤ K, computed in (ii), solve the mean square error distance MSE_loss over the obtained output response matrix set HeatMap_i (formulas (6) and (7)), where (μ_x,k, μ_y,k) is the true pixel coordinate position of joint point k and σ_x,k, σ_y,k, the variances of the target output, are both 1.5.
(iv) Use H_i,k(x, y) to solve the gradients of MSE_loss with respect to the weight parameter W_k and offset b_k of the N-th-layer high-resolution sub-network feature response and update them; the parameter update formulas take a step of size τ in the negative gradient direction, where τ is a small number, 0.1 or 0.01.
Similarly, update the weight parameters of the other feature layers of the high-resolution and low-resolution sub-networks.
(v) Repeat steps (i)-(iv) until MSE_loss converges or the maximum iteration number iter is reached, obtaining the final output response matrix set HeatMap_i = {H_i,1, H_i,2, ..., H_i,k, ..., H_i,K}, where H_i,k is the position confidence matrix of the k-th human joint point;
(vi) From HeatMap_i, obtain the 2D joint point positions of image I_i(x, y, c) under camera C_i, expressed as J_i = {(x_k*, y_k*), 1 ≤ k ≤ K}, where (x_k*, y_k*) is the pixel coordinate corresponding to the maximum value of the human joint point position confidence matrix H_i,k;
the set of human body 2D joint point positions under all the multi-view cameras is then expressed as J = {J_1, J_2, ..., J_i, ..., J_a};
The step (2) is specifically as follows:
(i) Perform an alignment operation on the multiple multi-view cameras: convert the world coordinate system to the camera coordinate system and then the camera coordinate system to the pixel coordinate system, obtaining the conversion relations between the cameras of the multi-view set Camera = {C_1, C_2, ..., C_i, ..., C_a} and thereby realizing the alignment of the multi-view cameras;
(ii) Calculate the fused 2D joint point positions of the multi-view cameras. For the aligned camera C_i, its detected joint point positions J_i are fused with the positions J_j obtained in step (1) from the other cameras through a fusion weight matrix θ, 1 ≤ j ≤ a, j ≠ i, where a is the number of multi-view cameras and θ satisfies a Gaussian distribution along the epipolar line, θ ~ N(0, 1). In the fused joint point expression (formula (10)), J_i,k denotes the position of the k-th joint point under multi-view camera C_i, θ denotes the fusion weight matrix for the k-th joint point position detected by multi-view camera C_j, and each J_j,k is first converted into pixel coordinates in camera C_i's pixel coordinate system;
the fused 2D joint point positions under all camera pixel coordinate systems are then updated accordingly (formula (11)).
2. The 3D human body action standard identification method according to claim 1, wherein the step (3) is specifically as follows:
(i) Express the 3D human body posture information as J_3D = {P_1, ..., P_k, ..., P_K} and l_3D = [l_1, l_2, ..., l_n', ..., l_K-1], where P_k denotes the world-coordinate-system position of the k-th human joint point, 1 ≤ k ≤ K, K = 17 (the 17 human joint points), l_n' denotes an edge vector formed by joint points, 1 ≤ n' ≤ K-1, and K-1 is the number of edges formed by the joint points. According to the principle of triangulation, obtain a 3D space V centred on the root node, take the side length s of V with an initial value of 2000 mm, and discretize the volume of V into an N_g × N_g × N_g coarse grid; each grid cell G(t_0, t_1, t_2) lies among the N_g^3 cells, 1 ≤ t_0, t_1, t_2 ≤ N_g, where t_0 indexes the grid depth, t_1 the grid width and t_2 the grid height, and N_g takes an initial value of 16;
(ii) Compute, according to the graph model algorithm PSM, the 3D pose estimate in the initial grid: J_3D(1), l_3D(1) = [l_1, l_2, ..., l_n', ..., l_K-1], s(1) = 2000 mm, where J_3D(1) and l_3D(1) are the results of 3D pose estimation in the initial grid (first iteration), namely the joint point position coordinates in the world coordinate system and the joint point edge vectors;
(iii) Use an iterative algorithm: for each joint point P_k, 1 ≤ k ≤ K, discretize the grid cell surrounding the joint point's current position into a 2 × 2 × 2 local grid, i.e. set N_g = 2, and repeat the PSM method of the previous step to obtain the updated 3D pose estimate J_3D(2), l_3D(2) = [l_1, l_2, ..., l_n', ..., l_K-1]. As the iterations proceed the 3D human posture is refined, the side length s of V is updated to a smaller value and the precision improves; the iteration number iter_3D is determined by the sample complexity.
3. The 3D human body action standard identification method according to claim 2, wherein the step (4) is specifically as follows:
according to the finally estimated 3D human body posture information J_3D(iter_3D) and l_3D(iter_3D) = [l_1, l_2, ..., l_n', ..., l_K-1], the action standard judgment is carried out; the specific method comprises:
(i) The standard action angle set of the human body posture is A = {α_n',n''}, where α_n',n'' denotes the angle between any two edges l_n' and l_n'', 1 ≤ n', n'' ≤ K-1, n' ≠ n''; each angle α_n',n'' is solved according to the angle formula for the two edge vectors (the cosine of the angle is the normalized dot product of l_n' and l_n'').
(ii) Establish a Gaussian mixture model G over the standard action angles: let g_n',n'' denote the Gaussian distribution with mean α_n',n'', 1 ≤ n', n'' ≤ K-1; the standardized angle then obeys a standard normal distribution, and with 95% confidence the quantiles Y_0.025 and Y_0.975 of the standard normal distribution table are used to obtain the corresponding interval value for each angle.
(iii) Using the Gaussian mixture model G as the reference, judge the posture action of the human body 3D posture to be evaluated and score its measured angles α'_n',n'', obtaining the final action standard score and total score set.
Specifically, judge whether each measured angle α'_n',n'' satisfies the corresponding distribution g_n',n''; if it does, the action is qualified, and the decomposed action standard score under that distribution is then calculated according to the standard normal distribution table, 1 ≤ n', n'' ≤ K-1, with the weights ω usually taken as 1;
Score is the current action standard score, accumulated from the decomposed action standard scores.
CN202010085665.2A 2020-02-11 2020-02-11 3D human body action standard identification method Active CN111291687B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010085665.2A CN111291687B (en) 2020-02-11 2020-02-11 3D human body action standard identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010085665.2A CN111291687B (en) 2020-02-11 2020-02-11 3D human body action standard identification method

Publications (2)

Publication Number Publication Date
CN111291687A CN111291687A (en) 2020-06-16
CN111291687B true CN111291687B (en) 2022-11-11

Family

ID=71025534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010085665.2A Active CN111291687B (en) 2020-02-11 2020-02-11 3D human body action standard identification method

Country Status (1)

Country Link
CN (1) CN111291687B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183506A (en) * 2020-11-30 2021-01-05 成都市谛视科技有限公司 Human body posture generation method and system
CN112435731B (en) * 2020-12-16 2024-03-19 成都翡铭科技有限公司 Method for judging whether real-time gesture meets preset rules

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101561881A (en) * 2009-05-19 2009-10-21 华中科技大学 Emotion identification method for human non-programmed motion
CN108427282A (en) * 2018-03-30 2018-08-21 华中科技大学 A kind of solution of Inverse Kinematics method based on learning from instruction
CN108549856A (en) * 2018-04-02 2018-09-18 上海理工大学 A kind of human action and road conditions recognition methods
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detecting method based on Fusion Features and deep learning
CN110464349A (en) * 2019-08-30 2019-11-19 南京邮电大学 A kind of upper extremity exercise function score method based on hidden Semi-Markov Process
CN110633005A (en) * 2019-04-02 2019-12-31 北京理工大学 Optical unmarked three-dimensional human body motion capture method


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
H. Jiang. 3D Human Pose Reconstruction Using Millions of Exemplars. 2010 20th International Conference on Pattern Recognition, 2010, pp. 1674-1677. *
H. Qiu et al. Cross View Fusion for 3D Human Pose Estimation. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 4341-4350. *
K. Sun et al. Deep High-Resolution Representation Learning for Human Pose Estimation. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 5686-5696. *

Also Published As

Publication number Publication date
CN111291687A (en) 2020-06-16

Similar Documents

Publication Publication Date Title
US11763485B1 (en) Deep learning based robot target recognition and motion detection method, storage medium and apparatus
CN112102458B (en) Single-lens three-dimensional image reconstruction method based on laser radar point cloud data assistance
CN110108258B (en) Monocular vision odometer positioning method
CN108960211B (en) Multi-target human body posture detection method and system
CN112435325A (en) VI-SLAM and depth estimation network-based unmanned aerial vehicle scene density reconstruction method
CN106960449B (en) Heterogeneous registration method based on multi-feature constraint
CN107358629B (en) Indoor mapping and positioning method based on target identification
CN107274483A (en) A kind of object dimensional model building method
CN106709950A (en) Binocular-vision-based cross-obstacle lead positioning method of line patrol robot
CN112509044A (en) Binocular vision SLAM method based on dotted line feature fusion
CN110310305B (en) Target tracking method and device based on BSSD detection and Kalman filtering
CN113393524B (en) Target pose estimation method combining deep learning and contour point cloud reconstruction
CN111998862B (en) BNN-based dense binocular SLAM method
CN111291687B (en) 3D human body action standard identification method
US11741615B2 (en) Map segmentation method and device, motion estimation method, and device terminal
CN107862735A (en) A kind of RGBD method for reconstructing three-dimensional scene based on structural information
CN105513094A (en) Stereo vision tracking method and stereo vision tracking system based on 3D Delaunay triangulation
CN110070610A (en) The characteristic point matching method and device of characteristic point matching method, three-dimensionalreconstruction process
CN114004900A (en) Indoor binocular vision odometer method based on point-line-surface characteristics
CN115841602A (en) Construction method and device of three-dimensional attitude estimation data set based on multiple visual angles
CN111664845B (en) Traffic sign positioning and visual map making method and device and positioning system
CN114494644A (en) Binocular stereo matching-based spatial non-cooperative target pose estimation and three-dimensional reconstruction method and system
CN110060290B (en) Binocular parallax calculation method based on 3D convolutional neural network
CN116630423A (en) ORB (object oriented analysis) feature-based multi-target binocular positioning method and system for micro robot
CN110570473A (en) weight self-adaptive posture estimation method based on point-line fusion

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant