CN107229920B

CN107229920B - Behavior identification method based on integration depth typical time warping and related correction

Info

Publication number: CN107229920B
Application number: CN201710425906.1A
Authority: CN
Inventors: 葛永新; 陈乐扬; 杨丹; 张小洪; 徐玲; 杨梦宁; 洪明坚; 王洪星; 黄晟; 陈飞宇
Original assignee: Chongqing University
Current assignee: Chongqing Space New Vision Artificial Intelligence Technology Research Institute Co.,Ltd.
Priority date: 2017-06-08
Filing date: 2017-06-08
Publication date: 2020-11-13
Anticipated expiration: 2037-06-08
Also published as: CN107229920A

Abstract

The invention relates to a behavior recognition method based on integration depth typical time warping and relevant correction, which solves the technical problems of low recognition accuracy and long consumed time, and adopts the technical scheme that the behavior of a human body is expressed as rigid displacement, the rigid displacement is decomposed into rigid translation and rigid rotation, the rigid displacement is expressed by a homogeneous matrix lie group SE (3), the lie algebra is SO (3), a skeleton model C (t) is established by collected skeleton data, the displacement mapping relation is expressed as the homogeneous matrix lie group SE (3), the skeleton model C (t) based on the relative feature description method of the lie algebra is established, and the difference value processing is carried out on the skeleton model C (t); aligning using an integrated depth canonical time warping method; the technical scheme that the correlation characteristics are utilized, the alignment characteristic samples are corrected, and the corrected characteristic samples are classified by using a support vector machine better solves the problem and is used for behavior recognition of the 3D skeleton.

Description

Behavior identification method based on integration depth typical time warping and related correction

Technical Field

The invention relates to the field of human behavior recognition, in particular to a behavior recognition method based on integration depth typical time warping and relevant correction.

Background

Human behavior recognition belongs to the field of computer vision and behavior pattern recognition, and is a popular topic in recent years. In the aspects of man-machine interaction, security monitoring, content-based video retrieval and the like, behavior identification has huge practical use value, and the important role of behavior identification is increasingly highlighted along with the construction of informatization. Meanwhile, behavior recognition also has certain promotion effect on other fields of computer vision, such as face recognition, gait analysis and the like. In the past, due to the limitation of science and technology, most behavior recognition is based on 2D videos and images, and the recognition effect is not ideal. With the development and popularization of 3D technologies such as depth cameras, human behavior recognition is also gradually advancing from planar feature recognition to facade feature recognition, and 3D behavior recognition is becoming a main research method for human behavior recognition. In the process of acquiring the motion characteristics, the different motion speeds can cause the generated data to have larger differences in time series. Therefore, in the process of behavior recognition, data alignment is an important factor influencing the accuracy of recognition.

Existing 3D behavior recognition optical flow models, 3D skeletal models, 3D contour models, spatio-temporal features. The existing 3D bone-based behavior recognition method uses multiple cameras or other sensors to represent the position information of all joints in each frame in the whole motion in a coordinate manner, and the arrangement of the data of each frame in a time sequence constitutes all the information of the motion. This is a feature description method based on point features. The human body can be regarded as an articulated system consisting of rigid body segments, the connection points are joints of the human body, and in computer vision, a great deal of research is carried out to identify behaviors by extracting joint information or detecting local body parts. A method for behavior recognition by dividing the human body into several parts, extracting the joint position information of the human body from a single image, and dividing the human body into parts such as head, neck, shoulder, arm, elbow, wrist, hand, trunk, leg, knee, ankle, etc., wherein different motions can be represented by different parts. The method stimulates the interest of scientific researchers in the field of computer vision, and the identification of human behaviors by utilizing the articulation of bones becomes a hot spot in the field of behavior identification. However, the prior art has the problems of low identification precision and long identification time. Therefore, it is necessary to provide a behavior recognition method based on 3D bones with high recognition accuracy.

Disclosure of Invention

The invention aims to solve the technical problems of low detection and identification precision and long time in the prior art. The behavior recognition method based on the 3D skeleton has the characteristic of high recognition accuracy.

In order to solve the technical problems, the technical scheme is as follows:

a behavior recognition method based on integrated depth canonical time warping and related correction comprises image preprocessing, image analysis and image understanding, wherein the image analysis comprises the following steps:

(1) representing the behavior of a human body as rigid body displacement, decomposing the rigid body displacement into rigid body translation and rigid body rotation, representing the rigid body displacement by a homogeneous matrix lie group SE (3), and representing the lie algebra reflecting all information of the homogeneous matrix lie group SE (3) as SO (3);

(2) establishing a skeleton model C (t) by using the collected skeleton data, wherein N sections of skeleton rigid bodies exist in the skeleton model C (t), one section of skeleton is defined as a result of displacement of the other section of skeleton at the time t, the position relation between the one section of skeleton and the other section of skeleton is defined as a displacement mapping relation, the displacement mapping relation is expressed as a homogeneous matrix Lie group SE (3) in a Euclidean space, then a Lie Algebra SO (3) corresponding to the homogeneous matrix Lie group SE (3) is used as a motion characteristic, establishing a skeleton model C (t) based on a Lie Algebra relative characteristic description method, and carrying out difference processing on the skeleton model C (t), wherein the Lie Algebra relative characteristic description method is Lie Algebra relative distances, called LARP for short,

wherein N is a positive integer;

(3) aligning the skeleton model C (t) by using a depth typical time warping method to obtain an aligned feature sample;

(4) correcting the alignment feature sample in the step (3) by using the correlation feature to obtain a corrected alignment feature sample;

(5) and classifying the corrected feature samples by using a support vector machine to obtain a behavior recognition result.

The working principle of the invention is as follows: in order to enhance the behavior recognition effect, the aligned test samples are corrected by using correlation, and a better test result is obtained. The correlation between samples is improved mainly by training and deep canonical time warping.

In the above scheme, for optimization, further, the depth-typical time warping method is an integrated depth-typical time warping method. The deep typical time warping method, i.e. the deep network structure in the DCTW, is mainly improved in order to properly reduce the running time of the algorithm. In the process of applying the DCTW to the LARP, although the accuracy of behavior recognition can be improved by performing the non-linear transformation on each group of data in the training sample, the computation time of the algorithm is too long due to too many gradients to be calculated, and therefore, we try to find other simple methods. The purpose of practical training is to find a general set of data features, it is not necessary to calculate a loss function for each set of data in a sample, and we try to obtain a set of data first by using a dynamic time warping averaging method, and use the set as a representative calculation loss function of all samples, so that the amount of calculation required in a deep network is remarkably improved, and the calculation time can be reduced.

Further, the correlation characteristic in the step (4) includes aligning a bone model c (t) with the aligned characteristic sample, calculating the correlation between the bone model c (t) and each frame of the aligned characteristic sample, calculating a correction coefficient by using a correlation coefficient x e R as a variable, multiplying the correction coefficient by the characteristic data of each frame to change the weight of the corresponding frame, wherein the correction coefficient obeys a correction function e^x。

Further, the integration depth typical time warping method in step (3) includes:

carrying out nonlinear transformation on the skeleton model C (t) by utilizing the weight W (t) and an activation function of a deep network to obtain a transformation sample C '(t), taking a first group of data C' 1 in the transformation sample C '(t) as standard data, respectively aligning the transformation sample C' (t) with the standard data C '1 by using a dynamic time warping method, averaging to obtain an integration characteristic C, and obtaining an integration characteristic C according to the integration characteristic C and the standard data C' 1:

covariance calculation and SVD decomposition U, V, transform samples C, (t) and C U, V

Calculating gradients G (t) corresponding to the transformation samples C (t) one by one, updating weights W (t) according to the gradients G (t) and the learning rate of the deep network, and converging all training characteristics C transformed by the deep network_1f，C_2f，C_3f… -training feature C corresponding to Standard data C' 1_fAfter the dynamic time is used for regular alignment, averaging is carried out, and the calculation result is an alignment characteristic sample;

wherein the total number of layers of the deep network is at least 2.

Further, the deep layer network is a BP network, and the BP network includes an input layer, an output layer, and two hidden layers.

Further, the difference processing of step (2) includes using an interpolation method, the interpolation method including: the characteristics with the same frame number are obtained through interpolation, and the interpolation formula is as follows:

wherein the content of the first and second substances,

definition of Q₁，Q₂，Q₃…Q_nE SE (3) is t₁，t₂，t₃…t_nInstantaneous relative position of time of day.

Further, the establishing of the bone model c (t) based on the lie algebra relative feature description method in the step (2) includes: selecting e in a skeletal model_mBones and e_nBone analysis, e_mEnd point of bone is e_m1And e_m2，e_nEnd point of bone is e_n1And e_n2By end point e_n1Is the origin, e_nThe direction vector of the bone is the axis, bone e_mAnd bones e_nThe surface is used as a coordinate surface to establish a coordinate system, e_mEnd point e of bone_m1And e_m2The relative position of (a) is expressed as:

e_nend point of (c) relative to (e)_mThe positions of (A) are:

e_mbones and e_nThe displacement mapping relation of the skeleton is e_mBones and e_nThe relative position relationship of bones is:

according to e_mBones and e_nCalculating the phase shift mapping relation ve (B) of each skeleton and other skeletons according to the relative position relation of the skeletons, defining that a skeleton model has n sections of skeleton rigid bodies, and obtaining that the shape of the skeleton model at the time t is C (t) as follows:

C(t)＝(ve(B_1，2)，ve(B_1，3)，…，ve(B_n，n-1))，

ve(B)＝(u₁，u₂，u₃，v₁，v₂，v₃)

wherein C (t) has 6 XnX (N-1) vectors, and N and m are positive integers less than N.

Further, the image preprocessing also comprises image noise reduction and image enhancement.

The logarithm of a matrix lie group can form a lie algebra space, and information in the lie group can be completely converted into the form of a lie algebra. The displacement of the rigid body can be represented by the lie group, and the displacement transformation of the rigid body can be decomposed into a translation part and a rotation part. Assuming that the displacement vector of p in the rigid body displacement process as the coordinate origin is q ∈ R³The rotation of the rigid body around p is R ∈ SO (3), then we can combine the two to get a pair of displacement coordinates (q, R) to represent this rigid body displacement, all such coordinates forming a set SE (3): SE (3) { (q, R): q is an element of R³，R∈SO(3)}＝R³XSO (3). Let x₁，x₂Representing the positions before and after translation, respectively, then:

x₂＝q+Rx₁wherein q, R ∈ SE (3) represents the displacement of the rigid body, of SE (3)The elements represent the displacement changes of the rigid body. In order to show the translation and rotation in a simpler and clearer way, a homogeneous matrix is used to represent the transformation of the rigid body. In the process of homogenization, the position of the point is represented here by 1 and the vector by 0.

The position transformation of a point can be expressed as:

is a 4X 4 matrix, is a homogeneous representation of g ∈ SE (3)

Redefining SE (3), homogenizing all elements in the original SE (3) to obtain a new SE (3), wherein the new SE (3) can form a group under multiplication, the SE (3) meets rigid body displacement conditions, so that the SE (3) can represent displacement of a rigid body,

the coordinate system is based on the q point before displacement as the origin, so that the q point after displacement reflects the case of translational transformation of the q point, and SE (3) is a lie group. There must be one lie number that can reflect the full information of SE (3).

Human behavioral characteristics based on 3D skeleton are described using SE (3). The object of behavior recognition is a video or an image sequence, in which the motion of a person is discrete, that is, the motion of the person is formed by arranging a plurality of static skeleton shapes, so that the feature description of human behavior recognition based on a 3D skeleton is a method for finding a shape capable of describing the static skeleton. Between two sections of boneThe position relation of (A) is understood as a displacement mapping relation, namely, a piece of bone is regarded as a result of displacement of another piece of bone, the position of a certain piece of fixed bone relative to another piece of bone in the Euclidean space can also be represented by a lie group, and then a corresponding lie algebra is found as a characteristic of motion. The purpose of lie algebraic relative description is to calculate the relative position between each bone and other bones and use ve (B) as a representation. Assuming a bone model with n segments of bone rigid bodies, at time t, the shape of the bone model can be expressed as c (t) -ve (B)_1，2)，ve(B_1，3)，…，ve(B_n，n-1) A total number of 6 xn (n-1) of such vectors.

The dynamic time warping processing object is an array arranged in a time sequence, and the two arrays are respectively X: x is the number of₁，x₂，x₃，…，x_NAnd Y: y is₁，y₂，y₃，…，y_MThe lengths are N and M belongs to R respectively. The two arrays may be discrete signals or, more commonly, time-equidistant points generated during the acquisition of the series of feature numbers. Let this feature space be F, then x_n，y_mE.g. F, where N is e [1, N]，m∈[1，M]. In general, when two different features x, y ∈ F are compared, a function is needed as a method for comparing the similarity of the two features, and the function satisfies c: f × F → R.gtoreq.0. If the similarity between x and y is higher, the value of c (x, y) is smaller. Considering the number pairs of elements in all X and Y, the objective of dynamic time warping is to find an optimal combination that minimizes the sum of all c (X, Y) values obtained after combining the elements in X and Y two by two.

Dynamic time warping has the drawback of not being able to handle sequences of different dimensions. The invention adopts an improved method, and integrates depth typical time warping. And typical correlation analysis is introduced to strengthen the similarity of two arrays needing to be aligned, and the alignment accuracy can be greatly improved under the condition of higher similarity. The correlation between the two data is a measure of the linear relationship between the objects. The higher the correlation the more similar the two sets of data. The correlation between two sets of data reflects the linear correlation degree of the two sets of data, and the greater the correlation, the easier it is to predict the distribution of one set of data from the other set of data. In the process of behavior recognition, the skeleton shapes represented by the two arrays with high correlation degree also have higher similarity. The effect of the typical correlation analysis is to make the two groups of data have higher correlation degree through linear transformation.

The depth typical correlation analysis changes the linear transformation in the typical correlation analysis into the nonlinear transformation, so that the similarity of the two arrays after the transformation reaches a higher level. Deep learning refers to a deep network with more than two layers, and a more realistic effect can be obtained through a nonlinear function formed by abstract multilayer nesting. Through deep learning, good effects can be obtained through initializing parameters and through training an updating mode. In deep canonical correlation analysis, deep learning is mainly to find a nonlinear transformation that makes the correlation of two arrays higher.

After nonlinear transformation is carried out on the sequence in deep learning, a loss function of the deep network is calculated through a corresponding method, then the gradient of each weight is determined according to the loss function, the weight is updated on the basis of the gradient, and finally a stable state is achieved, wherein the weight at the moment is the final result after the deep network is updated for many times, and the final output result is the solution of the deep network.

The typical time warping is to perform typical correlation analysis on the arrays before aligning the arrays, so as to improve the correlation between the two sets of data, and the alignment of the arrays can obtain better effect under the condition of high similarity.

The typical time warping of depth is that firstly, the function in the deep network is utilized to carry out nonlinear transformation on data, then typical correlation analysis is carried out, and each parameter in the nonlinear mapping is updated through the loss function, and finally a stable result is obtained.

The fusion depth typical time warping is mainly an improvement on the depth typical time warping. In particular to improve the way in which the loss function in a deep network is calculated. For samples f subjected to nonlinear transformation₁，f₂，f₃…, obtaining an integrated function f by using a dynamic time warping calculation process_dtwThen we select f in the sample₁Using the two samples to calculate

In calculating the loss function, using f_dtwAnd f₁Calculating covariance, K and singular value decomposition K to obtain U and V, calculating each sample f₁，f₂，f₃… is calculated as

Only one time of K and SVD decomposition is needed to be calculated in one cycle of the deep network, and simultaneously

And

and also only one time of calculation is needed, so that a large amount of time can be saved.

To improve the accuracy of the identification, we try to correct the features with correlation. And testing the data alignment effect through the correlation between the aligned test sample and the general characteristic data obtained by training. Since the structure of the LARP motion feature matrix is shown in fig. 8, which is a time-series representation in the structure of the feature matrix, fig. 9 shows a corresponding correlation coefficient calculation manner. The effect of the alignment is evaluated by testing the respective coefficients between each column of the matrix and the respective column of the generic matrix.

And the correlation coefficient is used for evaluating the correlation of the test sample, so that the function of the frame number with higher correlation in identification is increased, and the function of the frame number with lower correlation is reduced. To distinguish the difference between the correlations, a correlation coefficient x ∈ R is used as a variable calculation coefficient, the weight of the frame number is changed by multiplying, and a correction function e is selected^xThe correction flow is as shown in fig. 10. Through the adjustment of the sample correlation by integrating depth time warping, the correlation between the test sample and the training result is mostly between 0 and 0.2, and the condition of negative correlation rarely occurs. Therefore, it is possible to distinguish a frame having a low correlation from other frames having a high correlation, and to reduce the weight of the frame having a low correlation.

The invention has the beneficial effects that:

the method has the advantages that the recognition accuracy is improved by adopting the typical time warping of the fusion depth and the correlation correction alignment;

the method has the advantages that the defects of existing 3D bone behavior identification are overcome;

effect three, using fusion depth typical time warping significantly reduces the computation time for behavior recognition.

Drawings

The invention is further illustrated with reference to the following figures and examples.

Fig. 1 is a schematic flow chart of 3D behavior recognition in embodiment 1.

Fig. 2 is a schematic diagram of a behavior recognition result in embodiment 1.

Fig. 3 is a schematic diagram of an integrated depth exemplary time-warping deep network structure.

FIG. 4 is a schematic diagram of a dynamic time warping integration method.

Fig. 5, rigid body motion split diagram.

Fig. 6 is a skeleton diagram of human motion behavior.

Fig. 7, a diagram of a correction function.

Fig. 8 is a schematic diagram showing a time series of matrices.

Fig. 9 is a schematic diagram of correlation coefficient calculation.

Fig. 10 is a schematic view of a correction flow.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Example 1

The embodiment provides a behavior recognition method based on integrated depth canonical time warping and related correction, which includes image preprocessing, image analysis and image understanding, as shown in fig. 1, where the image analysis includes:

(1) representing the behavior of a human body as rigid body displacement, decomposing the rigid body displacement into rigid body translation and rigid body rotation, representing the rigid body displacement by a homogeneous matrix lie group SE (3), and reflecting all information of the homogeneous matrix lie group SE (3) by a lie algebra SO (3);

(2) establishing a skeleton model C (t) by using the collected skeleton data, wherein N sections of skeleton rigid bodies exist in the skeleton model C (t), one section of skeleton is defined as a result of displacement of the other section of skeleton at the time t, the position relation between the one section of skeleton and the other section of skeleton is defined as a displacement mapping relation, the displacement mapping relation is expressed as a homogeneous matrix lie group SE (3) in a Euclidean space, then a lie algebra SO (3) corresponding to the homogeneous matrix lie group SE (3) is used as a motion characteristic, establishing a skeleton model C (t) based on a lie algebra relative characteristic description method, and carrying out difference processing on the skeleton model C (t),

wherein N is a positive integer;

The experimental subject used the Florence3D-Action data set produced by Kinect. The Kinect comprises a total of 3 cameras, among which are RGB cameras for capturing a color image with a resolution of 640 x 480, taking 30 frames per second, and two depth sensors on both sides for detecting the relative position of the user. The Florence3D-Action dataset uses a fixed-location Kinect sensor to collect data. It consists of 10 different people performing 9 different actions, respectively. Each person performed two to three actions per action, for a total of 215 actions throughout the data set. The skeletal model in the dataset consists of 15 joints. In this embodiment, half of the motions are selected as training samples for each person, and the remaining half are used as test samples.

The skeleton information in the Florence3D-Action data set is processed on the skeleton output by the Kinect platform. The speed of the video of Kinect is kept at 30 frames. In the skeleton model of Kinect, as shown in fig. 6, which is a schematic view of a behavioral skeleton, similarly, the body parts of the forearm, upper arm, torso, and head are all regarded as rigid bodies. An absolute coordinate system is established to represent each joint, the origin is the position of the Kinect, the x axis is in the horizontal left direction, the y axis is in the vertical upper direction, the z axis is in the direction right in front of the Kinect, and each joint has a position coordinate in the coordinate system. In consideration of the correlation in the joint movement, there is a large amount of redundancy in the data of the original model of Kinect, and therefore improvement is required on the basis of these data. In this embodiment, it is assumed that some joints with high correlation in the human body are relatively fixed and invariant, and the trunk of the human body is a large rigid body. The joints on the skeleton are divided into two stages: first order is the joint point on the torso; second level is the sub-joint point located outside the trunk and connected to the first level joint by the skeleton.

In order to determine the position of the primary joint, the model establishes a coordinate system with the caudal region of the human body as the origin to obtain coordinates (u, r, t)₀、(u，r，t)₁、(u，r，t)₂And (u, r, t)₃. And respectively establishing a coordinate system by taking each primary stage as the origin of the coordinate system to obtain the coordinates of the secondary joint points.

The workflow of this embodiment: collecting a bone parameter model, describing bone parameters by adopting an LARP method, aligning the result by using depth typical time warping, then performing correlation correction, and finally classifying and outputting the result.

The depth canonical time warping method is canonical time warping with deep learning introduced. Typical time warping for deep learning is to perform nonlinear transformation on a bone model C (t) by using a weight w (t) and an activation function of a deep network to obtain a transformed sample C, (t), wherein a first group of data C, 1 in the transformed sample C, (t) is used as standard data, and the transformed sample C, (t) is aligned with the standard data C, 1 respectively to obtain an aligned sample C, (t,) according to the aligned sample C, (t,) and the standard data C' 1 and:

calculating a gradient G (t) corresponding to the alignment sample C '(t'), updating the weight W (t) according to the gradient G (t) and the learning rate of the deep network, and repeating until a deep network convergence condition is met; training characteristics C of all deep web learning_1f，C_2f，C_3f… -and C_1fAfter being aligned by a dynamic time warping method, the training result is calculated

Wherein the total number of layers of the deep network is at least 2.

The depth typical time warping method specifically comprises the following steps: inputting original bone motion data X; determining the LARP characteristic C₁，C₂，C₃…, respectively; using weight W₁，W₂，W₃And the activation function pair C of the network₁，C₂，C₃…_-Is subjected to nonlinear transformation to obtain C'₁，C′₂，C′₃…；

C'₁As a template, let C'₁，C′₂，C′₃… are each independently substituted with C'₁Alignment with DTW to give C ″)₁，C″₂，C″₃…；C″₁，C″₂，C″₃… are respectively connected with C 'according to a gradient formula'₁Determining the corresponding gradient G₁，G₂，G₃…；

The gradient formula is:

according to G₁，G₂，G₃… and learning rate update weight W of network settings₁，W₂，W₃…；

Preferably, to reduce the calculation time. Depth-typical time warping is preferably aligned by a fused depth-typical time warping method.

The correction flow is shown in fig. 10. The correlation characteristic comprises aligning a bone model C (t) with the aligned characteristic sample, calculating the correlation between the bone model C (t) and each frame of the aligned characteristic sample, calculating a correction coefficient by using a correlation coefficient x epsilon R as a variable, multiplying the correction coefficient with the characteristic data of each frame to change the weight of the corresponding frame, wherein the correction coefficient obeys a correction function e^x. Wherein the test sample is a bone modelAnd the aligned characteristic sample is a training result.

Since the structure of the LARP motion profile matrix is shown in fig. 8, the effect of alignment is evaluated by testing the respective coefficients between each column of the matrix and the corresponding column of the generic matrix as in fig. 9.

Preferably, to reduce the time consumption of the alignment algorithm, the depth-typical time warping may be optimized to integrate the depth-typical time warping. Therefore, the integration depth typical time warping method of step (3) includes: carrying out nonlinear transformation on a skeleton model C (t) by utilizing the weight W (t) and an activation function of a deep network to obtain a transformation sample C '(t), taking a first group of data C' 1 in the transformation sample C '(t) as standard data, aligning the transformation sample C (t) with the standard data C' 1 respectively by using a dynamic time warping method, and obtaining an integration characteristic C after averaging, wherein the integration characteristic C is obtained according to the integration characteristic C and the standard data C, 1:

Calculating gradients G (t) corresponding to the transformation samples C' (t) one by one, updating weights W (t) according to the gradients G (t) and the learning rate of the deep network, manually setting the convergence times to 5 times in the embodiment, and after convergence, performing transformation on all training characteristics C of the deep network_1f，C_2f，C_3f…_-Training feature C corresponding to standard data C' 1_1fAnd averaging after the dynamic time warping alignment, wherein the calculation result is an alignment characteristic sample.

Specifically, the difference processing of step (2) includes using an interpolation method including:

the characteristics with the same frame number are obtained through interpolation, and the interpolation formula is as follows:

wherein the content of the first and second substances,

definition of Q₁，Q₂，Q₃…Q_nE is SE (3) is Jt₁，t₂，t₃…t_nInstantaneous relative position of time of day.

Wherein, the establishing of the skeleton model C (t) based on the lie algebra relative feature description method in the step (2) comprises: selecting e in a skeletal model_mBones and e_nBone analysis, e_mEnd point of bone is e_m1And e_m2，e_nEnd point of bone is e_n1And e_n2By end point e_n1Is the origin, e_nThe direction vector of the bone is the axis, bone e_mAnd bones e_nThe surface is used as a coordinate surface to establish a coordinate system, e_mEnd point e of bone_m1And e_m2The relative position of (a) is expressed as:

e_nend point of (c) relative to (e)_mThe positions of (A) are:

C(t)＝(ve(B_1，2)，ve(B_1，3)，…，ve(B_n，n-1))，

ve(B)＝(u₁，u₂，u₃，v₁，v₂，v₃)

Preferably, the image preprocessing further comprises image noise reduction and image enhancement. The recognition effect can be improved.

The depth-integrated typical time warping structure in example 1 is shown in fig. 3, wherein the deep learning deep network we use is a BP network structure. The deep network is provided with 4 layers, including an input layer, an output layer and two hidden layers. As shown in fig. 5, the lie group can represent the displacement of the rigid body, and the displacement transformation of the rigid body can be decomposed into two parts, i.e., translation and rotation.

In the training process, the flow of the dynamic time warping method, i.e. DTW, is as shown in fig. 4: and selecting the first array in the training data set as a reference to align all other data, then averaging to obtain a new reference array, and circulating the step for 25 times to finally obtain the integrated data. Since the deep network only needs to compute the gradient once, our deep network has an improvement in algorithm time compared to the DCTW before the improvement, which is shown in table 1 below.

TABLE 1

In the process of performing correlation correction on the features, the correction aims at distinguishing a video frame with better correlation from a video frame with poorer correlation, the function is required to expand the influence of the correlation on a correction coefficient as much as possible, and the function with larger derivative can produce better effect in our experiment. Selecting the function with the best result as the correction function, and selecting e with the highest accuracy^xAs a correction function, e^xIn [ -1, 1 [)]The function image over the interval of (a) is shown in fig. 7.

The results of the correction function after correcting the data are shown in table 2. The final correlation correction results were 0.62% improvement over the initial LARP + DTW.

TABLE 2

Experiment of	Results
		LARP+DTW	0.9071
LARP+DCTW	0.9124
		LARP + DCTW + correlation correction	0.9133

After the correlation is corrected, the behavior recognition result is as shown in fig. 2. The behavior identification method in the embodiment obviously improves actions with low identification rate, for example, the accuracy rate of identification is improved by about 1% -3% for two actions of drinking and making a call.

Although the illustrative embodiments of the present invention have been described above to enable those skilled in the art to understand the present invention, the present invention is not limited to the scope of the embodiments, and it is apparent to those skilled in the art that all the inventive concepts using the present invention are protected as long as they can be changed within the spirit and scope of the present invention as defined and defined by the appended claims.

Claims

1. A behavior recognition method based on integration depth canonical time warping and related correction comprises image preprocessing, image analysis and image understanding, and is characterized in that: the image analysis includes:

wherein N is a positive integer;

(3) aligning the skeleton model C (t) by using an integration depth typical time warping method to obtain an aligned feature sample;

(5) classifying the corrected feature samples by using a support vector machine to obtain a behavior recognition result;

aligning a bone model C (t) with the aligned feature samples, calculating the correlation between the bone model C (t) and each frame of the aligned feature samples, calculating a correction coefficient by using a correlation coefficient x e R as a variable, multiplying the correction coefficient with the feature data of each frame to change the weight of the corresponding frame, wherein the correction coefficient obeys a correction function e^x；

The integration depth typical time warping method in step (3) comprises the following steps:

carrying out nonlinear transformation on the skeleton model C (t) by utilizing the weight W (t) and an activation function of a deep network to obtain a transformation sample C '(t), taking a first group of data C' 1 in the transformation sample C '(t) as standard data, respectively aligning the transformation sample C' (t) with the standard data C '1 by using a dynamic time warping method, and averaging to obtain an integration characteristic C, wherein the integration characteristic C is obtained according to the integration characteristic C and the standard data C' 1:

covariance calculation, SVD decomposition U, V, transform samples C '(t) and C' (t) from U, V

Calculating gradients G (t) corresponding to the transformation samples C' (t) one by one, updating weights W (t) according to the gradients G (t) and the learning rate of the deep network, and converging all training characteristics C transformed by the deep network_1f，C_2f，C_3f.. training feature C corresponding to Standard data C' 1_1fAfter the dynamic time is used for regular alignment, averaging is carried out, and the calculation result is an alignment characteristic sample;

wherein the total number of layers of the deep network is at least 2; t represents the time period in each data sample; f. of₁Representing the original sample X₁Result of the non-linear change, f_iRepresenting the result of the non-linear change of the ith original sample; f. of_dtwObtaining an integrated function for the dynamic time warping calculation process; | K | luminance_*Indicating that the trace calculation is carried out on the matrix K; k represents a coefficient matrix for singular value decomposition; s represents a non-zero singular value matrix after SVD decomposition; x_iRepresenting first raw bone motion data; theta₁Representing the value of the parameter.

2. The behavior recognition method based on integrated depth canonical time warping and correlation correction as claimed in claim 1, wherein:

the deep layer network is a BP network, and the BP network comprises an input layer, an output layer and two hidden layers.

3. The behavior recognition method based on integrated depth-canonical time warping and correlation correction according to claim 1, wherein: the difference processing of step (2) includes using an interpolation method including: the characteristics with the same frame number are obtained through interpolation, and the interpolation formula is as follows:

wherein the content of the first and second substances,

definition of Q₁，Q₂，Q₃...Q_nE SE (3) is t₁，t₂，t₃...t_nInstantaneous relative position of time of day.

4. The behavior recognition method based on integrated depth canonical time warping and related corrections of claim 1, wherein: the building of the skeleton model C (t) based on the lie algebra relative feature description method in the step (2) comprises the following steps: selecting e in a skeletal model_mBones and e_nBone analysis, e_mEnd point of bone is e_m1And e_m2，e_nEnd point of bone is e_n1And e_n2By end point e_n1Is the origin, e_nThe direction vector of the bone is the axis, bone e_mAnd bones e_nThe surface is used as a coordinate surface to establish a coordinate system, e_mEnd point e of bone_m1And e_m2The relative position of (a) is expressed as:

e_nend point of (c) relative to (e)_mThe positions of (A) are:

according to e_mBones and e_nCalculating the relative position relation of each skeleton and the relative displacement mapping relation ve (B) of other skeletons, defining that a skeleton model has n sections of skeleton rigid bodies, and obtaining the shape of the skeleton model at the time t as C (t) as follows:

C(t)＝(ve(B_1，2)，ve(B_1，3)，...，ve(B_n，n-1))，

ve(B)＝(u₁，u₂，u₃，v₁，v₂，v₃)

5. The behavior recognition method based on integrated depth canonical time warping and related corrections of claim 1, wherein: the image pre-processing also includes image noise reduction and image enhancement.