CN107229920B - Behavior recognition method based on integrated deep canonical time warping and correlation correction - Google Patents

Behavior recognition method based on integrated deep canonical time warping and correlation correction

Info

Publication number
CN107229920B
CN107229920B (application CN201710425906.1A)
Authority
CN
China
Prior art keywords
skeleton
time warping
bone
sample
displacement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710425906.1A
Other languages
Chinese (zh)
Other versions
CN107229920A (en)
Inventor
葛永新
陈乐扬
杨丹
张小洪
徐玲
杨梦宁
洪明坚
王洪星
黄晟
陈飞宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Space New Vision Artificial Intelligence Technology Research Institute Co.,Ltd.
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN201710425906.1A priority Critical patent/CN107229920B/en
Publication of CN107229920A publication Critical patent/CN107229920A/en
Application granted granted Critical
Publication of CN107229920B publication Critical patent/CN107229920B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Abstract

The invention relates to a behavior recognition method based on integrated deep canonical time warping and correlation correction, which solves the technical problems of low recognition accuracy and long running time. In the adopted technical scheme, the behavior of a human body is expressed as rigid body displacement; the rigid displacement is decomposed into rigid translation and rigid rotation and is represented by the homogeneous matrix Lie group SE(3), whose Lie algebra is se(3). A skeleton model C(t) is established from the collected skeleton data, the displacement mapping relation between bones is expressed as an element of the homogeneous matrix Lie group SE(3), the skeleton model C(t) is described by the Lie-algebra relative feature method, and interpolation processing is applied to C(t); the samples are aligned using the integrated deep canonical time warping method; the aligned feature samples are corrected using correlation features, and the corrected feature samples are classified with a support vector machine. This scheme solves the stated problem well and is used for behavior recognition from the 3D skeleton.

Description

Behavior recognition method based on integrated deep canonical time warping and correlation correction
Technical Field
The invention relates to the field of human behavior recognition, and in particular to a behavior recognition method based on integrated deep canonical time warping and correlation correction.
Background
Human behavior recognition belongs to the fields of computer vision and behavior pattern recognition and has been a popular topic in recent years. Behavior recognition has great practical value in human-computer interaction, security monitoring, content-based video retrieval, and similar applications, and its importance grows with the spread of informatization. Behavior recognition also promotes other areas of computer vision, such as face recognition and gait analysis. In the past, limited by the available technology, most behavior recognition was based on 2D videos and images, and the recognition results were not ideal. With the development and popularization of 3D technologies such as depth cameras, human behavior recognition is gradually advancing from planar features to three-dimensional features, and 3D behavior recognition is becoming the main research approach. During motion-feature acquisition, differences in motion speed cause large differences in the time series of the generated data. Therefore, in behavior recognition, data alignment is an important factor influencing recognition accuracy.
Existing 3D behavior recognition methods include optical flow models, 3D skeletal models, 3D contour models, and spatio-temporal features. Existing 3D-skeleton behavior recognition methods use multiple cameras or other sensors to record, frame by frame, the coordinates of all joints over the whole motion; the frame data arranged in time order constitute all the information of the motion. This is a feature description method based on point features. The human body can be regarded as an articulated system of rigid segments connected at the joints, and much research in computer vision identifies behaviors by extracting joint information or detecting local body parts. One method performs behavior recognition by dividing the human body into several parts: the joint positions are extracted from a single image and the body is divided into parts such as head, neck, shoulder, arm, elbow, wrist, hand, trunk, leg, knee, and ankle, so that different motions can be represented by different parts. This line of work stimulated the interest of researchers in computer vision, and recognizing human behaviors from the articulation of the skeleton has become a hot spot in behavior recognition. However, the prior art suffers from low recognition accuracy and long recognition time. Therefore, a 3D-skeleton behavior recognition method with high recognition accuracy is needed.
Disclosure of Invention
The invention aims to solve the technical problems of low recognition accuracy and long running time in the prior art, and provides a behavior recognition method based on the 3D skeleton that is characterized by high recognition accuracy.
In order to solve the technical problems, the technical scheme is as follows:
a behavior recognition method based on integrated depth canonical time warping and related correction comprises image preprocessing, image analysis and image understanding, wherein the image analysis comprises the following steps:
(1) representing the behavior of a human body as rigid body displacement, decomposing the rigid body displacement into rigid body translation and rigid body rotation, representing the rigid body displacement by the homogeneous matrix Lie group SE(3), and taking the Lie algebra that reflects all information of SE(3) as se(3);
(2) establishing a skeleton model C(t) from the collected skeleton data, wherein there are N segments of rigid bones in the skeleton model C(t); at time t one bone segment is defined as the result of displacing another bone segment, and the positional relation between the two segments is defined as a displacement mapping relation; the displacement mapping relation is expressed in Euclidean space as an element of the homogeneous matrix Lie group SE(3), the corresponding Lie algebra se(3) is then taken as the motion feature, the skeleton model C(t) is described by the Lie-algebra relative feature method (LARP for short), and interpolation processing is applied to C(t),
wherein N is a positive integer;
(3) aligning the skeleton model C(t) using the deep canonical time warping method to obtain aligned feature samples;
(4) correcting the aligned feature samples of step (3) using correlation features to obtain corrected feature samples;
(5) classifying the corrected feature samples with a support vector machine to obtain the behavior recognition result.
The working principle of the invention is as follows: to enhance the recognition effect, the aligned test samples are corrected using correlation, giving better test results. The correlation between samples is improved mainly through training and deep canonical time warping.
In the above scheme, as a further optimization, the deep canonical time warping method is the integrated deep canonical time warping method. The deep canonical time warping method, i.e. the deep network structure in DCTW, is improved mainly to reduce the running time of the algorithm. When applying DCTW to LARP, performing the nonlinear transformation on every group of data in the training sample can improve recognition accuracy, but too many gradients must be calculated and the computation time becomes too long; we therefore look for a simpler approach. The real purpose of training is to find one general set of data features, so it is not necessary to compute a loss function for every group of data in the sample. Instead, we first obtain one group of data by a dynamic time warping averaging method and use it as the representative of all samples when computing the loss function; the amount of computation required in the deep network is thereby markedly reduced, shortening the computation time.
Further, the correlation correction of step (4) comprises: aligning the skeleton model C(t) with the aligned feature sample, calculating the correlation between C(t) and each frame of the aligned feature sample, computing a correction coefficient from the correlation coefficient x ∈ R as a variable, and multiplying the correction coefficient by the feature data of each frame to change the weight of the corresponding frame, the correction coefficient obeying the correction function e^x.
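As an illustration of this per-frame correction, the following is a minimal Python sketch; the function name and the (frames × features) array layout are assumptions for illustration, not part of the patent.

    import numpy as np

    def correlation_correction(aligned_sample, template):
        """Re-weight each frame of an aligned feature sample by e^x,
        where x is the Pearson correlation between that frame's feature
        vector and the corresponding frame of the template C(t).

        Both inputs are (num_frames, num_features) float arrays; this
        layout is an assumption for illustration."""
        corrected = np.empty_like(aligned_sample)
        for i in range(aligned_sample.shape[0]):
            # correlation coefficient x ∈ R between the two frames
            x = np.corrcoef(aligned_sample[i], template[i])[0, 1]
            # the correction coefficient obeys the correction function e^x:
            # frames with higher correlation receive larger weight
            corrected[i] = np.exp(x) * aligned_sample[i]
        return corrected

Frames whose feature vectors correlate strongly with the template are thus amplified by up to a factor of e, while negatively correlated frames are attenuated.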
Further, the integrated deep canonical time warping method of step (3) comprises:
performing a nonlinear transformation of the skeleton model C(t) using the weights W(t) and the activation function of a deep network to obtain transformed samples C'(t); taking the first group of data C'_1 in the transformed samples C'(t) as the standard data; aligning each transformed sample C'(t) with the standard data C'_1 by dynamic time warping and averaging to obtain the integrated feature C; then computing, from the integrated feature C and the standard data C'_1:
Σ_11 = (1/(T−1)) C C^T,  Σ_22 = (1/(T−1)) C'_1 C'_1^T,  Σ_12 = (1/(T−1)) C C'_1^T

K = Σ_11^(−1/2) Σ_12 Σ_22^(−1/2)

corr(C, C'_1) = ‖K‖_*

(the data understood as centered, with T the number of frames);
computing the covariance matrices and the singular value decomposition K = U S V^T, and obtaining from U and V the gradients with respect to the transformed samples C'(t) and the integrated feature C:
∇_12 = Σ_11^(−1/2) U V^T Σ_22^(−1/2)

∇_11 = −(1/2) Σ_11^(−1/2) U S U^T Σ_11^(−1/2)

∂corr/∂C = (1/(T−1)) (2 ∇_11 C + ∇_12 C'_1)
calculating the gradients G(t) corresponding to the transformed samples C'(t) one by one, and updating the weights W(t) according to the gradients G(t) and the learning rate of the deep network until convergence; all training features C_1f, C_2f, C_3f, … transformed by the deep network are aligned, by dynamic time warping, with the training feature C_1f corresponding to the standard data C'_1 and then averaged; the result of this calculation is the aligned feature sample;
wherein the total number of layers of the deep network is at least 2.
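The correlation objective above can be sketched as follows in Python. This is an illustrative reading of the reconstructed formulas; the small ridge term `reg` added to the covariance matrices is an assumption for numerical stability, not stated in the patent.

    import numpy as np
    from scipy.linalg import sqrtm

    def cca_trace_norm(C, C1, reg=1e-4):
        """Correlation objective ||K||_* between the integrated feature C
        and the standard data C'_1, both given as (features, frames)
        arrays; this layout is an assumption for illustration."""
        T = C.shape[1]
        Cb = C - C.mean(axis=1, keepdims=True)      # centered data
        C1b = C1 - C1.mean(axis=1, keepdims=True)
        S11 = Cb @ Cb.T / (T - 1) + reg * np.eye(C.shape[0])
        S22 = C1b @ C1b.T / (T - 1) + reg * np.eye(C1.shape[0])
        S12 = Cb @ C1b.T / (T - 1)
        S11i = np.linalg.inv(np.real(sqrtm(S11)))   # Σ_11^(-1/2)
        S22i = np.linalg.inv(np.real(sqrtm(S22)))   # Σ_22^(-1/2)
        K = S11i @ S12 @ S22i
        s = np.linalg.svd(K, compute_uv=False)      # singular values S of K
        return s.sum()                              # ||K||_* = tr(S)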
Further, the deep network is a BP network comprising an input layer, an output layer, and two hidden layers.
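A minimal sketch of such a four-layer BP network, written here with PyTorch; the layer widths and the tanh activation are illustrative assumptions, as the patent does not specify them.

    import torch.nn as nn

    def make_bp_network(n_in: int, n_hidden: int, n_out: int) -> nn.Sequential:
        """Four-layer BP network: input layer, two hidden layers, output
        layer. Widths and activation are assumptions for illustration."""
        return nn.Sequential(
            nn.Linear(n_in, n_hidden), nn.Tanh(),      # hidden layer 1
            nn.Linear(n_hidden, n_hidden), nn.Tanh(),  # hidden layer 2
            nn.Linear(n_hidden, n_out),                # output layer
        )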
Further, the interpolation processing of step (2) uses an interpolation method: features with the same number of frames are obtained through interpolation, the interpolation formula being:
Q(t) = Q_i · exp( s · log( Q_i^(−1) Q_(i+1) ) ),  t ∈ [t_i, t_(i+1)],

wherein

s = (t − t_i) / (t_(i+1) − t_i),

and Q_1, Q_2, Q_3, …, Q_n ∈ SE(3) are defined as the instantaneous relative positions at times t_1, t_2, t_3, …, t_n.
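Under the reading above (geodesic interpolation on SE(3) between successive relative positions, which is one plausible reconstruction of the formula), a short sketch using scipy's matrix exponential and logarithm:

    import numpy as np
    from scipy.linalg import expm, logm

    def interpolate_se3(Q_i, Q_j, t_i, t_j, t):
        """Instantaneous relative position at time t between two SE(3)
        elements Q_i (at t_i) and Q_j (at t_j), following the geodesic
        formula reconstructed above. Q_i, Q_j are 4x4 homogeneous
        matrices."""
        s = (t - t_i) / (t_j - t_i)                  # normalized time
        # Q(t) = Q_i · exp(s · log(Q_i^(-1) Q_j))
        return Q_i @ expm(s * np.real(logm(np.linalg.inv(Q_i) @ Q_j)))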
Further, establishing the skeleton model C(t) by the Lie-algebra relative feature description method in step (2) comprises: selecting bones e_m and e_n in the skeleton model for analysis, the endpoints of bone e_m being e_m1 and e_m2, and the endpoints of bone e_n being e_n1 and e_n2; taking the endpoint e_n1 as the origin, the direction vector of bone e_n as the axis, and the plane containing bones e_m and e_n as the coordinate plane, a coordinate system is established. The relative positions of the endpoints e_m1 and e_m2 of bone e_m are then expressed as

[e_m1(t) e_m2(t); 1 1] = B_(n,m)(t) · [0 l_m; 1 1],

and the endpoints of e_n relative to e_m are at the positions

[e_n1(t) e_n2(t); 1 1] = B_(m,n)(t) · [0 l_n; 1 1],

where l_m and l_n denote the bone lengths laid along the local axis (each bone's own endpoints sit at the origin and at distance l_m or l_n along its axis). The displacement mapping relation between bones e_m and e_n, i.e. their relative positional relation, is

B_(m,n)(t) = [R_(m,n)(t) d_(m,n)(t); 0 1] ∈ SE(3),
B_(n,m)(t) = [R_(n,m)(t) d_(n,m)(t); 0 1] ∈ SE(3).

From the relative positional relation of bones e_m and e_n, the displacement mapping relation ve(B) of each bone with every other bone is calculated; defining the skeleton model to have n segments of rigid bones, the shape of the skeleton model at time t is obtained as C(t):

C(t) = (ve(B_(1,2)), ve(B_(1,3)), …, ve(B_(n,n−1))),

log(B) = [û v; 0 0], with û the 3 × 3 skew-symmetric matrix of u = (u_1, u_2, u_3),

ve(B) = (u_1, u_2, u_3, v_1, v_2, v_3),

wherein C(t) has 6 × n × (n − 1) components, and n and m are positive integers not greater than N.
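The mapping ve(·) and the assembly of C(t) can be sketched as follows; the rotation-first ordering of the six components is an assumption for illustration.

    import numpy as np
    from scipy.linalg import logm

    def ve(B):
        """Map a homogeneous matrix B ∈ SE(3) to the 6-vector
        (u1, u2, u3, v1, v2, v3) of its Lie-algebra representation."""
        L = np.real(logm(B))    # log(B) = [[û, v], [0, 0]], û skew-symmetric
        u = np.array([L[2, 1], L[0, 2], L[1, 0]])   # axis-angle vector from û
        v = L[:3, 3]                                # translational part of the log
        return np.concatenate([u, v])

    def larp_features(B_pairs):
        """C(t): concatenation of ve(B_(i,j)) over all ordered bone pairs,
        given as an iterable of 4x4 homogeneous matrices."""
        return np.concatenate([ve(B) for B in B_pairs])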
Further, the image preprocessing also comprises image noise reduction and image enhancement.
The logarithm of a matrix Lie group forms a Lie algebra space, and the information in the Lie group can be converted completely into Lie-algebra form. The displacement of a rigid body can be represented by a Lie group, and the displacement transformation decomposes into a translation part and a rotation part. Suppose that during the rigid displacement the point p, taken as the coordinate origin, has displacement vector q ∈ R^3, and the rotation of the rigid body about p is R ∈ SO(3); combining the two gives a pair of displacement coordinates (q, R) representing this rigid displacement, and all such coordinates form the set SE(3) = { (q, R) : q ∈ R^3, R ∈ SO(3) } = R^3 × SO(3). Let x_1, x_2 denote the positions before and after the displacement respectively; then
x_2 = q + R x_1,
where (q, R) ∈ SE(3) represents the displacement of the rigid body, the elements of SE(3) representing displacement changes of the rigid body. To express the translation and rotation more simply and clearly, a homogeneous matrix is used to represent the rigid transformation. In the homogenization, the position of a point is marked with 1 and a vector with 0.
The position transformation of a point can then be expressed as

[x_2; 1] = ḡ · [x_1; 1],

where

ḡ = [R q; 0 1]

is a 4 × 4 matrix, the homogeneous representation of g ∈ SE(3); a point p is homogenized as (p, 1) and a vector v as (v, 0).
Redefining SE(3) by homogenizing all elements of the original SE(3) yields the new SE(3), which forms a group under matrix multiplication and satisfies the rigid-displacement conditions, so that SE(3) can represent the displacement of a rigid body:

SE(3) = { ḡ = [R q; 0 1] : q ∈ R^3, R ∈ SO(3) }.
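A small numeric illustration of this representation: building ḡ from (q, R) and transforming a point reproduces x_2 = q + R x_1. The example values are chosen only for illustration.

    import numpy as np

    def homogeneous(R, q):
        """4x4 homogeneous representation ḡ = [[R, q], [0, 1]] of (q, R) ∈ SE(3)."""
        g = np.eye(4)
        g[:3, :3] = R
        g[:3, 3] = q
        return g

    # transforming a point: [x2; 1] = ḡ [x1; 1] reproduces x2 = q + R x1
    R = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])  # 90° about z
    q = np.array([1., 0., 0.])
    x1 = np.array([1., 0., 0.])
    x2 = (homogeneous(R, q) @ np.append(x1, 1.0))[:3]          # -> [1., 1., 0.]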
the coordinate system is based on the q point before displacement as the origin, so that the q point after displacement reflects the case of translational transformation of the q point, and SE (3) is a lie group. There must be one lie number that can reflect the full information of SE (3).
Human behavior features based on the 3D skeleton are described using SE(3). The object of behavior recognition is a video or an image sequence, in which the motion of a person is discrete; that is, the motion is formed by arranging a number of static skeleton shapes, so the feature description for 3D-skeleton behavior recognition amounts to finding a way of describing the static skeleton shape. The positional relation between two bone segments is understood as a displacement mapping relation: one bone is regarded as the result of displacing another bone. The position of a fixed bone relative to another bone in Euclidean space can likewise be represented by a Lie group, and the corresponding Lie algebra is then taken as the motion feature. The purpose of the Lie-algebra relative description is to calculate the relative position between each bone and every other bone and to represent it by ve(B). Assuming a skeleton model with n segments of rigid bones, at time t the shape of the skeleton model can be expressed as C(t) = (ve(B_(1,2)), ve(B_(1,3)), …, ve(B_(n,n−1))), that is, n(n−1) such 6-dimensional vectors, 6 × n × (n − 1) components in total.
The objects processed by dynamic time warping are arrays arranged in time order. Let the two arrays be X: x_1, x_2, x_3, …, x_N and Y: y_1, y_2, y_3, …, y_M, with lengths N and M respectively. The two arrays may be discrete signals or, more commonly, time-equidistant points generated while acquiring the series of feature values. Let the feature space be F; then x_n, y_m ∈ F, where n ∈ [1, N], m ∈ [1, M]. In general, to compare two different features x, y ∈ F, a function is needed that measures their similarity, satisfying c : F × F → R≥0; the higher the similarity between x and y, the smaller the value of c(x, y). Considering the pairs formed by the elements of X and Y, the objective of dynamic time warping is to find the optimal combination that minimizes the sum of all c(x, y) values obtained after pairing the elements of X and Y.
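A standard dynamic-programming sketch of this objective in Python; the Euclidean frame cost is an assumption for illustration, since any cost satisfying c : F × F → R≥0 would do.

    import numpy as np

    def dtw(X, Y, c=lambda x, y: np.linalg.norm(x - y)):
        """Dynamic time warping between sequences X (length N) and
        Y (length M); returns the minimal total cost of pairing the
        elements of X and Y under the frame-to-frame cost c."""
        N, M = len(X), len(Y)
        D = np.full((N + 1, M + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, N + 1):
            for j in range(1, M + 1):
                # extend the cheapest of the three admissible predecessors
                D[i, j] = c(X[i - 1], Y[j - 1]) + min(D[i - 1, j],
                                                      D[i, j - 1],
                                                      D[i - 1, j - 1])
        return D[N, M]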
Dynamic time warping has the drawback that it cannot handle sequences of different dimensions. The invention therefore adopts an improved method, integrated deep canonical time warping. Canonical correlation analysis is introduced to strengthen the similarity of the two arrays to be aligned; with higher similarity, alignment accuracy can be greatly improved. The correlation between two data sets is a measure of the linear relationship between them: the higher the correlation, the more similar the two sets of data, and the easier it is to predict the distribution of one set from the other. In behavior recognition, two arrays with a high degree of correlation also represent skeleton shapes of high similarity. The effect of canonical correlation analysis is to give the two groups of data a higher degree of correlation through a linear transformation.
Deep canonical correlation analysis replaces the linear transformation in canonical correlation analysis with a nonlinear one, so that the similarity of the two arrays after transformation reaches a higher level. Deep learning refers to deep networks with more than two layers; a more realistic effect can be obtained through the nonlinear function formed by abstract multilayer nesting. Through deep learning, good results are obtained by initializing the parameters and training in an update-based manner. In deep canonical correlation analysis, the role of deep learning is mainly to find a nonlinear transformation that makes the correlation of the two arrays higher.
After the sequences are nonlinearly transformed in deep learning, the loss function of the deep network is calculated by a corresponding method; the gradient of each weight is then determined from the loss function, the weights are updated on the basis of the gradients, and finally a stable state is reached. The weights at that point are the final result after many updates of the deep network, and the final output is the solution of the deep network.
Canonical time warping performs canonical correlation analysis on the arrays before aligning them, improving the correlation between the two sets of data; with high similarity, the alignment achieves a better result.
Deep canonical time warping first applies a nonlinear transformation to the data using the functions in the deep network, then performs canonical correlation analysis, updating each parameter of the nonlinear mapping through the loss function until a stable result is obtained.
Integrated deep canonical time warping is mainly an improvement of deep canonical time warping, specifically of the way the loss function in the deep network is calculated. For the nonlinearly transformed samples f_1, f_2, f_3, …, an integrated function f_dtw is obtained by the dynamic time warping averaging procedure; then f_1 is selected from the samples, and the two are used to calculate
Σ_11 = (1/(T−1)) f_dtw f_dtw^T,  Σ_22 = (1/(T−1)) f_1 f_1^T,  Σ_12 = (1/(T−1)) f_dtw f_1^T

K = Σ_11^(−1/2) Σ_12 Σ_22^(−1/2)

corr(f_dtw, f_1) = ‖K‖_*
When calculating the loss function, f_dtw and f_1 are used to compute the covariance matrices and K, and the singular value decomposition of K yields U and V; the gradient for each sample f_1, f_2, f_3, … is then calculated as
∇_12 = Σ_11^(−1/2) U V^T Σ_22^(−1/2)

∇_11 = −(1/2) Σ_11^(−1/2) U S U^T Σ_11^(−1/2)

∂corr/∂f_i = (1/(T−1)) (2 ∇_11 f_dtw + ∇_12 f_1)
Only one computation of K and one singular value decomposition are needed in each cycle of the deep network; likewise, ∇_11 and ∇_12 need to be computed only once, so a large amount of time is saved.
To improve recognition accuracy, we correct the features using correlation. The data-alignment effect is tested through the correlation between the aligned test sample and the general feature data obtained by training. The structure of the LARP motion-feature matrix is shown in fig. 8, a time-series representation of the feature-matrix structure, and fig. 9 shows the corresponding calculation of the correlation coefficients. The effect of the alignment is evaluated by testing the correlation coefficients between each column of the matrix and the corresponding column of the general matrix.
The correlation coefficient is used to evaluate the correlation of the test sample, increasing the role in recognition of frames with higher correlation and reducing that of frames with lower correlation. To distinguish differences in correlation, the correlation coefficient x ∈ R is used as the variable of a correction coefficient, the weight of a frame is changed by multiplication, and the correction function e^x is selected; the correction flow is shown in fig. 10. After the sample correlation is adjusted by integrated deep canonical time warping, the correlation between the test sample and the training result lies mostly between 0 and 0.2, and negative correlation rarely occurs. It is therefore possible to distinguish frames of low correlation from frames of high correlation and to reduce the weight of the former.
The invention has the beneficial effects that:
the method has the advantages that the recognition accuracy is improved by adopting the typical time warping of the fusion depth and the correlation correction alignment;
the method has the advantages that the defects of existing 3D bone behavior identification are overcome;
effect three, using fusion depth typical time warping significantly reduces the computation time for behavior recognition.
Drawings
The invention is further illustrated with reference to the following figures and examples.
Fig. 1 is a schematic flow chart of 3D behavior recognition in embodiment 1.
Fig. 2 is a schematic diagram of a behavior recognition result in embodiment 1.
Fig. 3 is a schematic diagram of the deep network structure for integrated deep canonical time warping.
Fig. 4 is a schematic diagram of the dynamic time warping integration method.
Fig. 5 is a rigid-body motion decomposition diagram.
Fig. 6 is a skeleton diagram of human motion behavior.
Fig. 7 is a graph of the correction function.
Fig. 8 is a schematic diagram showing a time series of matrices.
Fig. 9 is a schematic diagram of correlation coefficient calculation.
Fig. 10 is a schematic view of a correction flow.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
This embodiment provides a behavior recognition method based on integrated deep canonical time warping and correlation correction, comprising image preprocessing, image analysis and image understanding; as shown in fig. 1, the image analysis includes:
(1) representing the behavior of a human body as rigid body displacement, decomposing the rigid body displacement into rigid body translation and rigid body rotation, representing the rigid body displacement by the homogeneous matrix Lie group SE(3), and reflecting all information of SE(3) by the Lie algebra se(3);
(2) establishing a skeleton model C(t) from the collected skeleton data, wherein there are N segments of rigid bones in the skeleton model C(t); at time t one bone segment is defined as the result of displacing another bone segment, and the positional relation between the two segments is defined as a displacement mapping relation; the displacement mapping relation is expressed in Euclidean space as an element of the homogeneous matrix Lie group SE(3), the corresponding Lie algebra se(3) is then taken as the motion feature, the skeleton model C(t) is described by the Lie-algebra relative feature method, and interpolation processing is applied to C(t),
wherein N is a positive integer;
(3) aligning the skeleton model C(t) using the deep canonical time warping method to obtain aligned feature samples;
(4) correcting the aligned feature samples of step (3) using correlation features to obtain corrected feature samples;
(5) classifying the corrected feature samples with a support vector machine to obtain the behavior recognition result.
The experiments use the Florence3D-Action dataset, captured with a Kinect. The Kinect contains 3 cameras in total: an RGB camera that captures 640 × 480 color images at 30 frames per second, and two depth sensors on either side that detect the relative position of the user. The Florence3D-Action dataset was collected with a Kinect sensor at a fixed location. It consists of 10 different people each performing 9 different actions; each person performed each action two to three times, giving 215 action instances in the whole dataset. The skeleton model in the dataset consists of 15 joints. In this embodiment, half of each person's motions are selected as training samples and the remaining half serve as test samples.
The skeleton information in the Florence3D-Action dataset is computed on the skeletons output by the Kinect platform; the Kinect video rate is kept at 30 frames per second. In the Kinect skeleton model, shown in the behavioral-skeleton schematic of fig. 6, body parts such as the forearm, upper arm, torso and head are all regarded as rigid bodies. An absolute coordinate system is established to represent each joint: the origin is the position of the Kinect, the x-axis points horizontally to the left, the y-axis vertically upward, and the z-axis straight ahead of the Kinect; each joint has a position coordinate in this system. Because of correlations in joint movement, there is a large amount of redundancy in the data of the original Kinect model, so improvement is required on the basis of these data. In this embodiment it is assumed that joints of the human body with high correlation are relatively fixed and invariant, the trunk being one large rigid body. The joints of the skeleton are divided into two levels: the first level consists of the joint points on the torso; the second level consists of the sub-joint points outside the trunk, connected to the first-level joints through the skeleton.
To determine the positions of the first-level joints, the model establishes a coordinate system with the caudal region of the human body as origin, giving coordinates (u, r, t)_0, (u, r, t)_1, (u, r, t)_2 and (u, r, t)_3. A coordinate system is then established with each first-level joint as origin to obtain the coordinates of the second-level joint points.
The workflow of this embodiment: collect the skeleton parameter model, describe the skeleton parameters by the LARP method, align the result with deep canonical time warping, then perform correlation correction, and finally classify and output the result.
The deep canonical time warping method is canonical time warping with deep learning introduced. It performs a nonlinear transformation of the skeleton model C(t) using the weights W(t) and the activation function of a deep network, obtaining transformed samples C'(t); the first group of data C'_1 in the transformed samples C'(t) is taken as standard data, each transformed sample C'(t) is aligned with the standard data C'_1 to obtain the aligned samples C''(t), and from the aligned samples C''(t) and the standard data C'_1 one computes:
Σ_11 = (1/(T−1)) C'' C''^T,  Σ_22 = (1/(T−1)) C'_1 C'_1^T,  Σ_12 = (1/(T−1)) C'' C'_1^T

K = Σ_11^(−1/2) Σ_12 Σ_22^(−1/2)

corr(C'', C'_1) = ‖K‖_*
calculating a gradient G (t) corresponding to the alignment sample C '(t'), updating the weight W (t) according to the gradient G (t) and the learning rate of the deep network, and repeating until a deep network convergence condition is met; training characteristics C of all deep web learning1f,C2f,C3f… -and C1fAfter being aligned by a dynamic time warping method, the training result is calculated
Wherein the total number of layers of the deep network is at least 2.
The deep canonical time warping method specifically comprises the following steps: input the original skeleton motion data X; determine the LARP features C_1, C_2, C_3, …; apply a nonlinear transformation to C_1, C_2, C_3, … using the weights W_1, W_2, W_3, … and the activation function of the network to obtain C'_1, C'_2, C'_3, …;
take C'_1 as the template and align each of C'_1, C'_2, C'_3, … with C'_1 by DTW to obtain C''_1, C''_2, C''_3, …; from C''_1, C''_2, C''_3, … and C'_1, determine the corresponding gradients G_1, G_2, G_3, … according to the gradient formula;
The gradient formula is:
∇_12 = Σ_11^(−1/2) U V^T Σ_22^(−1/2)

∇_11 = −(1/2) Σ_11^(−1/2) U S U^T Σ_11^(−1/2)

G_i = (1/(T−1)) (2 ∇_11 C''_i + ∇_12 C'_1)

with K = U S V^T the singular value decomposition;
update the weights W_1, W_2, W_3, … according to G_1, G_2, G_3, … and the learning rate set for the network;
Preferably, to reduce computation time, alignment is performed by the integrated deep canonical time warping method.
The correction flow is shown in fig. 10. The correlation correction comprises aligning the skeleton model C(t) with the aligned feature sample, calculating the correlation between C(t) and each frame of the aligned feature sample, computing a correction coefficient from the correlation coefficient x ∈ R as a variable, and multiplying the correction coefficient by the feature data of each frame to change the weight of the corresponding frame, the correction coefficient obeying the correction function e^x. Here the test sample is the skeleton model and the aligned feature sample is the training result.
Since the structure of the LARP motion-feature matrix is as shown in fig. 8, the effect of the alignment is evaluated by testing the correlation coefficients between each column of the matrix and the corresponding column of the general matrix, as in fig. 9.
Preferably, to reduce the time consumed by the alignment algorithm, deep canonical time warping may be optimized into integrated deep canonical time warping. The integrated deep canonical time warping method of step (3) thus comprises: performing a nonlinear transformation of the skeleton model C(t) using the weights W(t) and the activation function of a deep network to obtain transformed samples C'(t); taking the first group of data C'_1 in the transformed samples C'(t) as standard data; aligning each transformed sample C'(t) with the standard data C'_1 by dynamic time warping and averaging to obtain the integrated feature C; then computing, from the integrated feature C and the standard data C'_1:
Σ_11 = (1/(T−1)) C C^T,  Σ_22 = (1/(T−1)) C'_1 C'_1^T,  Σ_12 = (1/(T−1)) C C'_1^T

K = Σ_11^(−1/2) Σ_12 Σ_22^(−1/2)

corr(C, C'_1) = ‖K‖_*
computing the covariance matrices and the singular value decomposition K = U S V^T, and obtaining from U and V the gradients with respect to the transformed samples C'(t) and the integrated feature C:
∇_12 = Σ_11^(−1/2) U V^T Σ_22^(−1/2)

∇_11 = −(1/2) Σ_11^(−1/2) U S U^T Σ_11^(−1/2)

∂corr/∂C = (1/(T−1)) (2 ∇_11 C + ∇_12 C'_1)
calculating the gradients G(t) corresponding to the transformed samples C'(t) one by one and updating the weights W(t) according to the gradients G(t) and the learning rate of the deep network; in this embodiment the number of convergence iterations is set manually to 5. After convergence, all training features C_1f, C_2f, C_3f, … transformed by the deep network are aligned, by dynamic time warping, with the training feature C_1f corresponding to the standard data C'_1 and then averaged; the result of this calculation is the aligned feature sample.
Specifically, the interpolation processing of step (2) uses an interpolation method: features with the same number of frames are obtained through interpolation, the interpolation formula being:
Q(t) = Q_i · exp( s · log( Q_i^(−1) Q_(i+1) ) ),  t ∈ [t_i, t_(i+1)],

wherein

s = (t − t_i) / (t_(i+1) − t_i),

and Q_1, Q_2, Q_3, …, Q_n ∈ SE(3) are defined as the instantaneous relative positions at times t_1, t_2, t_3, …, t_n.
Wherein, establishing the skeleton model C(t) by the Lie-algebra relative feature description method in step (2) comprises: selecting bones e_m and e_n in the skeleton model for analysis, the endpoints of bone e_m being e_m1 and e_m2, and the endpoints of bone e_n being e_n1 and e_n2; taking the endpoint e_n1 as the origin, the direction vector of bone e_n as the axis, and the plane containing bones e_m and e_n as the coordinate plane, a coordinate system is established. The relative positions of the endpoints e_m1 and e_m2 of bone e_m are then expressed as

[e_m1(t) e_m2(t); 1 1] = B_(n,m)(t) · [0 l_m; 1 1],

and the endpoints of e_n relative to e_m are at the positions

[e_n1(t) e_n2(t); 1 1] = B_(m,n)(t) · [0 l_n; 1 1],

where l_m and l_n denote the bone lengths laid along the local axis. The displacement mapping relation between bones e_m and e_n, i.e. their relative positional relation, is

B_(m,n)(t) = [R_(m,n)(t) d_(m,n)(t); 0 1] ∈ SE(3),
B_(n,m)(t) = [R_(n,m)(t) d_(n,m)(t); 0 1] ∈ SE(3).

From the relative positional relation of bones e_m and e_n, the displacement mapping relation ve(B) of each bone with every other bone is calculated; defining the skeleton model to have n segments of rigid bones, the shape of the skeleton model at time t is obtained as C(t):

C(t) = (ve(B_(1,2)), ve(B_(1,3)), …, ve(B_(n,n−1))),

log(B) = [û v; 0 0], with û the 3 × 3 skew-symmetric matrix of u = (u_1, u_2, u_3),

ve(B) = (u_1, u_2, u_3, v_1, v_2, v_3),

wherein C(t) has 6 × n × (n − 1) components, and n and m are positive integers not greater than N.
Preferably, the image preprocessing further comprises image noise reduction and image enhancement, which can improve the recognition effect.
The integrated deep canonical time warping structure of example 1 is shown in fig. 3; the deep network used for deep learning is a BP network. The deep network has 4 layers: an input layer, an output layer and two hidden layers. As shown in fig. 5, the Lie group can represent the displacement of a rigid body, and the displacement transformation of a rigid body can be decomposed into a translation part and a rotation part.
In the training process, the flow of the dynamic time warping method (DTW) is as shown in fig. 4: select the first array in the training dataset as the reference and align all other data to it, then average to obtain a new reference array; this step is cycled 25 times to finally obtain the integrated data, as in the sketch below. Since the deep network only needs to compute the gradient once, our deep network improves on the algorithm time of the original DCTW, as shown in table 1 following the sketch.
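A sketch of this integration loop, assuming each sample is a (frames × features) numpy float array; dtw_path and the per-frame averaging are illustrative reconstructions of the flow in fig. 4, not code from the patent.

    import numpy as np

    def dtw_path(X, Y):
        """DTW between (frames, features) arrays X and Y; returns the
        optimal warping path as a list of (i, j) index pairs."""
        N, M = len(X), len(Y)
        D = np.full((N + 1, M + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, N + 1):
            for j in range(1, M + 1):
                D[i, j] = np.linalg.norm(X[i - 1] - Y[j - 1]) + min(
                    D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        path, (i, j) = [], (N, M)
        while i > 0 and j > 0:                # backtrack the cheapest path
            path.append((i - 1, j - 1))
            i, j = min([(i - 1, j), (i, j - 1), (i - 1, j - 1)],
                       key=lambda p: D[p])
        return path[::-1]

    def integrate(samples, n_iter=25):
        """Iteratively align all samples to a reference and average;
        the 25 iterations follow the embodiment."""
        ref = samples[0]                      # first array as initial reference
        for _ in range(n_iter):
            acc = np.zeros_like(ref)
            cnt = np.zeros(len(ref))
            for s in samples:
                for i, j in dtw_path(ref, s):
                    acc[i] += s[j]            # map each sample frame onto
                    cnt[i] += 1               # the reference timeline
            ref = acc / cnt[:, None]          # new reference = per-frame mean
        return ref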
TABLE 1
[Table 1 compares the running time of the improved deep network with that of the original DCTW; the figures are given as an image.]
In correcting the features by correlation, the goal is to distinguish video frames with better correlation from those with worse correlation; the function must expand the influence of the correlation on the correction coefficient as much as possible, and in our experiments a function with a larger derivative produced better results. The function with the best results is selected as the correction function: e^x gave the highest accuracy, and the graph of e^x over the interval [−1, 1] is shown in fig. 7.
The results after correcting the data with the correction function are shown in table 2; the final correlation-corrected result is a 0.62 percentage-point improvement over the initial LARP + DTW.
TABLE 2
Experiment                                Result
LARP + DTW                                0.9071
LARP + DCTW                               0.9124
LARP + DCTW + correlation correction      0.9133
After the correlation correction, the behavior recognition results are as shown in fig. 2. The behavior recognition method of this embodiment clearly improves actions with a low recognition rate; for example, recognition accuracy improves by about 1%-3% for the two actions of drinking and making a phone call.
Although illustrative embodiments of the present invention have been described above to enable those skilled in the art to understand it, the invention is not limited to the scope of these embodiments. To those skilled in the art, all changes that remain within the spirit and scope of the invention as defined by the appended claims are protected.

Claims (5)

1. A behavior recognition method based on integrated deep canonical time warping and correlation correction, comprising image preprocessing, image analysis and image understanding, characterized in that the image analysis includes:
(1) representing the behavior of a human body as rigid body displacement, decomposing the rigid body displacement into rigid body translation and rigid body rotation, representing the rigid body displacement by the homogeneous matrix Lie group SE(3), and reflecting all information of SE(3) by the Lie algebra se(3);
(2) establishing a skeleton model C(t) from the collected skeleton data, wherein there are N segments of rigid bones in the skeleton model C(t); at time t one bone segment is defined as the result of displacing another bone segment, and the positional relation between the two segments is defined as a displacement mapping relation; the displacement mapping relation is expressed in Euclidean space as an element of the homogeneous matrix Lie group SE(3), the corresponding Lie algebra se(3) is then taken as the motion feature, the skeleton model C(t) is described by the Lie-algebra relative feature method, and interpolation processing is applied to C(t),
wherein N is a positive integer;
(3) aligning the skeleton model C(t) using the integrated deep canonical time warping method to obtain aligned feature samples;
(4) correcting the aligned feature samples of step (3) using correlation features to obtain corrected feature samples;
(5) classifying the corrected feature samples with a support vector machine to obtain the behavior recognition result;
aligning the skeleton model C(t) with the aligned feature sample, calculating the correlation between C(t) and each frame of the aligned feature sample, computing a correction coefficient from the correlation coefficient x ∈ R as a variable, and multiplying the correction coefficient by the feature data of each frame to change the weight of the corresponding frame, the correction coefficient obeying the correction function e^x;
the integrated deep canonical time warping method of step (3) comprising:
performing a nonlinear transformation of the skeleton model C(t) using the weights W(t) and the activation function of a deep network to obtain transformed samples C'(t); taking the first group of data C'_1 in the transformed samples C'(t) as standard data; aligning each transformed sample C'(t) with the standard data C'_1 by dynamic time warping and averaging to obtain the integrated feature C; then computing, from the integrated feature C and the standard data C'_1:
Σ_11 = (1/(T−1)) f_dtw f_dtw^T,  Σ_22 = (1/(T−1)) f_1 f_1^T,  Σ_12 = (1/(T−1)) f_dtw f_1^T

K = Σ_11^(−1/2) Σ_12 Σ_22^(−1/2)

corr(f_dtw, f_1) = ‖K‖_*

where f_dtw corresponds to the integrated feature C and f_1 to the standard data C'_1, the data understood as centered;
computing the covariance matrices and the singular value decomposition K = U S V^T, and obtaining from U and V the gradients with respect to the transformed samples C'(t) and the integrated feature C:
∇_12 = Σ_11^(−1/2) U V^T Σ_22^(−1/2)

∇_11 = −(1/2) Σ_11^(−1/2) U S U^T Σ_11^(−1/2)

∂corr/∂f_i = (1/(T−1)) (2 ∇_11 f_dtw + ∇_12 f_1)
calculating the gradients G(t) corresponding to the transformed samples C'(t) one by one, and updating the weights W(t) according to the gradients G(t) and the learning rate of the deep network until convergence; all training features C_1f, C_2f, C_3f, … transformed by the deep network are aligned, by dynamic time warping, with the training feature C_1f corresponding to the standard data C'_1 and then averaged, the result of this calculation being the aligned feature sample;
wherein the total number of layers of the deep network is at least 2; T denotes the time length of each data sample; f_1 denotes the result of the nonlinear transformation of the original sample X_1, and f_i the result of the nonlinear transformation of the i-th original sample; f_dtw is the integrated function obtained by the dynamic time warping procedure; ‖K‖_* denotes the trace norm of the matrix K; K denotes the coefficient matrix subjected to singular value decomposition; S denotes the matrix of non-zero singular values after the SVD; X_1 denotes the first set of original skeleton motion data; θ_1 denotes the parameter values.
2. The behavior recognition method based on integrated deep canonical time warping and correlation correction according to claim 1, characterized in that:
the deep network is a BP network comprising an input layer, an output layer and two hidden layers.
3. The behavior recognition method based on integrated deep canonical time warping and correlation correction according to claim 1, characterized in that: the interpolation processing of step (2) uses an interpolation method: features with the same number of frames are obtained through interpolation, the interpolation formula being:
Q(t) = Q_i · exp( s · log( Q_i^(−1) Q_(i+1) ) ),  t ∈ [t_i, t_(i+1)],

wherein

s = (t − t_i) / (t_(i+1) − t_i),

and Q_1, Q_2, Q_3, …, Q_n ∈ SE(3) are defined as the instantaneous relative positions at times t_1, t_2, t_3, …, t_n.
4. The behavior recognition method based on integrated deep canonical time warping and correlation correction according to claim 1, characterized in that: establishing the skeleton model C(t) by the Lie-algebra relative feature description method in step (2) comprises: selecting bones e_m and e_n in the skeleton model for analysis, the endpoints of bone e_m being e_m1 and e_m2, and the endpoints of bone e_n being e_n1 and e_n2; taking the endpoint e_n1 as the origin, the direction vector of bone e_n as the axis, and the plane containing bones e_m and e_n as the coordinate plane, a coordinate system is established. The relative positions of the endpoints e_m1 and e_m2 of bone e_m are then expressed as

[e_m1(t) e_m2(t); 1 1] = B_(n,m)(t) · [0 l_m; 1 1],

and the endpoints of e_n relative to e_m are at the positions

[e_n1(t) e_n2(t); 1 1] = B_(m,n)(t) · [0 l_n; 1 1],

where l_m and l_n denote the bone lengths laid along the local axis. The displacement mapping relation between bones e_m and e_n, i.e. their relative positional relation, is

B_(m,n)(t) = [R_(m,n)(t) d_(m,n)(t); 0 1] ∈ SE(3),
B_(n,m)(t) = [R_(n,m)(t) d_(n,m)(t); 0 1] ∈ SE(3).

From the relative positional relation of bones e_m and e_n, the relative displacement mapping relation ve(B) of each bone with every other bone is calculated; defining the skeleton model to have n segments of rigid bones, the shape of the skeleton model at time t is obtained as C(t):

C(t) = (ve(B_(1,2)), ve(B_(1,3)), …, ve(B_(n,n−1))),

log(B) = [û v; 0 0], with û the 3 × 3 skew-symmetric matrix of u = (u_1, u_2, u_3),

ve(B) = (u_1, u_2, u_3, v_1, v_2, v_3),

wherein C(t) has 6 × n × (n − 1) components, and n and m are positive integers not greater than N.
5. The behavior recognition method based on integrated deep canonical time warping and correlation correction according to claim 1, characterized in that: the image preprocessing further comprises image noise reduction and image enhancement.
CN201710425906.1A 2017-06-08 2017-06-08 Behavior recognition method based on integrated deep canonical time warping and correlation correction Active CN107229920B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710425906.1A CN107229920B (en) 2017-06-08 2017-06-08 Behavior recognition method based on integrated deep canonical time warping and correlation correction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710425906.1A CN107229920B (en) 2017-06-08 2017-06-08 Behavior recognition method based on integrated deep canonical time warping and correlation correction

Publications (2)

Publication Number Publication Date
CN107229920A CN107229920A (en) 2017-10-03
CN107229920B (en) 2020-11-13

Family

ID=59936262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710425906.1A Active CN107229920B (en) 2017-06-08 2017-06-08 Behavior recognition method based on integrated deep canonical time warping and correlation correction

Country Status (1)

Country Link
CN (1) CN107229920B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830206A (en) * 2018-06-06 2018-11-16 成都邑教云信息技术有限公司 A kind of course axis Internet Educational System
CN109614899B (en) * 2018-11-29 2022-07-01 重庆邮电大学 Human body action recognition method based on lie group features and convolutional neural network
CN109871750B (en) * 2019-01-02 2023-08-18 东南大学 Gait recognition method based on skeleton diagram sequence abnormal joint repair
CN110197195B (en) * 2019-04-15 2022-12-23 深圳大学 Novel deep network system and method for behavior recognition
CN110472497A (en) * 2019-07-08 2019-11-19 西安工程大学 A kind of motion characteristic representation method merging rotation amount
CN111709323B (en) * 2020-05-29 2024-02-02 重庆大学 Gesture recognition method based on Lie group and long short-term memory network
CN111832427B (en) * 2020-06-22 2022-02-18 华中科技大学 EEG classification transfer learning method and system based on Euclidean alignment and Procrustes analysis
CN117647788B (en) * 2024-01-29 2024-04-26 北京清雷科技有限公司 Dangerous behavior identification method and device based on human body 3D point cloud

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101673403A (en) * 2009-10-10 2010-03-17 安防制造(中国)有限公司 Target following method in complex interference scene
CN103778227A (en) * 2014-01-23 2014-05-07 西安电子科技大学 Method for screening useful images from retrieved images
CN105930767A (en) * 2016-04-06 2016-09-07 南京华捷艾米软件科技有限公司 Human body skeleton-based action recognition method
CN106384093A (en) * 2016-09-13 2017-02-08 东北电力大学 Human action recognition method based on noise reduction automatic encoder and particle filter

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11397462B2 (en) * 2012-09-28 2022-07-26 Sri International Real-time human-machine collaboration using big data driven augmented reality technologies
US8929600B2 (en) * 2012-12-19 2015-01-06 Microsoft Corporation Action recognition based on depth maps

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101673403A (en) * 2009-10-10 2010-03-17 安防制造(中国)有限公司 Target following method in complex interference scene
CN103778227A (en) * 2014-01-23 2014-05-07 西安电子科技大学 Method for screening useful images from retrieved images
CN105930767A (en) * 2016-04-06 2016-09-07 南京华捷艾米软件科技有限公司 Human body skeleton-based action recognition method
CN106384093A (en) * 2016-09-13 2017-02-08 东北电力大学 Human action recognition method based on noise reduction automatic encoder and particle filter

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Deep Canonical Time Warping; George Trigeorgis et al.; 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016-12-12; see pages 5110-5115 *
Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group; Raviteja Vemulapalli et al.; 2014 IEEE Conference on Computer Vision and Pattern Recognition; 2014-09-25; see abstract and pages 589-593 *

Also Published As

Publication number Publication date
CN107229920A (en) 2017-10-03

Similar Documents

Publication Publication Date Title
CN107229920B (en) Behavior recognition method based on integrated deep canonical time warping and correlation correction
CN110503680B (en) Unsupervised convolutional neural network-based monocular scene depth estimation method
CN110009674B (en) Monocular image depth of field real-time calculation method based on unsupervised depth learning
CN111275518A (en) Video virtual fitting method and device based on mixed optical flow
CN111062326B (en) Self-supervision human body 3D gesture estimation network training method based on geometric driving
CN111062340A (en) Abnormal gait behavior identification method based on virtual posture sample synthesis
CN111476077A (en) Multi-view gait recognition method based on deep learning
CN112750198B (en) Dense correspondence prediction method based on non-rigid point cloud
CN112330813A (en) Wearing three-dimensional human body model reconstruction method based on monocular depth camera
CN111488857A (en) Three-dimensional face recognition model training method and device
CN113570658A (en) Monocular video depth estimation method based on depth convolutional network
CN114821640A (en) Skeleton action identification method based on multi-stream multi-scale expansion space-time diagram convolution network
CN116012950A (en) Skeleton action recognition method based on multi-heart space-time attention pattern convolution network
CN115661862A (en) Pressure vision convolution model-based sitting posture sample set automatic labeling method
CN110288026A (en) A kind of image partition method and device practised based on metric relation graphics
CN112270691B (en) Monocular video structure and motion prediction method based on dynamic filter network
CN113256789A (en) Three-dimensional real-time human body posture reconstruction method
CN111553954A (en) Direct method monocular SLAM-based online luminosity calibration method
CN110211122A (en) A kind of detection image processing method and processing device
CN113192186B (en) 3D human body posture estimation model establishing method based on single-frame image and application thereof
CN115496859A (en) Three-dimensional scene motion trend estimation method based on scattered point cloud cross attention learning
CN111428555B (en) Joint-divided hand posture estimation method
CN114882545A (en) Multi-angle face recognition method based on three-dimensional intelligent reconstruction
CN113255514B (en) Behavior identification method based on local scene perception graph convolutional network
CN111899284B (en) Planar target tracking method based on parameterized ESM network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20210111

Address after: 401120 room 701, room 1, 7 / F, building 11, No. 106, west section of Jinkai Avenue, Yubei District, Chongqing

Patentee after: Chongqing Space Visual Creation Technology Co.,Ltd.

Address before: 400044 No. 174 Sha Jie street, Shapingba District, Chongqing

Patentee before: Chongqing University

CP03 Change of name, title or address
CP03 Change of name, title or address

Address after: 401121 room 701, room 1, floor 7, building 11, No. 106, west section of Jinkai Avenue, Yubei District, Chongqing

Patentee after: Space Shichuang (Chongqing) Technology Co.,Ltd.

Address before: 401120 room 701, room 1, 7 / F, building 11, No. 106, west section of Jinkai Avenue, Yubei District, Chongqing

Patentee before: Chongqing Space Visual Creation Technology Co.,Ltd.

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20231123

Address after: 401331 CQ-01-B2-XN020, Building 2, No. 37 Jingyang Road, Huxi Street, Shapingba District, Chongqing City

Patentee after: Chongqing Space New Vision Artificial Intelligence Technology Research Institute Co.,Ltd.

Address before: 401121 room 701, room 1, floor 7, building 11, No. 106, west section of Jinkai Avenue, Yubei District, Chongqing

Patentee before: Space Shichuang (Chongqing) Technology Co.,Ltd.