CN115050055B - Human skeleton sequence construction method based on Kalman filtering - Google Patents

Human skeleton sequence construction method based on Kalman filtering

Info

Publication number
CN115050055B
Authority
CN
China
Prior art keywords
skeleton
frame
kalman filtering
processing queue
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210788077.4A
Other languages
Chinese (zh)
Other versions
CN115050055A (en)
Inventor
彭倍
刁宏健
邵继业
杨文章
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202210788077.4A
Publication of CN115050055A
Application granted
Publication of CN115050055B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a human skeleton sequence construction method based on Kalman filtering, comprising the following steps: S1, performing pose estimation and normalizing the joint point features; S2, numbering the skeletons in the first video frame whose features are not all invalid values, and adding them to a processing queue; S3, inputting all numbered skeleton sequences in the processing queue, together with the skeleton set, into a Kalman filtering module for constructing human skeleton sequences, and updating the processing queue according to the processing result; S4, repeating S3 for each video frame until all frames have been processed, whereupon each numbered skeleton sequence in the processing queue is a constructed human skeleton sequence. Compared with conventional Kalman filtering, the newly defined decision module further processes the observed values produced by the pose estimation method, giving the Kalman filtering module the ability to track human motion; the Kalman filtering algorithm also corrects feature extraction errors caused by false detections and missed detections in the pose estimation method.

Description

Human skeleton sequence construction method based on Kalman filtering
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a human skeleton sequence construction method based on Kalman filtering.
Background
Video-based behavior recognition is one of the representative tasks in video understanding; the task of recognizing human actions in video is called video behavior recognition. Current deep-learning-based video behavior recognition methods include two-stream networks, three-dimensional convolutional neural networks (3D Convolutional Neural Networks), and other non-end-to-end methods.
The introduction of spatial-temporal graph convolutional networks (Spatial Temporal Graph Convolutional Networks, STGCN) and P-CNN (Pose-based CNN) provided a new non-end-to-end approach to behavior recognition. These methods extract joint points frame by frame with an advanced pose estimation method, group the joint points into skeletons that serve as the network input, and extract and fuse joint features for video behavior recognition.
Because pose information is closely related to human behavior, these methods perform well on behavior recognition that does not depend on background information. However, mainstream human pose estimation methods such as OpenPose do not associate skeletons across frames, whereas some recognition methods analyze only the behavior of one and the same person. Such methods depend heavily on the joint information extracted by the pose estimation method and are very sensitive to false and missed detections caused by disturbances in a video frame. Moreover, when a video contains several people at once, the pose estimation method numbers the skeletons arbitrarily in each frame, which leads to incorrect behavior recognition results.
Disclosure of Invention
The invention aims to overcome the deficiencies of the prior art by providing a human skeleton sequence construction method based on Kalman filtering. A Kalman filtering module predicts the features, an observation decision is made on the output of the pose estimation method for the current frame, and the decision result is used to update a processing queue, thereby obtaining the human skeleton sequences. Feature extraction errors caused by false detections and missed detections in the pose estimation method are corrected in this way.
The aim of the invention is achieved by the following technical scheme: a human skeleton sequence construction method based on Kalman filtering, comprising the following steps:
S1, performing pose estimation frame by frame on a video containing human behavior information, and normalizing all joint point features to obtain a skeleton set carrying joint point feature information;
S2, numbering the skeletons in the first video frame whose features are not all invalid values, and adding them to a processing queue;
S3, inputting all numbered skeleton sequences in the processing queue, together with the skeleton set, into a Kalman filtering module for constructing human skeleton sequences, and advancing the processing queue by one frame according to the processing result;
S4, repeating S3 for each video frame until all frames have been processed, whereupon each numbered skeleton sequence in the processing queue is a constructed human skeleton sequence.
Further, in step S3, the Kalman filtering module for constructing human skeleton sequences comprises a forward prediction module, a decision module and an update module.
The prediction module is the prediction part of the Kalman filtering algorithm. Its input is all numbered skeletons in the processing queue, and its output is:
$$\hat{x}^-_{t,v,m} = A\,\hat{x}_{t-1,v,m}$$
$$P^-_{t,v,m} = A\,P_{t-1,v,m}\,A^T + Q$$
where $\hat{x}^-_{t,v,m}$ is the prior state estimate at time t predicted by the motion model from the posterior state estimate $\hat{x}_{t-1,v,m}$ at time t-1; T is the total number of frames; A is the state transition matrix; $P^-_{t,v,m}$ is the prior state estimate covariance at time t calculated from the posterior state estimate covariance $P_{t-1,v,m}$ at time t-1; Q is the process covariance matrix; and the superscript T denotes transposition;
The posterior state estimate $\hat{x}_{t,v,m}$ is a state vector with components x, y, $v_x$, $v_y$, defined as follows:
[equation image not preserved]
When t = 0, x and y are defined as $Z_{x,t,v,n}$ and $Z_{y,t,v,n}$, the normalized joint point features of joint v of the skeleton with sequence number n in frame t; n is the skeleton sequence number, and n equals m at initialization; $v_x$ and $v_y$ are defined as 0 when t = 0. When t ≠ 0, $\hat{x}_{t,v,m}$ is the iteration result of the Kalman filtering module and represents the posterior state estimate of joint v of the skeleton numbered m in frame t;
When t = 0, $P_{t,v,m}$ is defined as an identity matrix; when t ≠ 0, $P_{t,v,m}$ is the iteration result of the Kalman filtering module and represents the posterior state estimate covariance of joint v of the skeleton numbered m in frame t;
the state transition matrix A is defined as:
[matrix image not preserved]
where dt is the video frame time interval;
the process covariance matrix Q is defined as:
[matrix image not preserved]
where the two parameters of Q are the variance of the human joint coordinates and a coordinate motion velocity variance parameter.
The decision module specifically comprises the following steps:
S31, computing the degree of matching between the processing queue and all skeletons detected in the current frame to obtain the Mahalanobis distance matrix $D_t$. Specifically: a candidate set is computed for each skeleton in the processing queue, whose elements are the current-frame skeletons that match successfully; using the prediction module outputs $\hat{x}^-_{t,v,m}$ and $P^-_{t,v,m}$, the Mahalanobis distance $D_{t,v,nm}$ is computed between each joint point feature $Z_{t,v,n}$ of every skeleton n in the current frame t and the prior estimate $\hat{x}^-_{t,v,m}$ of the corresponding skeleton m in the processing queue;
where the Mahalanobis distance matrix used to measure the degree of matching is:
[equation image for $D_{t,v,nm}$ not preserved]
$$D_{t,nm} = \min_{v \in V} D_{t,v,nm}$$
H is the observation matrix:
[matrix image not preserved]
S32, using the Hungarian algorithm to obtain the optimal matching of the matrix $D_t$, and taking a matching threshold α. If $D_{t,nm} \le \alpha$, the match succeeds and the observed value $C^-_{t,v,m}$ of the current frame is set to the joint point feature $Z_{t,v,n}$; if $D_{t,nm} > \alpha$, the match fails and the observed value is set to the predicted value;
S33, adding the skeletons left unmatched in S32, because $D_{t,nm} > \alpha$ or N > M, to the processing queue under new numbers.
The update module specifically computes the Kalman gain $K_{t,v,m}$, the posterior estimate $\hat{x}_{t,v,m}$ of the joint point features, and the posterior estimate covariance $P_{t,v,m}$:
$$K_{t,v,m} = P^-_{t,v,m} H^T \left(H P^-_{t,v,m} H^T + R\right)^{-1}$$
$$\hat{x}_{t,v,m} = \hat{x}^-_{t,v,m} + K_{t,v,m}\left(C^-_{t,v,m} - H\,\hat{x}^-_{t,v,m}\right)$$
$$P_{t,v,m} = \left(I - K_{t,v,m} H\right) P^-_{t,v,m}$$
where $\hat{x}^-_{t,v,m}$, $P^-_{t,v,m}$ and $C^-_{t,v,m}$ are the outputs of the prediction module and the decision module, and R is the observation noise covariance matrix:
[matrix image not preserved]
whose parameter is the pose estimation variance;
finally, $C_{t,v,m}$ is updated:
[equation image not preserved]
$C_{t,v,m}$ is an element of the set $C_{v,m}$ and represents the features, at frame t, of joint v in the skeleton numbered m.
The beneficial effects of the invention are as follows: the invention initializes the processing queue from the output of the pose estimation method and then, frame by frame, predicts the features with the Kalman filtering module, makes an observation decision on the pose estimation output for the current frame, and updates the processing queue with the decision result, thereby obtaining the human skeleton sequences. The method has the following advantages:
1. Compared with conventional Kalman filtering, the newly defined decision module further processes the observed values produced by the pose estimation method, giving the Kalman filtering module the ability to track human motion;
2. The Kalman filtering algorithm corrects feature extraction errors caused by false detections and missed detections in the pose estimation method;
3. For video behavior recognition networks such as STGCN, which need motion information from the same skeleton over time, the method solves the problem that such networks cannot be used directly in multi-person scenes because the pose estimation method assigns skeleton sequence numbers arbitrarily.
Drawings
FIG. 1 is a block diagram of an algorithm flow of the present invention;
FIG. 2 is a block diagram of a Kalman filtering module for constructing a human skeleton sequence according to the invention;
FIG. 3 compares the construction result with the original data when frames are skipped in skeleton detection;
FIG. 4 compares the construction result with the original data when a false detection occurs in skeleton detection;
FIG. 5 compares the construction result with the original data when skeleton sequence numbers are frequently exchanged in a multi-person scene;
FIG. 6 compares the construction result with the original data for a simulated multi-person scene with partial overlap;
FIG. 7 compares the STGCN recognition results for the original data and for the skeleton sequences constructed according to the invention.
Detailed Description
The invention provides a human skeleton sequence construction method based on Kalman filtering. Kalman filtering is used to predict the probability distribution of each target joint point over time, and outliers and missing values are handled by setting a probability threshold when matching against the optimally matched skeleton, rather than feeding the pose estimation skeletons directly into the recognition network. The method yields more stable skeleton sequences and can track target skeletons in a multi-person environment, which to some extent avoids behavior recognition errors caused by confused numbering of multiple people. The technical scheme of the invention is further described below with reference to the accompanying drawings.
As shown in FIG. 1, the human skeleton sequence construction method based on Kalman filtering comprises the following steps:
S1, performing pose estimation frame by frame on a video containing human behavior information, and normalizing all joint point features to obtain a skeleton set carrying joint point feature information;
The skeleton set is defined as $Z_{F,T,V,N}$, where F = {f | f = x, f = y} is the set of joint features, i.e. the two-dimensional image coordinates; T is the total number of frames; V is the total number of joints, and if the human pose estimation method OpenPose is used, the valid joint indices are $V = [v_0, v_1, ..., v_{24}]$; N is the total number of skeletons, which are numbered arbitrarily according to how many skeletons are identified in a given frame, with no link between the numbers in different frames;
The normalization method is:
$$Z_{x,t,v,n} = \frac{X_{t,v,n} - V_{xmin}}{V_{xmax} - V_{xmin}}, \qquad Z_{y,t,v,n} = \frac{Y_{t,v,n} - V_{ymin}}{V_{ymax} - V_{ymin}}$$
where $Z_{x,t,v,n}$ and $Z_{y,t,v,n}$ are the normalized joint point features of joint v of the skeleton with sequence number n in frame t; $X_{t,v,n}$ and $Y_{t,v,n}$ are the pose estimation results for joint v of the skeleton with sequence number n in frame t; $V_{xmax}$ and $V_{ymax}$ are the maximum valid values of the corresponding features, generally the image width and height; and $V_{xmin}$ and $V_{ymin}$ are the minimum valid values of the corresponding features, generally zero.
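The normalization in S1 can be illustrated with a minimal sketch. It assumes min-max scaling with the image width and height as maximum valid values and zero as minimum, as the text suggests; the function name `normalize_joints` and the default 1920 x 1080 image size are hypothetical and not taken from the patent.

```python
import numpy as np

def normalize_joints(xy, x_min=0.0, x_max=1920.0, y_min=0.0, y_max=1080.0):
    """Min-max normalize raw joint coordinates.

    xy: array of shape (..., 2) holding raw (X, Y) pose estimation outputs.
    The bounds are assumptions: image width/height as maxima, zero as minima.
    """
    xy = np.asarray(xy, dtype=float)
    lo = np.array([x_min, y_min])
    hi = np.array([x_max, y_max])
    # Broadcasting applies the x bounds to column 0 and the y bounds to column 1.
    return (xy - lo) / (hi - lo)

# One skeleton with two joints, in pixel coordinates.
raw = np.array([[960.0, 540.0], [480.0, 270.0]])
print(normalize_joints(raw))
```

With these bounds a joint that OpenPose reports as undetected at (0, 0) stays at (0, 0) after normalization, so the invalid value is preserved for the later steps.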
S2, numbering the skeletons in the first video frame whose features are not all invalid values, and adding them to a processing queue. The skeletons in the processing queue are defined as $C_{V,M}$, where M is the total number of skeletons and V is the total number of joints; $C_{v,m}$ denotes the set of all features, over the video, of joint v in the skeleton numbered m, m = 1, 2, ...
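A minimal sketch of the queue initialization in S2 follows. It assumes that an invalid feature is encoded as 0 (the embodiment describes undetected joints as having x, y coordinates of 0) and represents the processing queue as a dictionary from skeleton number m to a list of per-frame joint arrays; the function name and this data layout are hypothetical.

```python
import numpy as np

def init_processing_queue(first_frame, invalid_value=0.0):
    """Build the processing queue from the first frame.

    first_frame: array of shape (N, V, 2), normalized joints of N detected skeletons.
    Returns {m: [array of shape (V, 2)]} with m = 1, 2, ... (assumed layout).
    """
    first_frame = np.asarray(first_frame, dtype=float)
    queue = {}
    for skeleton in first_frame:
        # Skip skeletons whose features are all invalid values.
        if np.all(skeleton == invalid_value):
            continue
        queue[len(queue) + 1] = [skeleton.copy()]
    return queue

# Three detection slots with 25 joints each; the middle one is entirely invalid.
frame0 = np.zeros((3, 25, 2))
frame0[0, 1] = [0.40, 0.50]
frame0[2, 1] = [0.10, 0.20]
queue = init_processing_queue(frame0)
print(sorted(queue))
```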
S3, inputting all numbered skeleton sequences in the processing queue, together with the skeleton set, into a Kalman filtering module for constructing human skeleton sequences, and advancing the processing queue by one frame according to the processing result;
The Kalman filtering module for constructing human skeleton sequences comprises a forward prediction module, a decision module and an update module, connected in sequence, as shown in FIG. 2.
The prediction module is the prediction part of the Kalman filtering algorithm. Its input is all numbered skeletons in the processing queue, and its output is:
$$\hat{x}^-_{t,v,m} = A\,\hat{x}_{t-1,v,m}$$
$$P^-_{t,v,m} = A\,P_{t-1,v,m}\,A^T + Q$$
where $\hat{x}^-_{t,v,m}$ is the prior state estimate at time t predicted by the motion model from the posterior state estimate $\hat{x}_{t-1,v,m}$ at time t-1; T is the total number of frames; A is the state transition matrix; $P^-_{t,v,m}$ is the prior state estimate covariance at time t calculated from the posterior state estimate covariance $P_{t-1,v,m}$ at time t-1; Q is the process covariance matrix; and the superscript T denotes transposition;
The posterior state estimate $\hat{x}_{t,v,m}$ is a state vector with components x, y, $v_x$, $v_y$, defined as follows:
[equation image not preserved]
When t = 0, x and y are defined as $Z_{x,t,v,n}$ and $Z_{y,t,v,n}$; n is the skeleton sequence number, and n equals m at initialization; $v_x$ and $v_y$ are defined as 0 when t = 0. When t ≠ 0, $\hat{x}_{t,v,m}$ is the iteration result of the Kalman filtering module and represents the posterior state estimate of joint v of the skeleton numbered m in frame t;
When t = 0, $P_{t,v,m}$ is defined as an identity matrix; when t ≠ 0, $P_{t,v,m}$ is the iteration result of the Kalman filtering module and represents the posterior state estimate covariance of joint v of the skeleton numbered m in frame t;
the state transition matrix A is defined as:
[matrix image not preserved]
where dt is the video frame time interval;
the process covariance matrix Q is defined as:
[matrix image not preserved]
where the two parameters of Q are the variance of the human joint coordinates and a coordinate motion velocity variance parameter.
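The prediction step can be sketched as follows. The matrices A and Q are not reproduced in this text, so the sketch assumes a state ordered as [x, y, v_x, v_y], a constant-velocity transition matrix built from dt, and a diagonal Q built from a position variance and a velocity variance. These forms, the function names, and the numeric parameter values are assumptions for illustration, not the patent's definitions.

```python
import numpy as np

def make_motion_model(dt, sigma_p2, sigma_v2):
    """Assumed constant-velocity model for the state [x, y, v_x, v_y]."""
    A = np.array([[1.0, 0.0, dt, 0.0],
                  [0.0, 1.0, 0.0, dt],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    # Assumed diagonal process covariance: position variance, then velocity variance.
    Q = np.diag([sigma_p2, sigma_p2, sigma_v2, sigma_v2])
    return A, Q

def predict(x_post, P_post, A, Q):
    """Kalman prediction: prior state and prior covariance for one joint."""
    x_prior = A @ x_post
    P_prior = A @ P_post @ A.T + Q
    return x_prior, P_prior

# Initialization as described for t = 0: position from Z, zero velocity, identity covariance.
A, Q = make_motion_model(dt=1 / 30, sigma_p2=1e-4, sigma_v2=1e-2)
x0 = np.array([0.40, 0.50, 0.0, 0.0])
P0 = np.eye(4)
x_prior, P_prior = predict(x0, P0, A, Q)
print(x_prior)
```

In the method each joint v of each numbered skeleton m carries its own state and covariance, so `predict` would be called once per (v, m) pair in every frame.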
The decision module specifically comprises the following steps:
S31, computing the degree of matching between the processing queue (the numbered skeleton sequences) and all skeletons detected in the current frame to obtain the Mahalanobis distance matrix $D_t$. Specifically: a candidate set is computed for each skeleton in the processing queue, whose elements are the current-frame skeletons that match successfully; using the prediction module outputs $\hat{x}^-_{t,v,m}$ and $P^-_{t,v,m}$, the Mahalanobis distance $D_{t,v,nm}$ is computed between each joint point feature $Z_{t,v,n}$ of every skeleton n in the current frame t and the prior estimate $\hat{x}^-_{t,v,m}$ of the corresponding skeleton m in the processing queue.
The Mahalanobis distance matrix used to measure the degree of matching is:
[equation image for $D_{t,v,nm}$ not preserved]
$$D_{t,nm} = \min_{v \in V} D_{t,v,nm}$$
H is the observation matrix:
[matrix image not preserved]
S32, matching the numbered skeleton sequences against all skeletons of the current frame is in essence an assignment problem. The Hungarian algorithm is used to obtain the optimal matching of the matrix $D_t$, and a matching threshold α is taken. If $D_{t,nm} \le \alpha$, the match succeeds and the observed value $C^-_{t,v,m}$ of the current frame is set to the joint point feature $Z_{t,v,n}$; if $D_{t,nm} > \alpha$, the match fails and the observed value is set to the predicted value;
S33, adding the skeletons left unmatched in S32, because $D_{t,nm} > \alpha$ or N > M, to the processing queue under new numbers.
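The decision steps S31 to S33 can be sketched as below, under several stated assumptions. The per-joint distance formula is not reproduced in this text, so the sketch uses the usual Mahalanobis form with the innovation covariance taken as H P⁻ Hᵀ and the observation matrix H selecting x and y from an assumed state order [x, y, v_x, v_y]. The optimal assignment is found here by exhaustive search over permutations as a plain stand-in for the Hungarian algorithm (it gives the same optimum but is only practical for a handful of skeletons); `scipy.optimize.linear_sum_assignment` would be the usual library choice. All function names are hypothetical.

```python
from itertools import permutations
import numpy as np

# Assumed observation matrix: picks (x, y) out of the state [x, y, v_x, v_y].
H = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])

def mahalanobis(z, x_prior, P_prior):
    """Assumed per-joint distance between detection z and the predicted observation."""
    r = z - H @ x_prior
    S = H @ P_prior @ H.T
    return float(np.sqrt(r @ np.linalg.solve(S, r)))

def distance_matrix(Z, x_prior, P_prior):
    """Z: (N, V, 2); x_prior: (M, V, 4); P_prior: (M, V, 4, 4). Returns D of shape (N, M)."""
    N, V, _ = Z.shape
    M = x_prior.shape[0]
    D = np.empty((N, M))
    for n in range(N):
        for m in range(M):
            # D[n, m] is the minimum over joints, as in the text.
            D[n, m] = min(mahalanobis(Z[n, v], x_prior[m, v], P_prior[m, v])
                          for v in range(V))
    return D

def match(D, alpha):
    """Optimal assignment by exhaustive search, then thresholding with alpha.

    Returns ({queue index m: detection index n}, [unmatched detection indices]).
    """
    N, M = D.shape
    if N <= M:
        best = min(permutations(range(M), N),
                   key=lambda p: sum(D[n, m] for n, m in enumerate(p)))
        pairs = list(enumerate(best))                  # (n, m) pairs
    else:
        best = min(permutations(range(N), M),
                   key=lambda p: sum(D[n, m] for m, n in enumerate(p)))
        pairs = [(n, m) for m, n in enumerate(best)]
    matched = {m: n for n, m in pairs if D[n, m] <= alpha}
    unmatched = [n for n in range(N) if n not in matched.values()]
    return matched, unmatched

# Three detections, two queued skeletons: detection 2 is left over (N > M).
D = np.array([[5.0, 0.2],
              [0.1, 4.0],
              [9.0, 9.0]])
print(match(D, alpha=1.0))
```

Queued skeletons missing from `matched` would take the predicted value as their observation (S32), and the detections listed in `unmatched` would enter the queue under new numbers (S33).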
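The update module specifically computes the Kalman gain $K_{t,v,m}$, the posterior estimate $\hat{x}_{t,v,m}$ of the joint point features, and the posterior estimate covariance $P_{t,v,m}$:
$$K_{t,v,m} = P^-_{t,v,m} H^T \left(H P^-_{t,v,m} H^T + R\right)^{-1}$$
$$\hat{x}_{t,v,m} = \hat{x}^-_{t,v,m} + K_{t,v,m}\left(C^-_{t,v,m} - H\,\hat{x}^-_{t,v,m}\right)$$
$$P_{t,v,m} = \left(I - K_{t,v,m} H\right) P^-_{t,v,m}$$
where $\hat{x}^-_{t,v,m}$, $P^-_{t,v,m}$ and $C^-_{t,v,m}$ are the outputs of the prediction module and the decision module, and R is the observation noise covariance matrix:
[matrix image not preserved]
whose parameter is the pose estimation variance;
Finally, $C_{t,v,m}$ is updated:
[equation image not preserved]
$C_{t,v,m}$ is an element of the set $C_{v,m}$ and represents the features, at frame t, of joint v in the skeleton numbered m.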
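The update step can be sketched with the standard Kalman gain, posterior state, and posterior covariance equations. As the matrices are not reproduced in this text, the sketch assumes H selects x and y from a state ordered [x, y, v_x, v_y], takes R as an isotropic 2 x 2 matrix, and reads the updated feature C as H applied to the posterior state; all three are assumptions, as are the function name and numeric values.

```python
import numpy as np

# Assumed observation matrix for the state [x, y, v_x, v_y].
H = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])

def update(x_prior, P_prior, c_obs, R):
    """Kalman update for one joint; c_obs is the observation chosen by the decision module."""
    S = H @ P_prior @ H.T + R                  # innovation covariance
    K = P_prior @ H.T @ np.linalg.inv(S)       # Kalman gain
    x_post = x_prior + K @ (c_obs - H @ x_prior)
    P_post = (np.eye(4) - K @ H) @ P_prior
    c_new = H @ x_post                         # assumed reading of the final C update
    return x_post, P_post, c_new

x_prior = np.array([0.40, 0.50, 0.0, 0.0])
P_prior = np.eye(4)
R = np.eye(2)                                  # assumed pose estimation variance of 1

# Matched case: the observation is the detected joint feature.
print(update(x_prior, P_prior, np.array([0.60, 0.50]), R)[2])

# Unmatched case: the observation is the predicted value, so the state is unchanged.
print(update(x_prior, P_prior, H @ x_prior, R)[2])
```

The second call shows why setting the observation to the predicted value bridges a missed detection: the innovation is zero, so the posterior state equals the prior while the covariance is still updated.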
S4, repeating S3 for each video frame until all frames have been processed, whereupon each numbered skeleton sequence in the processing queue is a constructed human skeleton sequence.
In this embodiment, experiments were carried out with Python 3.8 on a Windows operating system using the Kinetics-700 data set. Videos typical of detection errors, missed frames, and multi-person scenes were selected as material for constructing skeleton sequences. The common parameters used in the verification are listed in Table 1.
Table 1 Common parameters
[table contents not preserved]
Each video is input into the OpenPose human pose estimation method; the output is parsed and normalized and then input into the Kalman filtering module for constructing human skeleton sequences, yielding the constructed skeleton sequences.
FIG. 3 compares the pose estimation result with the neck joint of the constructed skeleton sequence when frames are skipped in pose estimation. FIG. 3(a) is the image drawn from the neck joint point features in the OpenPose output at frame 2 of the data, where no human skeleton is detected; FIG. 3(b) plots, against the frame number, the normalized pose estimation result (marked with asterisks) and the constructed skeleton sequence (dotted line) on the x-axis; FIG. 3(c) is the corresponding plot on the y-axis. In frame 2, because the scene is complex, OpenPose does not detect the human skeleton, i.e. the x and y feature coordinates are 0; the Kalman filtering module filters out this outlier and replaces it with the prior estimate when updating the skeleton sequence.
FIG. 4 compares the pose estimation result with the neck joint of the constructed skeleton sequence when pose estimation produces a false detection. FIG. 4(a) is the image drawn from the neck joint point features in the OpenPose output at frame 4 of the data; FIG. 4(b) plots, against the frame number, the normalized pose estimation result (marked with asterisks) and the constructed skeleton sequence (dotted line) on the x-axis; FIG. 4(c) is the corresponding plot on the y-axis. In frames 4 and 5, background interference causes OpenPose to detect incorrectly, so the x and y coordinates show a large offset; the x and y values are corrected after passing through the Kalman filtering module.
FIG. 5 compares the pose estimation result with the constructed skeleton sequences when the skeleton sequence numbers output by pose estimation switch repeatedly. FIG. 5(a) is the image drawn from the OpenPose output. FIG. 5(b) shows the neck joint features of the two skeletons in the OpenPose output: the horizontal axis is the normalized x feature and the vertical axis is the normalized y feature; asterisks mark the neck joint features $Z_{f,t,1,0}$ of the skeleton that pose estimation outputs with number 0, and plus signs mark the features $Z_{f,t,1,1}$ of the skeleton with number 1; the solid and dashed lines are the feature trajectories of the neck joints $C_{t,1,0}$ and $C_{t,1,1}$ in the constructed skeleton sequences numbered 0 and 1, respectively; the subscript v = 1 denotes the neck joint. FIG. 5(c) and FIG. 5(d) plot the x feature and the y feature of FIG. 5(b), respectively, against the frame number, comparing the construction results.
FIG. 5(b) shows that the two detected human skeletons lie in the upper-left and lower-right regions. However, OpenPose performs detection on each frame independently, so the skeleton numbers are not fixed and the numbers of the skeletons in the upper-left and lower-right regions switch frequently. The results show that, after passing through the Kalman filtering module for constructing skeleton sequences, the inter-frame feature changes of both skeletons are tracked effectively.
FIG. 6 compares the skeleton sequences before and after construction for a simulated multi-person scene with partial overlap. A program generates two paths, from (0.35, 0.47) to (0.39, 0.32) and from (0.5, 0.37) to (0.31, 0.46), samples the paths, and adds Gaussian noise distributed as N(0, 0.0001). The resulting discrete points on the paths simulate joint coordinates and joint motion, with the skeleton number set randomly to 0 or 1. FIG. 6(a) shows the generated discrete point coordinates. FIG. 6(b) and FIG. 6(c) plot the x feature and the y feature of FIG. 6(a), respectively, against the frame number, comparing the construction results.
It can be seen that in a multi-person scene, provided the correct joint features are detected, the invention can still capture the inter-frame joint information and distinguish the different skeletons even when the joint points are close together and their motion trajectories overlap.
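The simulated data of FIG. 6 can be approximated with a short sketch. It assumes straight-line paths between the stated endpoints, interprets 0.0001 in N(0, 0.0001) as the noise variance, and picks the frame count and random seed arbitrarily; the source does not specify these details, and the function name is hypothetical.

```python
import numpy as np

def simulate_two_paths(num_frames=50, noise_var=1e-4, seed=0):
    """Generate two noisy joint trajectories with randomly swapped skeleton numbers.

    Returns (observed, swap): observed has shape (num_frames, 2, 2), indexed as
    [frame, skeleton number, (x, y)]; swap[t] is True where the numbers were exchanged.
    """
    rng = np.random.default_rng(seed)
    s = np.linspace(0.0, 1.0, num_frames)[:, None]
    # Assumed straight-line paths between the endpoints given in the text.
    path_a = (1 - s) * np.array([0.35, 0.47]) + s * np.array([0.39, 0.32])
    path_b = (1 - s) * np.array([0.50, 0.37]) + s * np.array([0.31, 0.46])
    paths = np.stack([path_a, path_b], axis=1)
    paths = paths + rng.normal(0.0, np.sqrt(noise_var), size=paths.shape)
    # Randomly exchange the two skeleton numbers frame by frame.
    swap = rng.integers(0, 2, size=num_frames).astype(bool)
    observed = paths.copy()
    observed[swap] = paths[swap][:, ::-1]
    return observed, swap

observed, swap = simulate_two_paths()
print(observed.shape, int(swap.sum()))
```

Feeding `observed` frame by frame through prediction, matching, and update steps would then show whether the two trajectories are recovered with stable numbers despite the random swaps.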
All constructed skeleton sequences and the original data are input into STGCN. The model weights are the officially provided st_gcn.graphics.pt. The STGCN inference results for some of the videos are shown in Table 2.
Sequence number: the video sequence number; all videos are taken from the abseiling label of Kinetics-700. In video 1 the camera is stable and the person's behavior is clear. Video 2 is the video shown in FIG. 3; its pose estimation result is unstable, with missed and false detections. Video 3 is the video shown in FIG. 5 and exhibits the multi-person skeleton number switching problem. 1', 2', 3' are the STGCN inference results for the skeleton sequences constructed by the method of the invention from videos 1, 2, 3;
Frame number: the total number of recognized video frames;
Output frame number: the number of frames of features output by STGCN; the original frame number T is compressed to [expression not preserved];
Abseiling: the number of output frames recognized as the abseiling action;
Water skiing: the number of output frames recognized as the water skiing action, which is easily confused with abseiling;
Other behavior amounts: the number of distinct actions, among the output frames, recognized as other behaviors;
Other behavior frame numbers: the number of output frames recognized as other behaviors;
Voting result: the behavior with the largest number of frames among the output frames is taken as the voting result and is regarded as the main behavior in the video.
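The per-video statistics described above can be computed from a list of per-output-frame labels, as in the following sketch. The function name, the dictionary keys, and the example labels (including "rock climbing") are hypothetical and chosen only to illustrate the counting and the majority vote; they are not values from Table 2.

```python
from collections import Counter

def summarize(frame_labels, target="abseiling", confusable="water skiing"):
    """Summarize per-frame recognition labels into the quantities listed above."""
    counts = Counter(frame_labels)
    others = {k: c for k, c in counts.items() if k not in (target, confusable)}
    return {
        "output_frames": len(frame_labels),
        target: counts[target],
        confusable: counts[confusable],
        "other_behaviors": len(others),          # number of distinct other actions
        "other_frames": sum(others.values()),    # frames recognized as other actions
        "vote": counts.most_common(1)[0][0],     # label with the most frames
    }

labels = ["abseiling"] * 5 + ["water skiing"] * 2 + ["rock climbing"]
print(summarize(labels))
```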
Table 2 Comparison of STGCN inference results for some of the videos
[table contents not preserved]
With the background ignored, abseiling and water skiing have similar features, so in this case recognition results of either abseiling or water skiing can be considered correct. FIG. 7 compares, based on Table 2, the recognition results for the original data with those for the human skeletons constructed by the invention; in each pair of adjacent bars in the figure, the left bar is the data before processing and the right bar is the processed data. The comparison shows that recognition accuracy improves, to varying degrees, when the skeleton sequences constructed by the method are used instead of feeding the OpenPose pose estimation results directly into recognition. The constructed skeleton sequences also reduce the adverse effect of pose estimation errors on the recognition result, i.e. the number of misrecognized behaviors decreases. For STGCN, which needs to extract temporal information, the method effectively solves the problem that the pose estimation method outputs skeleton sequence numbers arbitrarily.
Those of ordinary skill in the art will recognize that the embodiments described herein are intended to help the reader understand the principles of the invention, and it should be understood that the scope of protection of the invention is not limited to these specific statements and embodiments. Those of ordinary skill in the art can make various other specific modifications and combinations based on the teachings disclosed herein without departing from the spirit of the invention, and such modifications and combinations remain within the scope of protection of the invention.

Claims (2)

1. The human skeleton sequence construction method based on Kalman filtering is characterized by comprising the following steps of:
S1, carrying out gesture estimation on videos containing human behavior information frame by frame, and carrying out normalization processing on all joint point characteristics to obtain a skeleton set with joint point characteristic information;
s2, numbering the skeletons of which all the characteristics of the first frame of video are not all invalid values, and adding the skeletons into a processing queue;
S3, inputting all numbered skeleton sequences and skeleton sets in the processing queue into a Kalman filtering module for constructing human skeleton sequences, and updating the processing queue into a frame according to the processing result; the Kalman filtering module for constructing the human skeleton sequence comprises a forward prediction module, a decision module and an updating module;
The prediction module is a Kalman filtering algorithm prediction part, the prediction input is all numbered skeletons in the processing queue, and the output is specifically:
Wherein, To estimate/>, from t-1 moment posterior statePrior state estimation at T moment predicted by motion model, T is total frame number, A is state transition matrix,/>To estimate covariance/>, based on t-1 moment posterior stateThe calculated prior state estimation covariance at the moment T is represented by a process covariance matrix Q, and a superscript T represents transposition;
Posterior state estimation The definition is as follows:
when t=0, x and y are defined as the characteristics of joint points of the joint v with Z x,t,v,n,Zy,t,v,n,Zx,t,v,n、Zy,t,v,n being the serial number n after the normalization of the t frame; n is a skeleton sequence number, and n is equal to m during initialization; v x,vy is defined as 0 when t=0, when t noteq0, The iteration result of the Kalman filtering module is represented by posterior state estimation of the joint v with the number m in the t frame;
t=0 Defined as an identity matrix, when t is not equal to 0,/>The iteration result of the Kalman filtering module is represented by the posterior state estimation covariance of the joint v with the number m in the t frame;
the state transition matrix a is defined as:
dt is the video frame time interval;
the process covariance matrix Q is defined as:
For the variance of the coordinates of joints of human body,/> The coordinate motion speed variance parameter is used;
the decision module specifically comprises the following steps:
S31, calculating the degree of matching between the processing queue and all skeletons detected in the current frame to obtain a Mahalanobis distance matrix $D_t$; specifically: computing a candidate set for each skeleton in the processing queue, the elements of the set being the successfully matched skeletons of the current frame; and, from the outputs $\hat{C}^{-}_{t,v,m}$ and $P^{-}_{t,v,m}$ of the prediction module, computing the Mahalanobis distance $D_{t,v,nm}$ between the feature $Z_{t,v,n}$ of each joint v of every skeleton n in the current frame t and the prior estimate $\hat{C}^{-}_{t,v,m}$ of each skeleton m in the processing queue;
wherein the Mahalanobis distance matrix used to measure the degree of matching is:
$D_{t,nm} = \min_{v \in V} D_{t,v,nm}$
H is the observation matrix:
S32, obtaining the optimal matching of the matrix $D_t$ using the Hungarian algorithm, with a matching threshold α: if $D_{t,nm} \le \alpha$, the matching succeeds and the observed value $\bar{C}_{t,v,m}$ of the current frame is set to the joint feature $Z_{t,v,n}$; if $D_{t,nm} > \alpha$, the matching fails and the observed value is set to the predicted value;
S33, for each skeleton left unmatched in S32 because $D_{t,nm} > \alpha$ or N > M, adding a new number to the processing queue;
S4, repeatedly executing S3 on each frame of video until all frames are processed, wherein each numbered skeleton sequence in the processing queue is the constructed human skeleton sequence.
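Steps S31 to S33 can be illustrated with the following Python sketch. It rests on several assumptions that are not confirmed by the claims: H observes only (x, y) of a state ordered as (x, y, v_x, v_y), and the per-joint Mahalanobis distance uses the prior covariance projected through H. For brevity the optimal assignment is found by exhaustive search, which returns the same minimum-cost matching as the Hungarian algorithm on small matrices; for larger ones an implementation such as `scipy.optimize.linear_sum_assignment` would typically be used. All function names are hypothetical.

```python
import itertools
import numpy as np

# Assumed observation matrix: only (x, y) of the state (x, y, v_x, v_y) is observed.
H = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])

def joint_distance(z, c_prior, p_prior):
    """Mahalanobis distance between a detected joint z (2,) and a predicted joint."""
    innovation = z - H @ c_prior
    S = H @ p_prior @ H.T               # projected prior covariance (assumed form)
    return float(np.sqrt(innovation @ np.linalg.solve(S, innovation)))

def distance_matrix(detections, priors):
    """D[n, m] = minimum over joints v of the per-joint distance (S31)."""
    D = np.zeros((len(detections), len(priors)))
    for n, det in enumerate(detections):            # det: (V, 2) joint features
        for m, (c_pr, p_pr) in enumerate(priors):   # (V, 4) states, (V, 4, 4) covariances
            D[n, m] = min(joint_distance(det[v], c_pr[v], p_pr[v])
                          for v in range(len(det)))
    return D

def match(D, alpha):
    """Minimum-cost one-to-one matching by exhaustive search, gated by alpha (S32)."""
    n_det, n_trk = D.shape
    k = min(n_det, n_trk)
    best_pairs, best_cost = [], float("inf")
    for dets in itertools.permutations(range(n_det), k):
        for trks in itertools.combinations(range(n_trk), k):
            cost = sum(D[n, m] for n, m in zip(dets, trks))
            if cost < best_cost:
                best_pairs, best_cost = list(zip(dets, trks)), cost
    matched = sorted((n, m) for n, m in best_pairs if D[n, m] <= alpha)
    matched_dets = {n for n, _ in matched}
    # Detections left over (distance above alpha, or N > M) get new numbers (S33).
    unmatched = [n for n in range(n_det) if n not in matched_dets]
    return matched, unmatched

# Three detected skeletons against two queued skeletons.
D_example = np.array([[0.1, 5.0],
                      [4.0, 0.2],
                      [9.0, 9.0]])
matched, new_ids = match(D_example, alpha=1.0)
```

In this example detections 0 and 1 are matched to queued skeletons 0 and 1, and detection 2 is left over because there are more detections than queued skeletons, so it would receive a new number.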
2. The human skeleton sequence construction method based on Kalman filtering according to claim 1, wherein the updating module is specifically configured to compute the Kalman gain $K_{t,v,m}$, the posterior estimate $\hat{C}_{t,v,m}$ of the joint features, and the posterior estimate covariance $P_{t,v,m}$:
where $\hat{C}^{-}_{t,v,m}$, $P^{-}_{t,v,m}$ and $\bar{C}_{t,v,m}$ are the output results of the prediction module and the decision module, and R is the observation noise covariance matrix:
where the parameter of R is the pose estimation variance;
finally, $C_{t,v,m}$ is updated:
$C_{t,v,m}$ is an element of the set $C_{v,m}$ and represents the feature of joint v of skeleton number m at frame t.
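The update step of claim 2 can be sketched with the textbook Kalman update. This is an illustration under assumptions, not the exact equations of the claim: H observes (x, y) only, R is isotropic with a single pose estimation variance, and the stored feature is taken to be the (x, y) part of the posterior state. The names `update` and `pose_var` are hypothetical.

```python
import numpy as np

# Assumed observation matrix for a state ordered as (x, y, v_x, v_y).
H = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])

def update(c_prior, p_prior, z, pose_var=1.0):
    """Standard Kalman update for one joint; z is the observed (x, y)."""
    R = pose_var * np.eye(2)                    # assumed isotropic observation noise
    S = H @ p_prior @ H.T + R                   # innovation covariance
    K = p_prior @ H.T @ np.linalg.inv(S)        # Kalman gain
    c_post = c_prior + K @ (z - H @ c_prior)    # posterior state estimate
    p_post = (np.eye(4) - K @ H) @ p_prior      # posterior state estimate covariance
    return K, c_post, p_post

c_prior = np.zeros(4)
p_prior = np.eye(4)
K, c_post, p_post = update(c_prior, p_prior, z=np.array([1.0, 1.0]))
feature = H @ c_post    # assumed: the (x, y) part is what gets stored in the sequence
```

When a match fails and the observed value is set to the predicted value, the innovation `z - H @ c_prior` is zero, so the posterior state equals the prior state and only the covariance changes.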
CN202210788077.4A 2022-07-06 2022-07-06 Human skeleton sequence construction method based on Kalman filtering Active CN115050055B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210788077.4A CN115050055B (en) 2022-07-06 2022-07-06 Human skeleton sequence construction method based on Kalman filtering

Publications (2)

Publication Number Publication Date
CN115050055A (en) 2022-09-13
CN115050055B (en) 2024-04-30

Family

ID=83164491

Country Status (1)

Country Link
CN (1) CN115050055B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522793A (en) * 2018-10-10 2019-03-26 华南理工大学 Multi-person abnormal behavior detection and recognition method based on machine vision
CN110458944A (en) * 2019-08-08 2019-11-15 西安工业大学 Human skeleton reconstruction method based on dual-view Kinect joint point fusion
CN110530365A (en) * 2019-08-05 2019-12-03 浙江工业大学 Human posture estimation method based on adaptive Kalman filtering
CN111932580A (en) * 2020-07-03 2020-11-13 江苏大学 Road 3D vehicle tracking method and system based on Kalman filtering and Hungarian algorithm
CN112633205A (en) * 2020-12-28 2021-04-09 北京眼神智能科技有限公司 Pedestrian tracking method and device based on head and shoulder detection, electronic equipment and storage medium
CN114038056A (en) * 2021-10-29 2022-02-11 同济大学 Skip and squat type ticket evasion behavior identification method
CN114609912A (en) * 2022-03-18 2022-06-10 电子科技大学 Angle-only target tracking method based on pseudo-linear maximum correlation entropy Kalman filtering

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
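Multiple Kinect sensor fusion for human skeleton tracking using Kalman filtering; Sungphill Moon et al.; SAGE Journals; 20170515; full text *
Research on fighting behavior recognition technology based on deep learning; 刁宏健; Wanfang Data; 20231002; full text *
Moving target tracking algorithm based on video sequences; 李扬; Electronic Science and Technology; 20120815(08); full text *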

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant