CN115050055A - Human body skeleton sequence construction method based on Kalman filtering - Google Patents

Human body skeleton sequence construction method based on Kalman filtering

Info

Publication number
CN115050055A
CN115050055A
Authority
CN
China
Prior art keywords
frame
skeleton
kalman filtering
processing queue
human body
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210788077.4A
Other languages
Chinese (zh)
Other versions
CN115050055B (en)
Inventor
彭倍
刁宏健
邵继业
杨文章
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202210788077.4A
Publication of CN115050055A
Application granted
Publication of CN115050055B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a human body skeleton sequence construction method based on Kalman filtering, which comprises the following steps: S1, performing pose estimation and normalizing the joint point features; S2, numbering all skeletons of the first video frame whose features are not all invalid values and adding them to a processing queue; S3, inputting all numbered skeleton sequences in the processing queue, together with the skeleton set, into a Kalman filtering module for constructing human skeleton sequences, and updating the processing queue according to the processing result; S4, repeating S3 for each video frame until all frames are processed, at which point each numbered skeleton sequence in the processing queue is a constructed human skeleton sequence. Compared with traditional Kalman filtering, the newly defined decision module further processes the observations produced by the pose estimation method, so that the Kalman filtering module can track human motion; the Kalman filtering algorithm also corrects feature extraction errors caused by false detections and missed detections of the pose estimation method.

Description

Human skeleton sequence construction method based on Kalman filtering
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a human skeleton sequence construction method based on Kalman filtering.
Background
Video-based behavior recognition is one of the representative tasks of video understanding; the task of recognizing human actions in video is called video behavior recognition. Current deep-learning-based video behavior recognition methods include Two-stream Networks, 3D Convolutional Neural Networks, and other non-end-to-end methods.
The Spatial Temporal Graph Convolutional Network (STGCN) and the Pose-based CNN (P-CNN) introduced a new non-end-to-end approach to behavior recognition. Such methods extract joint points frame by frame with an advanced pose estimation method, cluster the joint points into skeletons that serve as the network input, and extract and fuse joint features to recognize the video behavior.
Because pose information is closely related to human behavior, these methods achieve good results on behavior recognition that does not depend on background information. However, mainstream human pose estimation methods such as OpenPose do not associate skeletons across frames, and some methods can only analyze the behavior of a single person. Such approaches also depend heavily on the joint information extracted by the pose estimation method and are quite sensitive to its false and missed detections caused by disturbances in the video frames; moreover, when several people appear in a video at the same time, the pose estimation method numbers each skeleton randomly, which leads to erroneous behavior recognition results.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a human body skeleton sequence construction method based on Kalman filtering, which compares the features predicted by the Kalman filtering module with the output of the pose estimation method for the current frame to decide the observations, and updates the processing queue with the decision results to obtain the human skeleton sequences. This solves the problem of feature extraction errors caused by false and missed detections of the pose estimation method.
The purpose of the invention is achieved by the following technical scheme: a human body skeleton sequence construction method based on Kalman filtering comprises the following steps:
S1, performing pose estimation frame by frame on a video containing human behavior information, and normalizing all joint point features to obtain a skeleton set carrying the joint feature information;
S2, numbering all skeletons of the first video frame whose features are not all invalid values, and adding them to a processing queue;
S3, inputting all numbered skeleton sequences in the processing queue, together with the skeleton set, into a Kalman filtering module for constructing human skeleton sequences, and updating the processing queue frame by frame according to the processing result;
S4, repeating S3 for each video frame until all frames are processed; each numbered skeleton sequence in the processing queue is then a constructed human skeleton sequence.
Further, in step S3, the Kalman filtering module for constructing the human skeleton sequences comprises a prediction module, a decision module and an update module connected in series.
The prediction module is the prediction part of the Kalman filtering algorithm. Its input is all numbered skeletons in the processing queue, and its output is:

$$\hat{x}^{-}_{t,v,m} = A\,\hat{x}_{t-1,v,m}$$

$$P^{-}_{t,v,m} = A\,P_{t-1,v,m}\,A^{T} + Q$$

where $\hat{x}^{-}_{t,v,m}$ is the prior state estimate at frame $t$ predicted by the motion model from the posterior state estimate $\hat{x}_{t-1,v,m}$ at frame $t-1$, with $t = 1, \dots, T$ and $T$ the total number of frames; $A$ is the state transition matrix; $P^{-}_{t,v,m}$ is the prior estimate covariance at frame $t$ computed from the posterior estimate covariance $P_{t-1,v,m}$ at frame $t-1$; $Q$ is the process covariance matrix; and the superscript $T$ denotes transposition.
The posterior state estimate $\hat{x}_{t,v,m}$ is defined as:

$$\hat{x}_{t,v,m} = \begin{bmatrix} x & y & v_x & v_y \end{bmatrix}^{T}$$

When $t = 0$, $x$ and $y$ are initialized to $Z_{x,t,v,n}$ and $Z_{y,t,v,n}$, the normalized features of joint $v$ of the skeleton numbered $n$ at frame $t$ ($n$ is the skeleton number, and $n = m$ at initialization), and $v_x = v_y = 0$. When $t \neq 0$, $\hat{x}_{t,v,m}$ is the iteration result of the Kalman filtering module, i.e. the posterior state estimate of joint $v$ of the skeleton numbered $m$ at frame $t$.
When $t = 0$, $P_{t,v,m}$ is defined as the identity matrix; when $t \neq 0$, $P_{t,v,m}$ is the iteration result of the Kalman filtering module, i.e. the posterior estimate covariance of joint $v$ of the skeleton numbered $m$ at frame $t$.
The state transition matrix $A$ is defined as:

$$A = \begin{bmatrix} 1 & 0 & dt & 0 \\ 0 & 1 & 0 & dt \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

where $dt$ is the video frame time interval.
The process covariance matrix $Q$ is defined as:

$$Q = \begin{bmatrix} \sigma_p^{2} & 0 & 0 & 0 \\ 0 & \sigma_p^{2} & 0 & 0 \\ 0 & 0 & \sigma_v^{2} & 0 \\ 0 & 0 & 0 & \sigma_v^{2} \end{bmatrix}$$

where $\sigma_p^{2}$ is the coordinate variance of the human joints and $\sigma_v^{2}$ is the coordinate velocity variance parameter.
The decision module comprises the following steps:
S31, calculating the matching degree between the processing queue and all skeletons detected in the current frame to obtain the Mahalanobis distance matrix $D_t$. Specifically, a candidate set is computed for each skeleton in the processing queue, whose elements are the current-frame skeletons that match it successfully. Using the prediction module outputs $\hat{x}^{-}_{t,v,m}$ and $P^{-}_{t,v,m}$, the Mahalanobis distance $D_{t,v,nm}$ between the feature $Z_{t,v,n}$ of each joint $v$ of every skeleton $n$ detected in the current frame $t$ and the prior estimate $\hat{x}^{-}_{t,v,m}$ of each skeleton $m$ in the processing queue is computed as:

$$D_{t,v,nm} = \sqrt{\left(Z_{t,v,n} - H\hat{x}^{-}_{t,v,m}\right)^{T}\left(H P^{-}_{t,v,m} H^{T}\right)^{-1}\left(Z_{t,v,n} - H\hat{x}^{-}_{t,v,m}\right)}$$

The Mahalanobis distance matrix used to measure the matching degree is:

$$D_t = \left[D_{t,nm}\right]_{N \times M}, \qquad D_{t,nm} = \min_{v \in V} D_{t,v,nm}$$

where $H$ is the observation matrix:

$$H = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$

S32, obtaining the optimal matching of the matrix $D_t$ with the Hungarian algorithm, using a matching threshold $\alpha$. If $D_{t,nm} \le \alpha$, the match succeeds and the current-frame observation $C^{-}_{t,v,m}$ is set to the joint feature $Z_{t,v,n}$; if $D_{t,nm} > \alpha$, the match fails and the observation is set to the predicted value:

$$C^{-}_{t,v,m} = H\,\hat{x}^{-}_{t,v,m}$$

S33, skeletons of the current frame that remain unmatched in S32 because $D_{t,nm} > \alpha$ or $N > M$ are assigned new numbers and added to the processing queue.
The update module is specifically as follows: calculate the Kalman gain $K_{t,v,m}$, the posterior estimate of the joint point feature $\hat{x}_{t,v,m}$ and the posterior estimate covariance $P_{t,v,m}$:

$$K_{t,v,m} = P^{-}_{t,v,m} H^{T}\left(H P^{-}_{t,v,m} H^{T} + R\right)^{-1}$$

$$\hat{x}_{t,v,m} = \hat{x}^{-}_{t,v,m} + K_{t,v,m}\left(C^{-}_{t,v,m} - H\hat{x}^{-}_{t,v,m}\right)$$

$$P_{t,v,m} = \left(I - K_{t,v,m} H\right) P^{-}_{t,v,m}$$

where $\hat{x}^{-}_{t,v,m}$, $P^{-}_{t,v,m}$ and $C^{-}_{t,v,m}$ are the outputs of the prediction and decision modules, and $R$ is the observation noise covariance matrix:

$$R = \begin{bmatrix} \sigma_z^{2} & 0 \\ 0 & \sigma_z^{2} \end{bmatrix}$$

where $\sigma_z^{2}$ is the pose estimation variance.
Finally, $C_{t,v,m}$ is updated as:

$$C_{t,v,m} = H\,\hat{x}_{t,v,m}$$

where $C_{t,v,m}$, an element of the set $C_{v,m}$, denotes the feature of joint $v$ in the skeleton numbered $m$ at frame $t$.
The beneficial effects of the invention are as follows: the invention initializes the processing queue with the results of the pose estimation method and then, frame by frame, predicts features with the Kalman filtering module, compares them with the output of the pose estimation method for the current frame to decide the observations, and updates the processing queue with the decision results, thereby obtaining the human skeleton sequences. The method has the following advantages:
1. Compared with traditional Kalman filtering, the newly defined decision module further processes the observations of the pose estimation method, so that the Kalman filtering module can track human motion;
2. The Kalman filtering algorithm corrects feature extraction errors caused by false detections and missed detections of the pose estimation method;
3. For video behavior recognition networks such as STGCN that must extract the motion information of the same skeleton over time, the method solves the problem that such networks cannot be used directly in multi-person scenes because the skeleton numbers produced by the pose estimation method are random.
Drawings
FIG. 1 is a block diagram of the algorithm flow of the present invention;
FIG. 2 is a block diagram of a Kalman filtering module for constructing a human skeleton sequence according to the present invention;
FIG. 3 is a comparison of the constructed result and the original data when a frame is missed in skeleton detection;
FIG. 4 is a comparison of the constructed result and the original data when a false detection occurs in skeleton detection;
FIG. 5 is a comparison of the constructed result and the original data when the skeleton numbers in a multi-person scene are frequently swapped;
FIG. 6 is a comparison of the constructed result and the original data when the trajectories in a simulated multi-person scene partially overlap;
FIG. 7 is a comparison of the recognition results on STGCN between the original data and the skeleton sequences constructed by the present invention.
Detailed Description
The invention provides a human body skeleton sequence construction method based on Kalman filtering. Instead of feeding the pose-estimation skeletons directly into a behavior recognition network, the method uses Kalman filtering to predict the probability distribution of the target joint points over time and handles outliers and missing values by setting a probability threshold to match the best-fitting skeleton, so that a more stable skeleton sequence is obtained. The technical scheme of the invention is further explained below with reference to the drawings.
As shown in FIG. 1, the human skeleton sequence construction method based on Kalman filtering of the invention comprises the following steps:
S1, performing pose estimation frame by frame on the video containing human behavior information, and normalizing all joint point features to obtain a skeleton set carrying the joint feature information;
the skeleton set is defined as Z F,T,V,N Where F ═ { F ═ x, F ═ y } is an articulation point feature, i.e., image twoDimensional coordinate information; t is the total frame number, V is the total joint number, if the human body posture estimation method openposition is used, the effective value of the joint number is V ═ V 0 ,v 1 ,...,v 24 ](ii) a N is a total framework serial number which is a random number carried out in a certain frame according to the quantity of the identified frameworks, and the numbers of different frames are not related;
the normalization method comprises the following steps:
Figure BDA0003732351630000051
Figure BDA0003732351630000052
Figure BDA0003732351630000053
wherein Z is x,t,v,n 、Z y,t,v,n Normalized joint point feature of joint v at frame t, X, with sequence number n t,v,n 、Y t,v,n The attitude estimation result of the joint V with the sequence number n in the t frame, V xmax 、V ymax For maximum significant value on the corresponding feature, typically image width and height, V xmin 、V ymin Is the smallest valid value on the corresponding feature, typically zero.
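As an illustration of this normalization, a minimal NumPy sketch is given below. The array layout (frames × skeletons × joints × 2) and the function name normalize_joints are assumptions made for the example, not prescribed by the patent; the code simply applies the min-max formulas above with $V_{x\min} = V_{y\min} = 0$ and $V_{x\max}$, $V_{y\max}$ equal to the image width and height.

```python
import numpy as np

def normalize_joints(keypoints, img_w, img_h):
    """Min-max normalize 2-D joint coordinates to [0, 1].

    keypoints: array of shape (T, N, V, 2) holding the (X, Y) pose-estimation
    results per frame t, skeleton n and joint v; undetected joints stay at 0.
    """
    z = np.empty_like(keypoints, dtype=np.float64)
    z[..., 0] = keypoints[..., 0] / float(img_w)   # Z_x = (X - V_xmin) / (V_xmax - V_xmin), V_xmin = 0
    z[..., 1] = keypoints[..., 1] / float(img_h)   # Z_y = (Y - V_ymin) / (V_ymax - V_ymin), V_ymin = 0
    return z
```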
S2, numbering all skeletons of the first video frame whose features are not all invalid values, and adding them to the processing queue. A skeleton in the processing queue is defined as $C_{V,M}$, where $M$ is the total number of skeletons and $V$ the total number of joints; $C_{v,m}$ denotes the set of all features of joint $v$ of the skeleton numbered $m$ over the video, with $m = 1, 2, \dots, M$.
S3, inputting all numbered skeleton sequences in the processing queue, together with the skeleton set, into the Kalman filtering module for constructing human skeleton sequences, and updating the processing queue frame by frame according to the processing result;
The Kalman filtering module for constructing the human skeleton sequences comprises a prediction module, a decision module and an update module connected in series, as shown in FIG. 2.
The prediction module is the prediction part of the Kalman filtering algorithm. Its input is all numbered skeletons in the processing queue, and its output is:

$$\hat{x}^{-}_{t,v,m} = A\,\hat{x}_{t-1,v,m}$$

$$P^{-}_{t,v,m} = A\,P_{t-1,v,m}\,A^{T} + Q$$

where $\hat{x}^{-}_{t,v,m}$ is the prior state estimate at frame $t$ predicted by the motion model from the posterior state estimate $\hat{x}_{t-1,v,m}$ at frame $t-1$, with $t = 1, \dots, T$ and $T$ the total number of frames; $A$ is the state transition matrix; $P^{-}_{t,v,m}$ is the prior estimate covariance at frame $t$ computed from the posterior estimate covariance $P_{t-1,v,m}$ at frame $t-1$; $Q$ is the process covariance matrix; and the superscript $T$ denotes transposition.
The posterior state estimate $\hat{x}_{t,v,m}$ is defined as:

$$\hat{x}_{t,v,m} = \begin{bmatrix} x & y & v_x & v_y \end{bmatrix}^{T}$$

When $t = 0$, $x$ and $y$ are initialized to $Z_{x,t,v,n}$ and $Z_{y,t,v,n}$, the normalized features of joint $v$ of the skeleton numbered $n$ at frame $t$ ($n$ is the skeleton number, and $n = m$ at initialization), and $v_x = v_y = 0$. When $t \neq 0$, $\hat{x}_{t,v,m}$ is the iteration result of the Kalman filtering module, i.e. the posterior state estimate of joint $v$ of the skeleton numbered $m$ at frame $t$.
When $t = 0$, $P_{t,v,m}$ is defined as the identity matrix; when $t \neq 0$, $P_{t,v,m}$ is the iteration result of the Kalman filtering module, i.e. the posterior estimate covariance of joint $v$ of the skeleton numbered $m$ at frame $t$.
The state transition matrix $A$ is defined as:

$$A = \begin{bmatrix} 1 & 0 & dt & 0 \\ 0 & 1 & 0 & dt \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

where $dt$ is the video frame time interval.
The process covariance matrix $Q$ is defined as:

$$Q = \begin{bmatrix} \sigma_p^{2} & 0 & 0 & 0 \\ 0 & \sigma_p^{2} & 0 & 0 \\ 0 & 0 & \sigma_v^{2} & 0 \\ 0 & 0 & 0 & \sigma_v^{2} \end{bmatrix}$$

where $\sigma_p^{2}$ is the coordinate variance of the human joints and $\sigma_v^{2}$ is the coordinate velocity variance parameter.
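A minimal NumPy sketch of this prediction step for a single joint is shown below, assuming the constant-velocity transition matrix and diagonal process covariance given above; the frame interval and variance values are illustrative placeholders rather than parameters fixed by the patent.

```python
import numpy as np

def kalman_predict(x_post, P_post, dt=1.0 / 30, sigma_p2=1e-4, sigma_v2=1e-4):
    """One prediction step for the per-joint state [x, y, v_x, v_y]."""
    A = np.array([[1, 0, dt, 0],        # constant-velocity state transition matrix A
                  [0, 1, 0, dt],
                  [0, 0, 1,  0],
                  [0, 0, 0,  1]], dtype=np.float64)
    Q = np.diag([sigma_p2, sigma_p2, sigma_v2, sigma_v2])   # process covariance matrix Q
    x_prior = A @ x_post                 # prior state estimate
    P_prior = A @ P_post @ A.T + Q       # prior estimate covariance
    return x_prior, P_prior

# Initialization at t = 0 as defined above: normalized position, zero velocity,
# identity posterior covariance.
x0 = np.array([0.42, 0.55, 0.0, 0.0])
P0 = np.eye(4)
x_prior, P_prior = kalman_predict(x0, P0)
```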
The decision module comprises the following steps:
S31, calculating the matching degree between the processing queue (the numbered skeleton sequences) and all skeletons detected in the current frame to obtain the Mahalanobis distance matrix $D_t$. Specifically, a candidate set is computed for each skeleton in the processing queue, whose elements are the current-frame skeletons that match it successfully. Using the prediction module outputs $\hat{x}^{-}_{t,v,m}$ and $P^{-}_{t,v,m}$, the Mahalanobis distance $D_{t,v,nm}$ between the feature $Z_{t,v,n}$ of each joint $v$ of every skeleton $n$ detected in the current frame $t$ and the prior estimate $\hat{x}^{-}_{t,v,m}$ of each skeleton $m$ in the processing queue is computed as:

$$D_{t,v,nm} = \sqrt{\left(Z_{t,v,n} - H\hat{x}^{-}_{t,v,m}\right)^{T}\left(H P^{-}_{t,v,m} H^{T}\right)^{-1}\left(Z_{t,v,n} - H\hat{x}^{-}_{t,v,m}\right)}$$

The Mahalanobis distance matrix used to measure the matching degree is:

$$D_t = \left[D_{t,nm}\right]_{N \times M}, \qquad D_{t,nm} = \min_{v \in V} D_{t,v,nm}$$

where $H$ is the observation matrix:

$$H = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$

S32, the matching between the numbered skeleton sequences and all skeletons of the current frame is essentially an assignment problem, so the optimal matching of the matrix $D_t$ is obtained with the Hungarian algorithm, using a matching threshold $\alpha$. If $D_{t,nm} \le \alpha$, the match succeeds and the current-frame observation $C^{-}_{t,v,m}$ is set to the joint feature $Z_{t,v,n}$; if $D_{t,nm} > \alpha$, the match fails and the observation is set to the predicted value:

$$C^{-}_{t,v,m} = H\,\hat{x}^{-}_{t,v,m}$$

S33, skeletons of the current frame that remain unmatched in S32 because $D_{t,nm} > \alpha$ or $N > M$ are assigned new numbers and added to the processing queue.
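The gating-and-assignment logic of the decision module can be sketched as follows. scipy.optimize.linear_sum_assignment is used here as one standard implementation of the Hungarian algorithm; the threshold value and the array layouts are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=np.float64)   # observation matrix

def mahalanobis_cost(Z, x_priors, P_priors):
    """Cost matrix D_t between N detected and M tracked skeletons.

    Z: (N, V, 2) normalized joint features of the current frame.
    x_priors: (M, V, 4) prior states and P_priors: (M, V, 4, 4) prior
    covariances produced by the prediction module.
    """
    N, V, M = Z.shape[0], Z.shape[1], x_priors.shape[0]
    D = np.empty((N, M))
    for n in range(N):
        for m in range(M):
            d_joint = []
            for v in range(V):
                r = Z[n, v] - H @ x_priors[m, v]        # innovation in feature space
                S = H @ P_priors[m, v] @ H.T            # projected prior covariance
                d_joint.append(np.sqrt(r @ np.linalg.solve(S, r)))
            D[n, m] = min(d_joint)                      # D_{t,nm} = min over joints v
    return D

def decide(D, alpha=3.0):
    """Hungarian assignment on D with gating threshold alpha."""
    rows, cols = linear_sum_assignment(D)
    matches = [(n, m) for n, m in zip(rows, cols) if D[n, m] <= alpha]
    matched_n = {n for n, _ in matches}
    unmatched = [n for n in range(D.shape[0]) if n not in matched_n]
    return matches, unmatched   # unmatched detections receive new numbers (S33)
```

For a matched pair (n, m) the observation is the detected feature $Z_{t,v,n}$; for a tracked skeleton left unmatched, the observation falls back to the predicted value $H\hat{x}^{-}_{t,v,m}$, as described in S32.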
The update module is specifically as follows: calculate the Kalman gain $K_{t,v,m}$, the posterior estimate of the joint point feature $\hat{x}_{t,v,m}$ and the posterior estimate covariance $P_{t,v,m}$:

$$K_{t,v,m} = P^{-}_{t,v,m} H^{T}\left(H P^{-}_{t,v,m} H^{T} + R\right)^{-1}$$

$$\hat{x}_{t,v,m} = \hat{x}^{-}_{t,v,m} + K_{t,v,m}\left(C^{-}_{t,v,m} - H\hat{x}^{-}_{t,v,m}\right)$$

$$P_{t,v,m} = \left(I - K_{t,v,m} H\right) P^{-}_{t,v,m}$$

where $\hat{x}^{-}_{t,v,m}$, $P^{-}_{t,v,m}$ and $C^{-}_{t,v,m}$ are the outputs of the prediction and decision modules, and $R$ is the observation noise covariance matrix:

$$R = \begin{bmatrix} \sigma_z^{2} & 0 \\ 0 & \sigma_z^{2} \end{bmatrix}$$

where $\sigma_z^{2}$ is the pose estimation variance.
Finally, $C_{t,v,m}$ is updated as:

$$C_{t,v,m} = H\,\hat{x}_{t,v,m}$$

where $C_{t,v,m}$, an element of the set $C_{v,m}$, denotes the feature of joint $v$ in the skeleton numbered $m$ at frame $t$.
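A corresponding sketch of the update step for a single joint is given below; the scalar observation variance sigma_z2 is an illustrative placeholder.

```python
import numpy as np

def kalman_update(x_prior, P_prior, c_obs, sigma_z2=1e-3):
    """Fuse the decided observation c_obs = (x, y) with the prior estimate."""
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=np.float64)
    R = np.diag([sigma_z2, sigma_z2])              # observation noise covariance R
    S = H @ P_prior @ H.T + R                      # innovation covariance
    K = P_prior @ H.T @ np.linalg.inv(S)           # Kalman gain K_{t,v,m}
    x_post = x_prior + K @ (c_obs - H @ x_prior)   # posterior state estimate
    P_post = (np.eye(4) - K @ H) @ P_prior         # posterior estimate covariance
    c_out = H @ x_post                             # feature written back to C_{t,v,m}
    return x_post, P_post, c_out
```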
S4, repeating S3 for each video frame until all frames are processed; each numbered skeleton sequence in the processing queue is then a constructed human skeleton sequence.
This example was experimentally verified with Python 3.8 on the Windows operating system using the Kinetics-700 data set. Representative videos exhibiting detection errors, missed detections and multi-person scenes were selected as material for constructing skeleton sequences. The common parameters used in the verification are listed in Table 1.
TABLE 1 Common parameters
The videos were input into the OpenPose human pose estimation method; the output was parsed and normalized, and then input into the Kalman filtering module for constructing human skeleton sequences to obtain the constructed skeleton sequences.
Fig. 3 compares the pose estimation result and the neck joint of the constructed skeleton sequence when the pose estimation skips a frame. Fig. 3(a) is drawn from the neck joint feature in the OpenPose output at frame 2 of the data, where no human skeleton is detected; Fig. 3(b) compares the normalized pose estimation result (asterisks) and the constructed skeleton sequence (dotted line) on the x axis over the frames, and Fig. 3(c) is the corresponding comparison on the y axis. At frame 2 the scene is complex and OpenPose detects no human skeleton, i.e. the x and y feature coordinates are 0; the output of the Kalman filtering module filters out this outlier and replaces it with the prior estimate to update the skeleton sequence.
Fig. 4 compares the pose estimation result and the neck joint of the constructed skeleton sequence when the pose estimation produces a false detection. Fig. 4(a) is drawn from the neck joint feature in the OpenPose output at frame 4 of the data; Fig. 4(b) compares the normalized pose estimation result (asterisks) and the constructed skeleton sequence (dotted line) on the x axis over the frames, and Fig. 4(c) is the corresponding comparison on the y axis. At frames 4 and 5, background interference causes OpenPose to detect incorrectly and the x and y coordinates show large offsets; after the Kalman filtering module, the x and y values are corrected.
Fig. 5 compares the skeleton sequences and the pose estimation result when the skeleton numbers produced by pose estimation are repeatedly swapped. Fig. 5(a) is drawn from the OpenPose output. Fig. 5(b) shows the neck joint features of the two skeletons in the OpenPose output, with the normalized x feature on the horizontal axis and the normalized y feature on the vertical axis; asterisks mark the neck joint features $Z_{f,t,1,0}$ output with skeleton number 0, the features $Z_{f,t,1,1}$ output with skeleton number 1 are marked with a different symbol, and the solid and dotted lines respectively show the neck joint features $C_{t,1,0}$ and $C_{t,1,1}$ of the skeleton sequences numbered 0 and 1 after construction, where the subscript $v = 1$ denotes the neck joint. Figs. 5(c) and 5(d) compare the construction results of the x feature and the y feature of Fig. 5(b), respectively, against the frame number.
Fig. 5(b) shows that the two detected human skeletons are distributed in the upper-left and lower-right regions, but OpenPose processes each frame independently, so the detection results of different frames are unrelated, the skeleton numbers are not fixed, and the numbers of the upper-left and lower-right regions are frequently swapped. The results show that, after the Kalman filtering module for constructing skeleton sequences, the feature changes of the two skeletons within and between frames are effectively tracked.
Fig. 6 compares the data before and after constructing the skeleton sequences when the trajectories in a simulated multi-person scene partially overlap. A program generates two paths, from (0.35, 0.47) to (0.39, 0.32) and from (0.5, 0.37) to (0.31, 0.46), samples them, and adds Gaussian noise following N(0, 0.0001); the discrete points obtained on the paths simulate joint coordinates, the skeleton number is randomly 0 or 1, and these points simulate the joint motion. Fig. 6(a) shows the generated discrete point coordinates; Figs. 6(b) and 6(c) compare the construction results of the x feature and the y feature of Fig. 6(a), respectively, against the frame number.
It can be seen that, in a multi-person scene, as long as the correct joint features can be detected, the method still captures the inter-frame joint information and distinguishes the different skeletons even when the joints are close together and the motion trajectories overlap.
All constructed skeleton sequences and the original data were input into STGCN using the official model weights st_gcn.kinetics.pt; the inference results of some of the videos on STGCN are shown in Table 2.
Sequence number: the video number; all videos carry the abseiling label in Kinetics-700. The camera of video 1 is stable and the person's behavior is clear. Video 2 is the video shown in Fig. 3; its pose estimation result is unstable, with missed and false detections. Video 3 is the video shown in Fig. 5 and suffers from swapping of the skeleton numbers of multiple people. 1', 2', 3' denote the STGCN inference results for the skeleton sequences constructed from videos 1, 2, 3 by the method of the invention:
Frame number: the total number of recognized video frames;
Output frame number: the number of frames of the STGCN output features, which the multi-layer TCN convolutions compress relative to the original frame number T;
abseiling: the number of output frames recognized as the abseiling behavior;
water skiing: the number of output frames recognized as the water skiing behavior, which is easily confused with abseiling;
Number of other actions: the number of other behaviors recognized among the output frames;
Other behavior frame number: the number of output frames recognized as other behaviors;
Voting result: the behavior recognized for the largest number of output frames is output as the voting result, i.e. the main behavior in the video (see the short sketch below).
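The voting rule described above is a per-frame majority vote over the STGCN output labels; a minimal sketch, with hypothetical label strings, is:

```python
from collections import Counter

def vote(frame_labels):
    """Return the behavior predicted for the largest number of output frames."""
    return Counter(frame_labels).most_common(1)[0][0]

# e.g. vote(["abseiling"] * 40 + ["water skiing"] * 9) == "abseiling"
```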
TABLE 2 comparison of inference results of partial videos on STGCN
When the background is ignored, the features of abseiling and water skiing are similar, so recognition results of either class can be considered correct in this case. The comparison between the original data and the recognition results of the human skeleton sequences constructed by the invention, derived from Table 2, is shown in Fig. 7; in each pair of adjacent bars, the left bar is the data before processing and the right bar the data after processing. The comparison shows that the skeleton sequences constructed by the method improve the recognition accuracy to varying degrees compared with recognizing the OpenPose pose estimation results directly. Meanwhile, the constructed skeleton sequences reduce the adverse effect of false and missed recognitions of the pose estimation method on the recognition result, i.e. the number of falsely recognized behaviors decreases. For STGCN, which must extract temporal information, the problem that the skeleton numbers output by the pose estimation method are random is effectively solved.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to help the reader understand the principles of the invention, and that the invention is not limited to these specifically described embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from its spirit, and these changes and combinations remain within the scope of the invention.

Claims (5)

1. A human body skeleton sequence construction method based on Kalman filtering, characterized by comprising the following steps:
S1, performing pose estimation frame by frame on a video containing human behavior information, and normalizing all joint point features to obtain a skeleton set carrying the joint feature information;
S2, numbering all skeletons of the first video frame whose features are not all invalid values, and adding them to a processing queue;
S3, inputting all numbered skeleton sequences in the processing queue, together with the skeleton set, into a Kalman filtering module for constructing human skeleton sequences, and updating the processing queue frame by frame according to the processing result;
S4, repeating S3 for each video frame until all frames are processed; each numbered skeleton sequence in the processing queue is then a constructed human skeleton sequence.
2. The method for constructing the human body skeleton sequence based on the kalman filter according to claim 1, wherein in the step S3, the kalman filter module for constructing the human body skeleton sequence includes a prediction module, a decision module and an update module connected in series.
3. The Kalman filtering-based human body skeleton sequence construction method according to claim 2, characterized in that the prediction module is the prediction part of the Kalman filtering algorithm, its input is all numbered skeletons in the processing queue, and its output is:

$$\hat{x}^{-}_{t,v,m} = A\,\hat{x}_{t-1,v,m}$$

$$P^{-}_{t,v,m} = A\,P_{t-1,v,m}\,A^{T} + Q$$

where $\hat{x}^{-}_{t,v,m}$ is the prior state estimate at frame $t$ predicted by the motion model from the posterior state estimate $\hat{x}_{t-1,v,m}$ at frame $t-1$, with $t = 1, \dots, T$ and $T$ the total number of frames; $A$ is the state transition matrix; $P^{-}_{t,v,m}$ is the prior estimate covariance at frame $t$ computed from the posterior estimate covariance $P_{t-1,v,m}$ at frame $t-1$; $Q$ is the process covariance matrix; and the superscript $T$ denotes transposition;
the posterior state estimate $\hat{x}_{t,v,m}$ is defined as:

$$\hat{x}_{t,v,m} = \begin{bmatrix} x & y & v_x & v_y \end{bmatrix}^{T}$$

when $t = 0$, $x$ and $y$ are initialized to $Z_{x,t,v,n}$ and $Z_{y,t,v,n}$, the normalized features of joint $v$ of the skeleton numbered $n$ at frame $t$ ($n$ is the skeleton number, and $n = m$ at initialization), and $v_x = v_y = 0$; when $t \neq 0$, $\hat{x}_{t,v,m}$ is the iteration result of the Kalman filtering module, i.e. the posterior state estimate of joint $v$ of the skeleton numbered $m$ at frame $t$;
when $t = 0$, $P_{t,v,m}$ is defined as the identity matrix; when $t \neq 0$, $P_{t,v,m}$ is the iteration result of the Kalman filtering module, i.e. the posterior estimate covariance of joint $v$ of the skeleton numbered $m$ at frame $t$;
the state transition matrix $A$ is defined as:

$$A = \begin{bmatrix} 1 & 0 & dt & 0 \\ 0 & 1 & 0 & dt \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

where $dt$ is the video frame time interval;
the process covariance matrix $Q$ is defined as:

$$Q = \begin{bmatrix} \sigma_p^{2} & 0 & 0 & 0 \\ 0 & \sigma_p^{2} & 0 & 0 \\ 0 & 0 & \sigma_v^{2} & 0 \\ 0 & 0 & 0 & \sigma_v^{2} \end{bmatrix}$$

where $\sigma_p^{2}$ is the coordinate variance of the human joints and $\sigma_v^{2}$ is the coordinate velocity variance parameter.
4. The Kalman filtering-based human body skeleton sequence construction method according to claim 2, characterized in that the decision module comprises the following steps:
S31, calculating the matching degree between the processing queue and all skeletons detected in the current frame to obtain the Mahalanobis distance matrix $D_t$; specifically, a candidate set is computed for each skeleton in the processing queue, whose elements are the current-frame skeletons that match it successfully, and, using the prediction module outputs $\hat{x}^{-}_{t,v,m}$ and $P^{-}_{t,v,m}$, the Mahalanobis distance $D_{t,v,nm}$ between the feature $Z_{t,v,n}$ of each joint $v$ of every skeleton $n$ detected in the current frame $t$ and the prior estimate $\hat{x}^{-}_{t,v,m}$ of each skeleton $m$ in the processing queue is computed as:

$$D_{t,v,nm} = \sqrt{\left(Z_{t,v,n} - H\hat{x}^{-}_{t,v,m}\right)^{T}\left(H P^{-}_{t,v,m} H^{T}\right)^{-1}\left(Z_{t,v,n} - H\hat{x}^{-}_{t,v,m}\right)}$$

the Mahalanobis distance matrix used to measure the matching degree is:

$$D_t = \left[D_{t,nm}\right]_{N \times M}, \qquad D_{t,nm} = \min_{v \in V} D_{t,v,nm}$$

where $H$ is the observation matrix:

$$H = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$

S32, obtaining the optimal matching of the matrix $D_t$ with the Hungarian algorithm, using a matching threshold $\alpha$; if $D_{t,nm} \le \alpha$, the match succeeds and the current-frame observation $C^{-}_{t,v,m}$ is set to the joint feature $Z_{t,v,n}$; if $D_{t,nm} > \alpha$, the match fails and the observation is set to the predicted value:

$$C^{-}_{t,v,m} = H\,\hat{x}^{-}_{t,v,m}$$

S33, skeletons of the current frame that remain unmatched in S32 because $D_{t,nm} > \alpha$ or $N > M$ are assigned new numbers and added to the processing queue.
5. The Kalman filtering-based human body skeleton sequence construction method according to claim 2, characterized in that the update module specifically calculates the Kalman gain $K_{t,v,m}$, the posterior estimate of the joint point feature $\hat{x}_{t,v,m}$ and the posterior estimate covariance $P_{t,v,m}$:

$$K_{t,v,m} = P^{-}_{t,v,m} H^{T}\left(H P^{-}_{t,v,m} H^{T} + R\right)^{-1}$$

$$\hat{x}_{t,v,m} = \hat{x}^{-}_{t,v,m} + K_{t,v,m}\left(C^{-}_{t,v,m} - H\hat{x}^{-}_{t,v,m}\right)$$

$$P_{t,v,m} = \left(I - K_{t,v,m} H\right) P^{-}_{t,v,m}$$

where $\hat{x}^{-}_{t,v,m}$, $P^{-}_{t,v,m}$ and $C^{-}_{t,v,m}$ are the outputs of the prediction and decision modules, and $R$ is the observation noise covariance matrix:

$$R = \begin{bmatrix} \sigma_z^{2} & 0 \\ 0 & \sigma_z^{2} \end{bmatrix}$$

where $\sigma_z^{2}$ is the pose estimation variance;
finally, $C_{t,v,m}$ is updated as:

$$C_{t,v,m} = H\,\hat{x}_{t,v,m}$$

where $C_{t,v,m}$, an element of the set $C_{v,m}$, denotes the feature of joint $v$ in the skeleton numbered $m$ at frame $t$.
CN202210788077.4A 2022-07-06 2022-07-06 Human skeleton sequence construction method based on Kalman filtering Active CN115050055B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210788077.4A CN115050055B (en) 2022-07-06 2022-07-06 Human skeleton sequence construction method based on Kalman filtering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210788077.4A CN115050055B (en) 2022-07-06 2022-07-06 Human skeleton sequence construction method based on Kalman filtering

Publications (2)

Publication Number Publication Date
CN115050055A true CN115050055A (en) 2022-09-13
CN115050055B CN115050055B (en) 2024-04-30

Family

ID=83164491

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210788077.4A Active CN115050055B (en) 2022-07-06 2022-07-06 Human skeleton sequence construction method based on Kalman filtering

Country Status (1)

Country Link
CN (1) CN115050055B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522793A (en) * 2018-10-10 2019-03-26 华南理工大学 More people's unusual checkings and recognition methods based on machine vision
CN110458944A (en) * 2019-08-08 2019-11-15 西安工业大学 A kind of human skeleton method for reconstructing based on the fusion of double-visual angle Kinect artis
CN110530365A (en) * 2019-08-05 2019-12-03 浙江工业大学 A kind of estimation method of human posture based on adaptive Kalman filter
CN111932580A (en) * 2020-07-03 2020-11-13 江苏大学 Road 3D vehicle tracking method and system based on Kalman filtering and Hungary algorithm
US20210000404A1 (en) * 2019-07-05 2021-01-07 The Penn State Research Foundation Systems and methods for automated recognition of bodily expression of emotion
CN112633205A (en) * 2020-12-28 2021-04-09 北京眼神智能科技有限公司 Pedestrian tracking method and device based on head and shoulder detection, electronic equipment and storage medium
CN114038056A (en) * 2021-10-29 2022-02-11 同济大学 Skip and squat type ticket evasion behavior identification method
CN114609912A (en) * 2022-03-18 2022-06-10 电子科技大学 Angle-only target tracking method based on pseudo-linear maximum correlation entropy Kalman filtering

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522793A (en) * 2018-10-10 2019-03-26 华南理工大学 More people's unusual checkings and recognition methods based on machine vision
US20210000404A1 (en) * 2019-07-05 2021-01-07 The Penn State Research Foundation Systems and methods for automated recognition of bodily expression of emotion
CN110530365A (en) * 2019-08-05 2019-12-03 浙江工业大学 A kind of estimation method of human posture based on adaptive Kalman filter
CN110458944A (en) * 2019-08-08 2019-11-15 西安工业大学 A kind of human skeleton method for reconstructing based on the fusion of double-visual angle Kinect artis
CN111932580A (en) * 2020-07-03 2020-11-13 江苏大学 Road 3D vehicle tracking method and system based on Kalman filtering and Hungary algorithm
CN112633205A (en) * 2020-12-28 2021-04-09 北京眼神智能科技有限公司 Pedestrian tracking method and device based on head and shoulder detection, electronic equipment and storage medium
CN114038056A (en) * 2021-10-29 2022-02-11 同济大学 Skip and squat type ticket evasion behavior identification method
CN114609912A (en) * 2022-03-18 2022-06-10 电子科技大学 Angle-only target tracking method based on pseudo-linear maximum correlation entropy Kalman filtering

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SUNGPHILL MOON et al.: "Multiple kinect sensor fusion for human skeleton tracking using Kalman filtering", SAGE Journals, 15 May 2017 (2017-05-15) *
刁宏健: "Research on fight behavior recognition technology based on deep learning", Wanfang Data, 2 October 2023 (2023-10-02) *
李扬: "Moving target tracking algorithm based on video sequences", Electronic Science and Technology, no. 08, 15 August 2012 (2012-08-15) *

Also Published As

Publication number Publication date
CN115050055B (en) 2024-04-30

Similar Documents

Publication Publication Date Title
CN109460702B (en) Passenger abnormal behavior identification method based on human body skeleton sequence
Du et al. Hierarchical recurrent neural network for skeleton based action recognition
CN109871750A (en) A kind of gait recognition method based on skeleton drawing sequence variation joint repair
Dreuw et al. Tracking using dynamic programming for appearance-based sign language recognition
CN107833239B (en) Optimization matching target tracking method based on weighting model constraint
JP2008257425A (en) Face recognition device, face recognition method and computer program
CN114187665B (en) Multi-person gait recognition method based on human skeleton heat map
CN112131908A (en) Action identification method and device based on double-flow network, storage medium and equipment
Abdelkader et al. Integrated motion detection and tracking for visual surveillance
CN114582030A (en) Behavior recognition method based on service robot
CN110969078A (en) Abnormal behavior identification method based on human body key points
CN113608663B (en) Fingertip tracking method based on deep learning and K-curvature method
CN112966628A (en) Visual angle self-adaptive multi-target tumble detection method based on graph convolution neural network
Martinez-Contreras et al. Recognizing human actions using silhouette-based HMM
CN112200020A (en) Pedestrian re-identification method and device, electronic equipment and readable storage medium
Kishore et al. Selfie sign language recognition with convolutional neural networks
CN112926522A (en) Behavior identification method based on skeleton attitude and space-time diagram convolutional network
Ali et al. Deep Learning Algorithms for Human Fighting Action Recognition.
CN114694261A (en) Video three-dimensional human body posture estimation method and system based on multi-level supervision graph convolution
Parisi et al. Human action recognition with hierarchical growing neural gas learning
CN114332157A (en) Long-term tracking method controlled by double thresholds
Mattheus et al. A review of motion segmentation: Approaches and major challenges
CN113378799A (en) Behavior recognition method and system based on target detection and attitude detection framework
CN110348395B (en) Skeleton behavior identification method based on space-time relationship
CN115050055A (en) Human body skeleton sequence construction method based on Kalman filtering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant