CN116580066B - Pedestrian target tracking method under low frame rate scene and readable storage medium

Info

Publication number: CN116580066B
Application number: CN202310809902.9A
Authority: CN
Inventors: 区英杰; 梁柱; 董万里; 谭焯康
Original assignee: Guangzhou Embedded Machine Tech Co ltd
Current assignee: Guangzhou Embedded Machine Tech Co ltd
Priority date: 2023-07-04
Filing date: 2023-07-04
Publication date: 2023-10-03
Anticipated expiration: 2043-07-04
Also published as: CN116580066A

Abstract

The invention discloses a pedestrian target tracking method for low frame rate scenes and a readable storage medium. The method comprises the following steps: initializing the Kalman filter parameters, and in each iteration determining the covariance matrix of the observation noise of the Kalman filter, which is used to update the pedestrian tracking target, according to the confidence of the pedestrian target frame obtained from a pedestrian target detection model and the rate of change of the aspect ratio of the pedestrian target frame; calculating the globally optimal matching pairs of pedestrian tracking targets and pedestrian target frames by using the Hungarian algorithm; and performing the corresponding processing according to the matching result. The invention is optimized for low frame rate scenes and solves the problem that targets are easily lost in such scenes. It is applicable to security scenes and low frame rate scenes. Because the tracking algorithm uses only the position and motion information of the target frame, its computation and parameter count are reduced, and because detection and tracking are decoupled, a suitable detection model can be chosen to handle complex scenes.

Description

Pedestrian target tracking method under low frame rate scene and readable storage medium

Technical Field

The present invention relates to the field of target tracking, and in particular, to a pedestrian target tracking method and readable storage medium in a low frame rate scene.

Background

Target tracking, a typical problem in machine vision applications, is currently one of the focuses of research in computer vision. With the development of artificial intelligence, intelligent monitoring devices are widely used. At present, the target tracking technology used on monitoring equipment mainly captures images of a target through a monitoring camera, and then uses computer vision techniques on an edge device or a central server to detect and analyze the image features and the motion model between frames, thereby obtaining a tracking result.

An intelligent monitoring device usually needs to process multiple video streams. When computing power is limited (for example on an edge device, or when the computing resources of the central server are scarce), the tracking algorithm must complete tracking at a low frame rate, so running at a low frame rate is a viable option. When a low frame rate scheme is adopted, however, several problems remain to be solved in target tracking.

For example, the invention with application number CN201810452935.1, entitled "A low frame rate video target detection and tracking method and its application in unmanned aerial vehicles", proceeds as follows. First, the area around the target is sampled with the Edge Boxes algorithm; the samples are potential targets, and potential targets that fall outside the sampling frame are discarded. Then, local visual features of the image are extracted from the local data, and the local data are learned at the same time to improve robustness to changes in the target. Next, a support vector machine judges each potential target: if its score is higher than a threshold, it is taken as the new target position; if the classification score obtained from the support vector machine is lower, the whole picture is sampled with the Edge Boxes algorithm and the target is searched for in the whole picture. Finally, the support vector machine model is updated according to the predicted position. This technical scheme has the following defects: (1) the possible positions of the target frame are determined with a discriminative model, which requires extracting features from a large number of candidate positions around the target frame and evaluating the discriminative model on them, so the speed is reduced; (2) the feature scheme and the kernel function chosen for the discriminative model may generalize poorly across different visual targets; (3) the parameters of the discriminative model must be updated in every iteration, and this additional learning process is difficult to complete in real time on an edge device with weak computing power.

As another example, the technical scheme of the invention with application number CN201210089248.0, entitled "Target tracking method for low frame rate video", includes the following main steps: (1) the target area is represented by fusing the main colors with the spatial distribution characteristics of the main colors; (2) the similarity between the candidate region and the target region is matched with a criterion based on the cross color proportion; (3) the matching degree between the sample particles and the target template is characterized by a fitness function based on a parameter integral graph; (4) an annealed particle swarm optimization framework, which mimics the intelligence of a biological swarm, is used to search for the abrupt changes that arise in low frame rate video. This technical scheme has the following defects: (1) the main color modes obtained by clustering are used as features, which is unsuitable for large scenes with changing illumination conditions; (2) the particle swarm algorithm needs a large number of iterative particle optimization steps to avoid missing the globally optimal solution, which results in slower running speed.

Disclosure of Invention

The invention aims to overcome the defects and shortcomings of the prior art and provides a pedestrian target tracking method in a low-frame-rate scene.

The aim of the invention is achieved by the following technical scheme:

A pedestrian target tracking method in a low frame rate scene comprises the following steps:

S1. Initializing the Kalman filter parameters of a Kalman filter;

S2. Acquiring a video image, inputting a picture of the acquired video image into a pedestrian target detection model, and determining the covariance matrix of the observation noise of the Kalman filter in each iteration according to the confidence of the pedestrian target frame obtained by the pedestrian target detection model and the rate of change of the aspect ratio of the pedestrian target frame;

S3. Calculating a cost matrix according to the metric between the pedestrian tracking target and the pedestrian target frame, the cost matrix being used as the input of the Hungarian algorithm; and calculating the globally optimal matching pairs of pedestrian tracking targets and pedestrian target frames by using the Hungarian algorithm;

S4. For a matched pedestrian tracking target and pedestrian target frame, updating the posterior distribution of the pedestrian tracking target and outputting the result to the Kalman filter;

for an unmatched pedestrian tracking target, if the maximum value of the probability density of its prior distribution is smaller than a preset threshold, deleting the pedestrian tracking target and ending its life cycle; otherwise, retaining the pedestrian tracking target, which enters the next iteration period;

for an unmatched pedestrian target frame, initializing a new pedestrian tracking target, whose initial state quantity is determined by the coordinates of the pedestrian target frame.
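The three outcomes of step S4 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names `split_matches` and `new_track_state`, the representation of matches as index pairs, and the choice of zero initial velocities are assumptions made for the example.

```python
def split_matches(pairs, num_tracks, num_detections):
    """Partition indices after assignment.

    pairs: list of (track_index, detection_index) accepted matches.
    Returns (matched_pairs, unmatched_tracks, unmatched_detections).
    """
    matched_tracks = {t for t, _ in pairs}
    matched_dets = {d for _, d in pairs}
    unmatched_tracks = [t for t in range(num_tracks) if t not in matched_tracks]
    unmatched_dets = [d for d in range(num_detections) if d not in matched_dets]
    return list(pairs), unmatched_tracks, unmatched_dets


def new_track_state(box):
    """Initial state for an unmatched detection box (x, y, w, h).

    The position and size come from the box; the four velocity
    components start at zero, which is an assumption of this sketch.
    """
    x, y, w, h = box
    return [x, y, w, h, 0.0, 0.0, 0.0, 0.0]


# Two existing tracks, three detections, one accepted match.
matched, lost_tracks, new_dets = split_matches([(0, 1)], 2, 3)
```

Matched pairs would go to the Kalman update, `lost_tracks` to the probability-density threshold test, and each index in `new_dets` would start a new track through `new_track_state`.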

In step S1, the Kalman filter parameters include the observation matrix $z_t$ of the Kalman filter and the covariance matrix $Q_t$ of the process noise.

The abscissa, ordinate, width and height of the pedestrian target frame are obtained by the detection algorithm and are assigned to the observation matrix $z_t$ of the Kalman filter.

The covariance matrix $Q_t$ of the process noise of the Kalman filter is initialized to an empirical constant,

where $t$ denotes time $t$.

In step S2, the covariance matrix of the observation noise of the Kalman filter is optimized as follows:

Define the uncertainty of the observation noise of the Kalman filter:

；

where the two coefficients are weights, the first variable is the confidence of the pedestrian target frame obtained from the pedestrian target detection model, and the second variable is the rate of change of the aspect ratio of the pedestrian target frame, the aspect ratio being formed from the width and the height of the pedestrian target frame.

The covariance matrix of the observation noise of the Kalman filter is defined as follows:

；

where $t$ denotes time $t$.
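A minimal sketch of how a confidence-dependent and aspect-ratio-dependent observation noise could be built is shown below. The exact formula of the method is not reproduced here: the linear combination `alpha * (1 - conf) + beta * |change in w/h|`, the diagonal form of the covariance, the variance floor, and all function and parameter names are assumptions chosen only to illustrate that lower confidence and faster aspect-ratio change lead to a larger variance.

```python
def aspect_ratio(w, h):
    """Width divided by height of a target frame."""
    return w / h


def observation_noise_sigma(conf, prev_wh, curr_wh, alpha=1.0, beta=1.0):
    """Assumed uncertainty: weighted sum of (1 - confidence) and the
    absolute per-frame change of the aspect ratio."""
    change = abs(aspect_ratio(*curr_wh) - aspect_ratio(*prev_wh))
    return alpha * (1.0 - conf) + beta * change


def observation_noise_cov(sigma, dim=4, floor=1e-6):
    """Assumed diagonal covariance with variance sigma**2 on (x, y, w, h)."""
    var = max(sigma, floor) ** 2
    return [[var if i == j else 0.0 for j in range(dim)] for i in range(dim)]


# Confident detection with a stable box shape.
sigma_clear = observation_noise_sigma(0.9, (40, 80), (40, 80))
# Low-confidence detection whose box is being cut off by an occluder.
sigma_occluded = observation_noise_sigma(0.4, (40, 80), (40, 50))
R = observation_noise_cov(sigma_occluded)
```

With these assumed weights the occluded detection receives a much larger variance than the clear one, so the filter trusts it less during the update.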

The pedestrian target detection model is based on a convolutional neural network; specifically, it is a YOLOv5s pedestrian target detection model. The input and output of the pedestrian target detection model are defined in advance: the input is defined by the number and format of the input pictures, and the output is defined as the target bounding frame and its class. The acquired data are manually labeled according to the defined output scheme to obtain training data, and finally the target detection model is trained and the model weights are updated.

In step S3, the metric between the pedestrian tracking target and the pedestrian target frame is defined as:

；

where the quantities are: the probability density of the position of the pedestrian target frame under the prior distribution of the pedestrian tracking target position, which is a Gaussian distribution; the covariance matrix of the prior distribution of the pedestrian tracking target (that is, the covariance matrix of the prior distribution of the predicted state); the abscissa and the ordinate of the pedestrian target frame; and the matrix transpose operation.
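The idea of scoring a detection by its density under a Gaussian prior can be illustrated with the standard bivariate normal density. The sketch below assumes that only the two position coordinates enter the metric and that the Gaussian is centered on the predicted position; the function name and argument layout are illustrative, not taken from the patent.

```python
import math


def gaussian_density_2d(point, mean, cov):
    """Density of a 2-D Gaussian N(mean, cov) evaluated at point.

    cov is a 2x2 covariance matrix given as nested lists.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    # Quadratic form [dx, dy] * inverse(cov) * [dx, dy]^T for a 2x2 matrix.
    maha_sq = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * maha_sq) / (2.0 * math.pi * math.sqrt(det))


predicted = (100.0, 200.0)            # predicted track position
prior_cov = [[25.0, 0.0], [0.0, 25.0]]
near = gaussian_density_2d((103.0, 204.0), predicted, prior_cov)
far = gaussian_density_2d((160.0, 200.0), predicted, prior_cov)
```

Both `near` and `far` are strictly positive, so even a detection far from the prediction still receives a usable, if small, score.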

The covariance matrix $P_{t|t-1}$ of the prior distribution of the pedestrian tracking target is obtained from the state transition matrix and the process noise:

$$P_{t|t-1} = F P_{t-1|t-1} F^{\mathsf{T}} + Q_t$$

where $P_{t-1|t-1}$ is the covariance matrix of the distribution of the state variables, $F$ is the state transition matrix, $Q_t$ is the covariance matrix of the process noise of the Kalman filter, and $t$ denotes time $t$.

In step S3, the cost matrix is defined as follows:

；

where the index n refers to the nth pedestrian tracking target, each entry is the probability density of the position of a pedestrian target frame under the prior distribution of the position of that pedestrian tracking target, the position of the mth pedestrian target frame is given by its abscissa and ordinate, and the superscript denotes the matrix transpose.

In step S3, the globally optimal matching pairs of pedestrian tracking targets and pedestrian target frames are calculated by using the Hungarian algorithm, specifically:

a row permutation is applied to the cost matrix by the Hungarian algorithm so as to minimize the trace of the matrix:

$$\min_{L} \operatorname{tr}(LC)$$

where $L$ is the permutation matrix and $C$ is the cost matrix;

after matching by the Hungarian algorithm, the globally optimal matching pairs are obtained, and a matching pair is defined as follows:

；

where the components of a matching pair are the abscissa, ordinate, width and height of the pedestrian target frame, the confidence of the pedestrian target frame obtained from the pedestrian target detection model, and the id information of the pedestrian tracking target.
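The assignment step can be illustrated on a tiny example. The sketch below does not implement the Hungarian algorithm itself: it swaps in an exhaustive search over permutations, which finds the same minimum-cost assignment for small square matrices. Converting densities to costs with a negative logarithm is also an assumption of this sketch, made so that minimizing cost favors high density; the function names are illustrative.

```python
import math
from itertools import permutations


def density_to_cost(density, eps=1e-12):
    """Assumed conversion: cost = -log(density), clipped away from zero."""
    return [[-math.log(max(p, eps)) for p in row] for row in density]


def min_cost_assignment(cost):
    """Exhaustive search for the permutation with the smallest total cost.

    cost is a square matrix (rows: tracks, columns: detections).
    Returns (pairs, total) with pairs as (track, detection) indices.
    Runs in O(n!) time, so it is only suitable for tiny examples.
    """
    n = len(cost)
    best_perm, best_total = None, math.inf
    for perm in permutations(range(n)):
        total = sum(cost[track][perm[track]] for track in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(enumerate(best_perm)), best_total


# density[n][m]: density of detection m under the prior of track n.
density = [[0.10, 0.90],
           [0.80, 0.20]]
pairs, total = min_cost_assignment(density_to_cost(density))
print(pairs)  # [(0, 1), (1, 0)]
```

For realistic numbers of tracks, a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would normally replace the brute-force search.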

In step S4, for the matched pedestrian tracking target and pedestrian target frame, the following operations are performed:

Calculate the residual $y_t$ of the pedestrian observation:

$$y_t = z_t - H\hat{x}_{t|t-1}$$

where $z_t$ is the observation matrix of the Kalman filter, $H$ is the observation model matrix, and $\hat{x}_{t|t-1}$ is the prior state quantity of the pedestrian tracking target.

Calculate the covariance matrix $S_t$ of the residual $y_t$ of the pedestrian observation:

$$S_t = H P_{t|t-1} H^{\mathsf{T}} + R_t$$

where ${}^{\mathsf{T}}$ denotes the matrix transpose, $P_{t|t-1}$ is the covariance matrix of the prior distribution of the pedestrian tracking target, that is, the covariance matrix of the distribution of the state quantity, and $R_t$ is the covariance matrix of the observation noise of the Kalman filter, which is used for updating the prior distribution of the pedestrian tracking target.

Compute the optimal gain $K_t$ of the Kalman filter:

$$K_t = P_{t|t-1} H^{\mathsf{T}} S_t^{-1}$$

Update the posterior distribution of the pedestrian state quantity:

$$P_{t|t} = (I - K_t H) P_{t|t-1}$$

where $P_{t|t}$ is the covariance matrix of the posterior distribution of the state quantity and $I$ is the identity matrix.

Update the posterior state quantity of the pedestrian tracking target:

$$\hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t y_t$$

where $\hat{x}_{t|t}$ and $P_{t|t}$ are used as the current state quantity and distribution of the pedestrian tracking target in the next iteration period.
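The update sequence above (residual, residual covariance, gain, posterior covariance, posterior state) can be traced on a deliberately reduced example. The sketch assumes a one-dimensional track with state `[position, velocity]` and a scalar position observation, so the observation model is `H = [1, 0]` and no matrix inversion is needed; the method itself uses an eight-dimensional state. The function name and the numbers are illustrative.

```python
def kalman_update_1d(x, P, z, r):
    """Standard Kalman update for state [position, velocity], H = [1, 0].

    x: prior state, P: 2x2 prior covariance, z: observed position,
    r: observation noise variance.
    """
    y = z - x[0]                          # residual: z - H x
    s = P[0][0] + r                       # residual covariance: H P H^T + R
    k = [P[0][0] / s, P[1][0] / s]        # gain: P H^T / S
    x_post = [x[0] + k[0] * y, x[1] + k[1] * y]
    # Posterior covariance: (I - K H) P, written out for H = [1, 0].
    P_post = [
        [(1.0 - k[0]) * P[0][0], (1.0 - k[0]) * P[0][1]],
        [P[1][0] - k[1] * P[0][0], P[1][1] - k[1] * P[0][1]],
    ]
    return x_post, P_post


prior_x = [0.0, 0.0]
prior_P = [[1.0, 0.0], [0.0, 1.0]]
# Same observation, trusted (small r) versus distrusted (large r).
x_trusted, P_trusted = kalman_update_1d(prior_x, prior_P, z=2.0, r=1.0)
x_doubted, P_doubted = kalman_update_1d(prior_x, prior_P, z=2.0, r=99.0)
```

The trusted observation moves the position estimate halfway to the measurement, while the distrusted one barely moves it, which is the effect a larger observation noise covariance is meant to have.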

The prior state quantity $\hat{x}_{t|t-1}$ of the pedestrian tracking target is obtained from the state transition matrix:

$$\hat{x}_{t|t-1} = F\hat{x}_{t-1|t-1}, \qquad \hat{x} = [x, y, w, h, \dot{x}, \dot{y}, \dot{w}, \dot{h}]^{\mathsf{T}}$$

where $F$ is the state transition matrix, $\hat{x}$ is the state quantity, $x$ is the abscissa of the pedestrian target frame, $y$ is the ordinate of the pedestrian target frame, $w$ is the width of the pedestrian target frame, $h$ is the height of the pedestrian target frame, and $\dot{x}$, $\dot{y}$, $\dot{w}$, $\dot{h}$ are the derivatives of $x$, $y$, $w$, $h$ with respect to time.
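The prediction step, which yields the prior state and the prior covariance, can be sketched in the same reduced setting. The example assumes a constant-velocity model with state `[position, velocity]` and a frame interval of 0.2 s (5 fps); the helper names, the time step, and the process noise values are assumptions for illustration.

```python
def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose of a matrix given as nested lists."""
    return [list(row) for row in zip(*a)]


def kalman_predict(x, P, F, Q):
    """Prior state F x and prior covariance F P F^T + Q."""
    x_prior = [sum(F[i][k] * x[k] for k in range(len(x))) for i in range(len(F))]
    FPFt = matmul(matmul(F, P), transpose(F))
    P_prior = [[FPFt[i][j] + Q[i][j] for j in range(len(Q))]
               for i in range(len(Q))]
    return x_prior, P_prior


dt = 0.2                              # seconds between frames at 5 fps
F = [[1.0, dt], [0.0, 1.0]]           # constant-velocity transition
Q = [[0.1, 0.0], [0.0, 0.1]]          # assumed process noise
x_prior, P_prior = kalman_predict([10.0, 5.0], [[1.0, 0.0], [0.0, 1.0]], F, Q)
```

Each call without a subsequent update enlarges the covariance, which is what drives the life-cycle behavior of unmatched tracks.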

Meanwhile, the invention provides:

a server comprising a processor and a memory, the memory having stored therein at least one program loaded and executed by the processor to implement the pedestrian target tracking method in a low frame rate scenario described above.

A computer-readable storage medium having stored therein at least one program loaded and executed by a processor to implement the pedestrian target tracking method in a low frame rate scenario described above.

Compared with the prior art, the invention has the following advantages and beneficial effects:

1. The invention optimizes the covariance matrix of the observation noise so that the pedestrian observation noise better matches its definition in Kalman filtering. The covariance matrix parameters are determined by the confidence of the pedestrian target frame and the rate of change of its aspect ratio. When the pedestrian target is occluded or blurred, the confidence of the pedestrian target frame decreases and the variance increases. When the aspect ratio of the pedestrian target frame changes quickly, the tracked pedestrian target is likely being occluded, and the variance is likewise larger. Previous tracking schemes mostly use a covariance matrix fixed to a constant, which cannot express the uncertainty of the observation well. The invention instead uses two proxy variables, one for each of two situations, namely uncertainty of the pedestrian detection model and occlusion of the tracked target, and thereby describes this uncertainty better.

2. The invention optimizes the definition of the metric between pedestrian tracking targets and pedestrian target frames. Instead of the IoU distance or the Mahalanobis distance, the metric is defined as the probability density of the position of the pedestrian target frame under the prior distribution of the pedestrian tracking target, and the optimal matching is then found by the Hungarian algorithm. Compared with the IoU distance, this avoids the situation, common at low frame rates, in which IoU = 0 makes optimal matching impossible and the pedestrian is lost. Compared with the Mahalanobis distance, it avoids the situation in which, after a target is lost, its Mahalanobis distance becomes smaller and smaller, so that detections are preferentially matched to the lost pedestrian tracking target and tracking becomes unstable. Using the probability density directly as the metric solves both problems.

3. The invention adapts the life cycle of each pedestrian tracking target. A probability threshold is specified, and a pedestrian tracking target whose probability density falls below the threshold is deleted. When tracking is lost for a relatively stable or stationary pedestrian target, such as a person sitting or standing still, the prior covariance of the lost target increases slowly, so the target is retained for a longer time. For a pedestrian tracking target whose motion changes greatly, the prior covariance increases rapidly, so the maximum value of the probability density decreases rapidly, and once it falls below the threshold the target is deleted by the tracker. Previous tracking algorithms typically assign a fixed life cycle of k frames, without considering that different targets should be given life cycles of different lengths. For example, a very stable or stationary target should be retained for a long time after it is lost, whereas the position of a fast-moving target can no longer be estimated shortly after it is lost, so it should be retained only briefly.
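The adaptive life cycle can be made concrete with the peak value of a two-dimensional Gaussian, which equals 1 / (2π·sqrt(det Σ)) and therefore falls as the covariance grows. The sketch below assumes equal, uncorrelated position variances that grow by a constant amount `q` for every frame without a match; the threshold, the growth rates, and the function names are illustrative assumptions.

```python
import math


def peak_density(var_x, var_y):
    """Maximum of an axis-aligned 2-D Gaussian with the given variances."""
    return 1.0 / (2.0 * math.pi * math.sqrt(var_x * var_y))


def frames_until_deleted(var0, q, threshold, max_frames=1000):
    """Count unmatched frames until the peak density drops below threshold.

    var0: position variance when the track is lost.
    q: assumed variance growth per unmatched frame.
    Returns None if the threshold is not reached within max_frames.
    """
    var = var0
    for frame in range(1, max_frames + 1):
        var += q                      # one prediction step without update
        if peak_density(var, var) < threshold:
            return frame
    return None


slow = frames_until_deleted(var0=1.0, q=0.5, threshold=0.01)   # stable target
fast = frames_until_deleted(var0=1.0, q=5.0, threshold=0.01)   # erratic target
print(slow, fast)  # 30 3
```

Under these assumptions the stable target survives ten times longer than the erratic one without any fixed frame count being configured.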

Drawings

Fig. 1 is a flowchart of a pedestrian target tracking method in a low frame rate scene.

Detailed Description

The present invention will be described in further detail with reference to examples and drawings, but embodiments of the present invention are not limited thereto.

The following preparation work is done before the implementation process in this embodiment:

1. Acquire video sequences of pedestrians with the installed camera, and store the image data on the edge device or the central server. The collected data are mainly used for training and testing the pedestrian target detection model. To ensure the performance of the model, data must be acquired under different conditions, covering different time periods, different illumination changes and different weather conditions.

2. The target detection algorithm is based on a convolutional neural network; the specific model is a YOLOv5s pedestrian detection model. A detector based on a convolutional neural network adapts better to conditions such as illumination changes and occlusion. The input and output of the convolutional neural network model must be defined in advance: the input is defined by the number and format of the pictures fed to the model, and the output is the target bounding frame and its class. The acquired data are manually labeled according to the defined output scheme to obtain training data, and finally the target detection model is trained and the model weights are updated. In the invention, the model is a YOLOv5 target detection model that takes a picture of size 640 × 384 as input, balancing speed and accuracy, and the detection model is quantized to an int8 model for the corresponding hardware platform. The model outputs one class, corresponding to detection targets of the pedestrian class.

After the preparation work is completed, the following specific implementation is as follows:

referring to fig. 1, a pedestrian target tracking method in a low frame rate scene includes the following steps:

S1. Initializing the Kalman filter parameters of the Kalman filter;

the Kalman filter parameters include the observation matrix $z_t$ of the Kalman filter and the covariance matrix $Q_t$ of the process noise,

where $t$ denotes time $t$.

The covariance matrix of the observation noise of the Kalman filter is optimized as follows:

Define the uncertainty of the observation noise of the Kalman filter:

；

where $t$ denotes time $t$.

In the present embodiment, the covariance matrix of the observation noise of the Kalman filter is optimized so that the observation noise complies with its definition in Kalman filtering (that is, the observation noise represents the uncertainty of the observed quantity). This uncertainty is represented by two proxy variables, namely the confidence of the pedestrian target frame obtained by the pedestrian target detection model and the aspect ratio of the pedestrian target frame.

When the pedestrian target is blocked or the pedestrian detection model is unstable, the uncertainty of the position of the pedestrian target frame is increased, and meanwhile, the confidence of the pedestrian target frame is also reduced, so that the uncertainty of the observation noise can be represented by the confidence of the pedestrian target frame.

When a pedestrian target is occluded, the occlusion is often gradual, that is, the occluded area becomes larger and larger, so the uncertainty of the observation noise can be represented by the rate of change of the aspect ratio of the pedestrian target frame.

The present embodiment combines these two variables to describe the uncertainty of the observation noise. By contrast, traditional tracking algorithms such as SORT usually assign a constant to the observation noise of the Kalman filter, which cannot measure the uncertainty of the observed quantity, so the Kalman filter does not play its intended role.

Conventional tracking algorithms typically use the IoU distance (SORT tracking algorithm), defined as the ratio of the intersection to the union of the areas of two target frames, or the Mahalanobis distance (DeepSORT tracking algorithm), defined as a distance measured in units of the standard deviation of the distribution. Unlike these, the metric of the present embodiment is defined as the probability density of the position of the detection frame under the prior distribution of the pedestrian tracking target position. Compared with IoU, using the probability density as the metric avoids losing the target when, in a low frame rate scene, the pedestrian target frame and the pedestrian tracking target do not overlap at all (IoU = 0). Compared with the Mahalanobis distance, it avoids the problem that, after a target is lost, its covariance becomes larger and larger, so that the tracker preferentially matches detections to the lost pedestrian target, which disturbs the pedestrian targets that are still being tracked and makes normal tracking unstable.
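The IoU = 0 case is easy to reproduce numerically. The sketch below computes the standard intersection over union for two axis-aligned boxes; representing boxes by corner coordinates `(x1, y1, x2, y2)` and the pixel values used are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# A pedestrian box 40 px wide that has moved 60 px between two frames.
previous = (100, 50, 140, 150)
current = (160, 50, 200, 150)
print(iou(previous, current))  # 0.0
```

Once the displacement between frames exceeds the box width, every candidate has the same score of zero and IoU can no longer rank them, whereas a Gaussian density still decreases smoothly with distance.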

The metric between the pedestrian tracking target and the pedestrian target frame is defined as:

；

The covariance matrix $P_{t|t-1}$ of the prior distribution of the pedestrian tracking target is obtained from the state transition matrix $F$ and the covariance matrix $Q_t$ of the process noise:

$$P_{t|t-1} = F P_{t-1|t-1} F^{\mathsf{T}} + Q_t$$

The cost matrix is defined as follows:

；

The globally optimal matching pairs of pedestrian tracking targets and pedestrian target frames are calculated by using the Hungarian algorithm, specifically:

$$\min_{L} \operatorname{tr}(LC)$$

where $L$ is the permutation matrix and $C$ is the cost matrix;

；

and for the matched pedestrian tracking target and pedestrian target frame, performing the following operations:

Calculate the residual $y_t$ of the pedestrian observation:

$$y_t = z_t - H\hat{x}_{t|t-1}$$

Calculate the covariance matrix $S_t$ of the residual $y_t$ of the pedestrian observation:

$$S_t = H P_{t|t-1} H^{\mathsf{T}} + R_t$$

where ${}^{\mathsf{T}}$ denotes the matrix transpose, $P_{t|t-1}$ is the covariance matrix of the prior distribution of the pedestrian tracking target, that is, the covariance matrix of the distribution of the state quantity, and $R_t$ is the covariance matrix of the observation noise of the Kalman filter, which is used for updating the prior distribution of the pedestrian tracking target.

Compute the optimal gain $K_t$ of the Kalman filter:

$$K_t = P_{t|t-1} H^{\mathsf{T}} S_t^{-1}$$

Update the posterior distribution of the pedestrian state quantity:

$$P_{t|t} = (I - K_t H) P_{t|t-1}$$

where $P_{t|t}$ is the covariance matrix of the posterior distribution of the state quantity and $I$ is the identity matrix.

Update the posterior state quantity of the pedestrian tracking target:

$$\hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t y_t$$

The prior state quantity $\hat{x}_{t|t-1}$ of the pedestrian tracking target is obtained from the state transition matrix $F$:

$$\hat{x}_{t|t-1} = F\hat{x}_{t-1|t-1}$$

Previous tracking algorithms typically use a fixed constant to determine the life cycle of a tracked target, that is, the target is deleted k frames after tracking is lost, which is simple but performs poorly. The pedestrian tracking target in this embodiment has no fixed k-frame life cycle; instead, its life cycle is determined dynamically from the stability of the pedestrian tracking target. If the pedestrian tracking target has been stable, for example standing or sitting at a certain position, the covariance of its prior distribution increases slowly with time after it is lost, so it takes longer before the target is deleted. Conversely, for a pedestrian tracking target whose motion changes rapidly, such as a person running or being occluded, the covariance of its prior distribution increases rapidly with time after it is lost, so the target is deleted within a short time.

Meanwhile, for an unmatched pedestrian tracking target, its prior state quantity and the covariance matrix of its prior distribution are used as the current state quantity and distribution of the pedestrian tracking target in the next iteration period.

The low frame rate in this embodiment refers to an image processing speed of 5-10 fps.

Meanwhile, the present embodiment provides:

The pedestrian target tracking method in a low frame rate scene is suitable for security scenes and low frame rate scenes. Unlike other methods, which need to extract local visual features, the tracking algorithm uses only the position and motion information of the target frame and does not use the visual features inside the target frame, which reduces the computation and parameter count of the tracking algorithm. The invention decouples the detection and tracking algorithms, so that a suitable detection model can be used to handle complex scene problems such as dense targets, partial occlusion, low resolution and illumination changes. Meanwhile, the method provided by this embodiment is optimized for low frame rate scenes and solves the problem that other algorithms easily lose targets in such scenes.

The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to them. Any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention is an equivalent replacement and is included in the protection scope of the present invention.

Claims

1. The pedestrian target tracking method in the low frame rate scene is characterized by comprising the following steps:

S1. Initializing the Kalman filter parameters of a Kalman filter;

S2. Acquiring a video image, inputting a picture of the acquired video image into a pedestrian target detection model, and determining the covariance matrix of the observation noise of the Kalman filter in each iteration according to the confidence of the pedestrian target frame obtained by the pedestrian target detection model and the rate of change of the aspect ratio of the pedestrian target frame;

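calculating the residual $y_t$ of the pedestrian observation:

$$y_t = z_t - H\hat{x}_{t|t-1}$$

calculating the covariance matrix $S_t$ of the residual $y_t$ of the pedestrian observation:

$$S_t = H P_{t|t-1} H^{\mathsf{T}} + R_t$$

computing the optimal gain $K_t$ of the Kalman filter:

$$K_t = P_{t|t-1} H^{\mathsf{T}} S_t^{-1}$$

updating the posterior distribution of the pedestrian state quantity:

$$P_{t|t} = (I - K_t H) P_{t|t-1}$$

wherein $P_{t|t}$ is the covariance matrix of the posterior distribution of the state quantity and $I$ is the identity matrix;

updating the posterior state quantity of the pedestrian tracking target:

$$\hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t y_t$$

wherein $\hat{x}_{t|t}$ and $P_{t|t}$ are respectively used as the current state quantity and distribution of the pedestrian tracking target in the next iteration period;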

2. The pedestrian target tracking method in a low frame rate scene as claimed in claim 1, wherein in step S1, the Kalman filter parameters include the observation matrix $z_t$ of the Kalman filter and the covariance matrix $Q_t$ of the process noise;

wherein $t$ denotes time $t$.

3. The pedestrian target tracking method in a low frame rate scene as claimed in claim 1, wherein in step S2, the covariance matrix of the observation noise of the Kalman filter is optimized as follows:

defining the uncertainty of the observation noise of the Kalman filter:

；

wherein $t$ denotes time $t$.

4. The pedestrian target tracking method in a low frame rate scene as claimed in claim 1, wherein the pedestrian target detection model is based on a convolutional neural network model and is a YOLOv5s pedestrian target detection model, and the input and output of the pedestrian target detection model are defined as follows: the input is defined by the number and format of the input pictures, and the output is defined as the target bounding frame and its class; the acquired data are manually labeled according to the defined output scheme to obtain training data, and finally the target detection model is trained and the model weights are updated.

5. The method according to claim 1, wherein in step S3, the metric between the pedestrian tracking target and the pedestrian target frame is defined as:

；

wherein the quantities are: the probability density of the position of the pedestrian target frame under the prior distribution of the pedestrian tracking target position, which is a Gaussian distribution; the covariance matrix of the prior distribution of the pedestrian tracking target; the abscissa and the ordinate of the pedestrian target frame; and the matrix transpose operation.

6. The pedestrian target tracking method in a low frame rate scene as claimed in claim 5, wherein the covariance matrix $P_{t|t-1}$ of the prior distribution of the pedestrian tracking target is obtained from the state transition matrix and the process noise:

$$P_{t|t-1} = F P_{t-1|t-1} F^{\mathsf{T}} + Q_t$$

wherein $P_{t-1|t-1}$ is the covariance matrix of the distribution of the state quantity, $Q_t$ is the covariance matrix of the process noise of the Kalman filter, $t$ denotes time $t$, and $F$ is the state transition matrix.

7. The pedestrian target tracking method in a low frame rate scene as claimed in claim 1, wherein in step S3, the cost matrix is defined as follows:

；

8. The pedestrian target tracking method in a low frame rate scene as claimed in claim 1, wherein in step S3, the globally optimal matching pairs of pedestrian tracking targets and pedestrian target frames are calculated by using the Hungarian algorithm, specifically:

$$\min_{L} \operatorname{tr}(LC)$$

wherein $L$ is the permutation matrix and $C$ is the cost matrix;

；

wherein the components of a matching pair are the abscissa, ordinate, width and height of the pedestrian target frame, the confidence of the pedestrian target frame obtained from the pedestrian target detection model, and the id information of the pedestrian tracking target.

9. The pedestrian target tracking method in a low frame rate scene as claimed in claim 1, wherein the prior state quantity $\hat{x}_{t|t-1}$ of the pedestrian tracking target is obtained from the state transition matrix:

$$\hat{x}_{t|t-1} = F\hat{x}_{t-1|t-1}, \qquad \hat{x} = [x, y, w, h, \dot{x}, \dot{y}, \dot{w}, \dot{h}]^{\mathsf{T}}$$

wherein $F$ is the state transition matrix, $\hat{x}$ is the state quantity, $x$ is the abscissa of the pedestrian target frame, $y$ is the ordinate of the pedestrian target frame, $w$ is the width of the pedestrian target frame, $h$ is the height of the pedestrian target frame, and $\dot{x}$, $\dot{y}$, $\dot{w}$, $\dot{h}$ are the derivatives of $x$, $y$, $w$, $h$ with respect to time.

10. A server comprising a processor and a memory, wherein the memory has stored therein at least one program that is loaded and executed by the processor to implement the pedestrian target tracking method in the low frame rate scenario of any one of claims 1 to 9.

11. A computer readable storage medium having stored therein at least one program loaded and executed by a processor to implement the pedestrian target tracking method in a low frame rate scenario of any one of claims 1 to 9.