CN113658223B - Multi-pedestrian detection and tracking method and system based on deep learning - Google Patents

Multi-pedestrian detection and tracking method and system based on deep learning

Info

Publication number
CN113658223B
CN113658223B
Authority
CN
China
Prior art keywords
pedestrian
frame
image
deep learning
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110917108.7A
Other languages
Chinese (zh)
Other versions
CN113658223A (en)
Inventor
曹建荣
朱亚琴
张玉婷
韩发通
庄园
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Jianzhu University
Original Assignee
Shandong Jianzhu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Jianzhu University
Priority to CN202110917108.7A
Publication of CN113658223A
Application granted
Publication of CN113658223B
Legal status: Active (current)
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

A multi-pedestrian detection and tracking method and system based on deep learning. Pedestrians are detected at frame intervals with a CenterNet network model to obtain pedestrian frame information, satisfying both the robustness and real-time requirements of pedestrian detection. The tracking method, which matches pedestrian ID numbers using the color, texture, and movement direction of pedestrian feature regions, not only solves the problem of pedestrian tracking errors caused by overlapping occlusion of pedestrians in the monitoring video but also improves matching speed, further improving the real-time performance and accuracy of target tracking.

Description

Multi-pedestrian detection and tracking method and system based on deep learning
Technical Field
The invention relates to the field of computer vision, in particular to a multi-pedestrian detection and tracking method and system based on deep learning.
Background
Intelligent monitoring video analysis is an emerging application direction and a prominent research topic in the field of computer vision. With the rapid development of network technology and digital video technology, intelligent monitoring technology has brought real change to video surveillance systems. An intelligent monitoring system based on computer vision algorithms can automatically capture important information in the monitoring video, greatly reducing the required manpower and material resources. Detection and tracking of moving targets are important foundations of intelligent monitoring analysis, play an important role in people counting, campus security, and the like, and are widely applied in industry.
Over many years of research and development of moving target detection technology, new algorithms have been proposed continuously, and the technology has matured. For example, in the R-CNN family of region-based target detection methods, Wang et al. proposed the Guided Anchoring method, which guides the generation of anchors through image features, but it is too slow to meet real-time requirements. To address this problem, Hei Law et al. proposed the CornerNet algorithm, which uses the Hourglass-104 network as the feature extraction network to directly predict the upper-left and lower-right corners of the human body to obtain the detection frame, treating the object detection problem as a keypoint detection problem. Zhou et al. proposed the ExtremeNet algorithm, innovating on the selection and combination of keypoints: the four extreme points (top, bottom, left, and right) of the human target and a center point are selected as keypoints, so that the edge and interior information of the target is attended to more directly and detection is more stable. Among moving target tracking algorithms, the Kalman filtering algorithm, the mean-shift algorithm, and the particle filtering algorithm have been the most influential. These algorithms achieve moving target detection and tracking to a certain extent, but the traditional detection and tracking algorithms are used in isolation, require a large amount of computation, lack robustness, and cannot adapt to changes in the tracked target. With the development of deep learning, sample training and classification have gradually been introduced into moving target detection and tracking, but detection remains time-consuming, making it difficult to meet the real-time requirements of subsequent target tracking.
The prior art does not fully exploit the relationship between moving targets in consecutive frames of the monitoring video for moving target detection and tracking, and most existing methods cannot balance the two requirements of real-time performance and accuracy.
Disclosure of Invention
In order to overcome the above shortcomings, the invention provides a multi-pedestrian detection and tracking method based on deep learning, which effectively avoids tracking errors caused by overlapping occlusion among multiple pedestrians.
The technical solution adopted to overcome the above technical problems is as follows:
a multi-line person detection and tracking method and system based on deep learning comprises the following steps:
a) The computer collects the monitoring video in real time, preprocesses the monitoring video, and inputs the preprocessed monitoring video to the CenterNet pedestrian detection model;
b) Pedestrian detection is carried out on the preprocessed monitoring video through a deep learning technology, and a pedestrian area in an image is detected by using a CenterNet pedestrian detection model;
c) Judging whether pedestrians appear, if so, establishing a tracking model based on the first frame of pedestrian image and marking the pedestrian ID of each pedestrian frame, and if not, detecting the next frame of pedestrian image;
d) Calculating the centroid coordinates of each pedestrian from the coordinates of each pedestrian frame, and calculating the pedestrian movement direction from the centroid coordinates;
e) If the distance between the centroids of pedestrians in different pedestrian frames is smaller than or equal to a threshold th1, overlapping occlusion exists between the pedestrians; if the distance between the centroids of pedestrians in different pedestrian frames is larger than the threshold th1, no overlapping occlusion exists between the pedestrians;
f) If overlapping occlusion occurs between the pedestrian targets of the current frame, selecting the non-overlapping region of each pedestrian frame as a feature region according to the pedestrian frames of the current-frame pedestrian image obtained in step c), extracting the color and texture features of the feature regions, matching the color and texture features and movement direction of each current-frame pedestrian with the color and texture features and movement directions of the pedestrians in the previous frame, taking the matched pedestrian ID of the previous frame as the pedestrian ID of the current frame, and connecting the centroids of the same ID number in two adjacent frames, realizing pedestrian tracking;
g) If no overlapping occlusion exists between the pedestrian targets of the current frame, matching the movement direction of each current-frame pedestrian with the movement directions of the pedestrians in the previous frame, taking the matched pedestrian ID of the previous frame as the pedestrian ID of the current frame, and connecting the centroids of the same ID number in two adjacent frames, realizing pedestrian tracking.
Further, the step of preprocessing the monitoring video in step a) is as follows: converting the monitoring video into images and, starting from the first frame image that contains a pedestrian, selecting one frame image every k frames as the input of the pedestrian detection model.
Further, the step of performing pedestrian detection on the preprocessed monitoring video by the deep learning technique in step b) is as follows: scaling the input frame images to a uniform size and inputting them into the CenterNet pedestrian detection model, modeling the pedestrian target as a single point, finding the center point of the pedestrian frame through a keypoint heatmap, and regressing the coordinates and size information of the pedestrian frame from the image features at the center point.
Further, the step of marking the ID of each pedestrian frame in step c) is as follows: if a pedestrian frame exists in the first frame image of the monitoring video, a tracking model is built based on the first-frame pedestrian image, and the pedestrian IDs of the n pedestrian frames are marked as ID1, ID2, …, IDn.
Further, step d) comprises the steps of:
d-1) the coordinates of two opposite corners of the i-th rectangular pedestrian frame IDi in the k-th frame pedestrian image are (x_{i1,k}, y_{i1,k}) and (x_{i2,k}, y_{i2,k}); the coordinates of two opposite corners of the j-th rectangular pedestrian frame IDj in the k-th frame pedestrian image are (x_{j1,k}, y_{j1,k}) and (x_{j2,k}, y_{j2,k}); the coordinates of two opposite corners of pedestrian frame IDi in the (k-1)-th frame pedestrian image are (x_{i1,k-1}, y_{i1,k-1}) and (x_{i2,k-1}, y_{i2,k-1}); and the coordinates of two opposite corners of pedestrian frame IDj in the (k-1)-th frame pedestrian image are (x_{j1,k-1}, y_{j1,k-1}) and (x_{j2,k-1}, y_{j2,k-1});
d-2) the centroid C_i = (x_{i,k}, y_{i,k}), i ∈ 1,…,n, of pedestrian frame IDi in the k-th frame pedestrian image is established with x_{i,k} = (x_{i1,k} + x_{i2,k})/2 and y_{i,k} = (y_{i1,k} + y_{i2,k})/2, and the centroid C_i = (x_{i,k-1}, y_{i,k-1}), i ∈ 1,…,n, of pedestrian frame IDi in the (k-1)-th frame pedestrian image is established with x_{i,k-1} = (x_{i1,k-1} + x_{i2,k-1})/2 and y_{i,k-1} = (y_{i1,k-1} + y_{i2,k-1})/2;
d-3) likewise, the centroid C_j = (x_{j,k}, y_{j,k}), j ∈ 1,…,n, of pedestrian frame IDj in the k-th frame pedestrian image and the centroid C_j = (x_{j,k-1}, y_{j,k-1}), j ∈ 1,…,n, of pedestrian frame IDj in the (k-1)-th frame pedestrian image are established;
d-4) the change Δx_i = x_{i,k} − x_{i,k-1} of the centroid abscissa and the change Δy_i = y_{i,k} − y_{i,k-1} of the centroid ordinate of pedestrian frame IDi from the (k-1)-th frame to the k-th frame are calculated, i ∈ 1,…,n; likewise, the changes Δx_j = x_{j,k} − x_{j,k-1} and Δy_j = y_{j,k} − y_{j,k-1} of pedestrian frame IDj are calculated, j ∈ 1,…,n;
d-5) the movement direction angle of pedestrian i is calculated as θ_i = arctan(Δy_i / Δx_i), and the movement direction angle of pedestrian j as θ_j = arctan(Δy_j / Δx_j).
Further, in step e), the distance d_{i,j} between pedestrian i and pedestrian j in the k-th frame pedestrian image is calculated by the formula d_{i,j} = √((x_{i,k} − x_{j,k})² + (y_{i,k} − y_{j,k})²).
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the deep learning based multi-pedestrian detection and tracking method described in any of the above steps.
A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the deep learning based multi-pedestrian detection and tracking method described in the above steps.
The beneficial effects of the invention are as follows: detecting pedestrians at frame intervals with the CenterNet network model obtains pedestrian frame information while satisfying both the robustness and real-time requirements of pedestrian detection. The tracking method, which matches pedestrian ID numbers using the color, texture, and movement direction of pedestrian feature regions, not only solves the problem of pedestrian tracking errors caused by overlapping occlusion of pedestrians in the monitoring video but also improves matching speed, further improving the real-time performance and accuracy of target tracking.
Drawings
FIG. 1 is a flow chart of the multi-pedestrian detection and tracking method of the present invention;
FIG. 2 is a schematic view of the initial motion direction of a pedestrian object in accordance with the present invention;
FIG. 3 is a schematic diagram of the centroid coordinates of a pedestrian in the k-th frame in accordance with the present invention;
FIG. 4 is a schematic diagram of the tracking process for pedestrians with crossing and overlapping occlusion in accordance with the present invention;
FIG. 5 is a schematic view of the pedestrian feature regions selected in the k-th frame in accordance with the present invention;
FIG. 6 is a schematic diagram of the tracking process for pedestrians without occlusion in accordance with the present invention.
Detailed Description
The invention is further described with reference to fig. 1 to 6.
As shown in FIG. 1, a multi-pedestrian detection and tracking method based on deep learning includes:
a) The computer collects the monitoring video in real time, preprocesses the monitoring video, and inputs the preprocessed monitoring video to the CenterNet pedestrian detection model;
b) Pedestrian detection is carried out on the preprocessed monitoring video through a deep learning technology, and a pedestrian area in an image is detected by using a CenterNet pedestrian detection model;
c) Judging whether pedestrians appear, if so, establishing a tracking model based on the first frame of pedestrian image and marking the pedestrian ID of each pedestrian frame, and if not, detecting the next frame of pedestrian image;
d) Calculating the centroid coordinates of each pedestrian from the coordinates of each pedestrian frame, and calculating the pedestrian movement direction from the centroid coordinates;
e) If the distance between the centroids of pedestrians in different pedestrian frames is smaller than or equal to a threshold th1, overlapping occlusion exists between the pedestrians; if the distance between the centroids of pedestrians in different pedestrian frames is larger than the threshold th1, no overlapping occlusion exists between the pedestrians;
f) If overlapping occlusion occurs between the pedestrian targets of the current frame, selecting the non-overlapping region of each pedestrian frame as a feature region according to the pedestrian frames of the current-frame pedestrian image obtained in step c), extracting the color and texture features of the feature regions, matching the color and texture features and movement direction of each current-frame pedestrian with the color and texture features and movement directions of the pedestrians in the previous frame, taking the matched pedestrian ID of the previous frame as the pedestrian ID of the current frame, and connecting the centroids of the same ID number in two adjacent frames, realizing pedestrian tracking;
g) If no overlapping occlusion exists between the pedestrian targets of the current frame, matching the movement direction of each current-frame pedestrian with the movement directions of the pedestrians in the previous frame, taking the matched pedestrian ID of the previous frame as the pedestrian ID of the current frame, and connecting the centroids of the same ID number in two adjacent frames, realizing pedestrian tracking.
Detecting pedestrians at frame intervals with the CenterNet network model obtains pedestrian frame information while satisfying both the robustness and real-time requirements of pedestrian detection. The tracking method, which matches pedestrian ID numbers using the color, texture, and movement direction of pedestrian feature regions, not only solves the problem of pedestrian tracking errors caused by overlapping occlusion of pedestrians in the monitoring video but also improves matching speed, further improving the real-time performance and accuracy of target tracking.
Example 1:
The step of preprocessing the monitoring video in step a) is as follows: converting the monitoring video into images and, starting from the first frame image that contains a pedestrian, selecting one frame image every k frames as the input of the pedestrian detection model.
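For illustration, the following is a minimal Python sketch of this preprocessing step using OpenCV (the patent does not name an implementation library; the detect_pedestrians callback is a hypothetical stand-in for the CenterNet detector described below):

```python
import cv2

def sample_frames(video_path, k, detect_pedestrians):
    """Yield every k-th frame, starting from the first frame
    in which the detector reports at least one pedestrian."""
    cap = cv2.VideoCapture(video_path)
    started = False   # becomes True at the first frame containing a pedestrian
    index = 0         # frame counter used for the every-k sampling
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if not started and detect_pedestrians(frame):
            started = True
            index = 0                 # restart counting from the first pedestrian frame
        if started:
            if index % k == 0:
                yield frame           # this frame is fed to the detection model
            index += 1
    cap.release()
```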
Example 2:
The step of performing pedestrian detection on the preprocessed monitoring video by the deep learning technique in step b) is as follows: scaling the input frame images to a uniform size and inputting them into the CenterNet pedestrian detection model, modeling the pedestrian target as a single point, finding the center point of the pedestrian frame through a keypoint heatmap, and regressing the coordinates and size information of the pedestrian frame from the image features at the center point. The method comprises the following specific steps:
(1) First, the preprocessed monitoring video is input into a fully convolutional neural network, and a heatmap is generated for each frame image; then the first 100 peak points on the heatmap are taken as detected pedestrian center points, and a threshold is set for screening to obtain the final pedestrian target center points; finally, the image features corresponding to the center points on the heatmap are input into the prediction network to predict the size information of the pedestrian frames. The fully convolutional neural network of the CenterNet pedestrian detection model is an Hourglass network: the Hourglass-104 detection network is formed by cascading two Hourglass modules and extracts multi-scale features of the pedestrian target.
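A minimal PyTorch sketch of the center-point screening just described (top 100 heatmap peaks followed by threshold screening) is given below; the 3×3 max-pooling peak test and the threshold value 0.3 are assumptions of this sketch, as the patent does not fix them:

```python
import torch
import torch.nn.functional as F

def decode_centers(heatmap, top_k=100, score_thresh=0.3):
    """heatmap: (H, W) tensor of center-point confidences in [0, 1].
    Returns the (y, x) coordinates and scores of the retained center points."""
    hm = heatmap.unsqueeze(0).unsqueeze(0)        # (1, 1, H, W)
    # A location counts as a peak if it equals the max of its 3x3 neighborhood.
    peaks = (hm == F.max_pool2d(hm, kernel_size=3, stride=1, padding=1)).float() * hm
    scores, idx = peaks.view(-1).topk(top_k)      # first 100 peak points
    keep = scores > score_thresh                  # threshold screening
    idx, scores = idx[keep], scores[keep]
    w = heatmap.shape[1]
    ys = torch.div(idx, w, rounding_mode="floor") # row index of each center point
    xs = idx % w                                  # column index of each center point
    return ys, xs, scores
```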
(2) The pedestrian detection training data set consists of the INRIA pedestrian data set and a pedestrian data set made from the collected monitoring video. The target detection annotation tool LabelImg is then used to mark the positions of pedestrians in the images with bounding-box labels, and the annotations are converted to the MSCOCO data set format used in the original CenterNet experiments to generate the corresponding .json files.
(3) In the network training process, let the real keypoint of a training sample be p; after downsampling, the corresponding center point is p̃ = ⌊p/R⌋, where R is the downsampling stride, R = 4. The real keypoints are mapped onto the real keypoint heatmap Y ∈ [0,1]^{(W/R)×(H/R)} through a Gaussian kernel, where W is the width of the image and H is the height of the image:

Y_{xy} = exp(−((x − p̃_x)² + (y − p̃_y)²) / (2σ_p²)),

where σ_p is a standard deviation related to the target size, (x, y) are the heatmap coordinates, and (p̃_x, p̃_y) are the coordinates of the center point obtained by downsampling the real keypoint. The focal loss is used as the training target loss function:

L_k = −(1/N) Σ_{xy} (1 − Ŷ_{xy})^α · log(Ŷ_{xy})                      if Y_{xy} = 1,
L_k = −(1/N) Σ_{xy} (1 − Y_{xy})^β · (Ŷ_{xy})^α · log(1 − Ŷ_{xy})     otherwise,

where α and β are hyperparameters, α = 2, β = 4, N is the number of keypoints in the input image, and Ŷ_{xy} is the keypoint heatmap value predicted by the CenterNet detection network.

Since the real center points carry a discretization offset after the image is downsampled, a local offset prediction Ô ∈ R^{(W/R)×(H/R)×2} is added for each center point and trained with the L1 loss

L_off = (1/N) Σ_p |Ô_{p̃} − (p/R − p̃)|,

where Ô_{p̃} is the local offset prediction at center point p̃.

After center-point prediction, size regression is performed for each target. Let the Bbox of target k have position coordinates (x₁^{(k)}, y₁^{(k)}) and (x₂^{(k)}, y₂^{(k)}); the regression target size is s_k = (x₂^{(k)} − x₁^{(k)}, y₂^{(k)} − y₁^{(k)}), and the predicted Bbox size at the center point of target k is Ŝ_{p_k}. The Bbox size loss function is

L_size = (1/N) Σ_{k=1}^{N} |Ŝ_{p_k} − s_k|.

The total loss function is calculated as L = L_k + λ_size·L_size + λ_off·L_off, where λ_size = 0.1 and λ_off = 0.1, and the CenterNet pedestrian detection model is trained by optimizing the total loss function L.
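The three loss terms above can be sketched as follows (a PyTorch illustration under the stated hyperparameters α = 2, β = 4, λ_size = λ_off = 0.1; the tensor layout, with offset and size values already gathered at the N ground-truth center points, is an assumption of this sketch):

```python
import torch

def centernet_loss(pred_hm, gt_hm, pred_off, gt_off, pred_size, gt_size,
                   alpha=2.0, beta=4.0, lam_size=0.1, lam_off=0.1):
    """pred_hm, gt_hm: (H, W) keypoint heatmaps (gt_hm is the Gaussian-splatted Y).
    pred_off/gt_off and pred_size/gt_size: (N, 2) values gathered at the N
    ground-truth center points (an assumed layout)."""
    eps = 1e-6
    pos = gt_hm.eq(1).float()            # locations where Y_xy == 1
    neg = 1.0 - pos
    n = max(pos.sum().item(), 1.0)       # N, number of keypoints
    # Focal loss L_k on the heatmap.
    pos_loss = ((1 - pred_hm) ** alpha * torch.log(pred_hm + eps) * pos).sum()
    neg_loss = ((1 - gt_hm) ** beta * pred_hm ** alpha
                * torch.log(1 - pred_hm + eps) * neg).sum()
    l_k = -(pos_loss + neg_loss) / n
    # L1 losses for the local offset and the Bbox size, averaged over N.
    l_off = torch.abs(pred_off - gt_off).sum() / n
    l_size = torch.abs(pred_size - gt_size).sum() / n
    return l_k + lam_size * l_size + lam_off * l_off
```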
Example 3:
The step of marking the ID of each pedestrian frame in step c) is as follows: if a pedestrian frame exists in the first frame image of the monitoring video, a tracking model is built based on the first-frame pedestrian image, and the pedestrian IDs of the n pedestrian frames are marked as ID1, ID2, …, IDn.
Example 4:
step d) comprises the steps of:
(1) Initializing a motion direction:
Taking a monitoring scene in a corridor environment as an example, the boundaries of the monitored floor plan are generally walls, doors, passageways, and the like, and a moving target appearing from the boundary enters in mainly four movement cases, as shown in FIG. 2. A coordinate system is determined from the floor plan of the monitored area, the ID of each pedestrian frame is marked on the first video frame in which a pedestrian target is detected, the centroid coordinates of the pedestrians are calculated, and the initial movement direction is determined by the position at which the target enters from the positive or negative direction of the x-axis or the y-axis in the plan.
(2) Updating the movement direction:
The movement-direction update of a pedestrian target marked in the k-th frame is based on the change of its centroid coordinates from the (k-1)-th frame to the k-th frame. First, the centroid coordinates of each pedestrian are calculated from the detected pedestrian-frame coordinates. As shown in FIG. 3, the coordinates of two opposite corners of pedestrian frame IDi in the k-th frame are (x_{i1,k}, y_{i1,k}) and (x_{i2,k}, y_{i2,k}); the coordinates of two opposite corners of the j-th rectangular pedestrian frame IDj in the k-th frame pedestrian image are (x_{j1,k}, y_{j1,k}) and (x_{j2,k}, y_{j2,k}); the coordinates of two opposite corners of pedestrian frame IDi in the (k-1)-th frame pedestrian image are (x_{i1,k-1}, y_{i1,k-1}) and (x_{i2,k-1}, y_{i2,k-1}); and the coordinates of two opposite corners of pedestrian frame IDj in the (k-1)-th frame pedestrian image are (x_{j1,k-1}, y_{j1,k-1}) and (x_{j2,k-1}, y_{j2,k-1}).
The centroid C_i = (x_{i,k}, y_{i,k}), i ∈ 1,…,n, of pedestrian frame IDi in the k-th frame pedestrian image is established with x_{i,k} = (x_{i1,k} + x_{i2,k})/2 and y_{i,k} = (y_{i1,k} + y_{i2,k})/2; the centroid C_i = (x_{i,k-1}, y_{i,k-1}) in the (k-1)-th frame pedestrian image is established in the same way.
Likewise, the centroid C_j = (x_{j,k}, y_{j,k}), j ∈ 1,…,n, of pedestrian frame IDj in the k-th frame pedestrian image and the centroid C_j = (x_{j,k-1}, y_{j,k-1}) in the (k-1)-th frame pedestrian image are established.
The change of the centroid abscissa of pedestrian frame IDi from the (k-1)-th frame to the k-th frame is Δx_i = x_{i,k} − x_{i,k-1} and the change of the ordinate is Δy_i = y_{i,k} − y_{i,k-1}, i ∈ 1,…,n; likewise, Δx_j = x_{j,k} − x_{j,k-1} and Δy_j = y_{j,k} − y_{j,k-1}, j ∈ 1,…,n, for pedestrian frame IDj. Positive Δx_i and Δy_i indicate that pedestrian i moves from left to right and from top to bottom, respectively; negative Δx_i and Δy_i indicate movement from right to left and from bottom to top. If pedestrian target i moves in the horizontal direction (the x-axis direction), Δy_i = 0; if pedestrian target i moves in the vertical direction (the y-axis direction), Δx_i = 0.
To show the movement direction of the pedestrian more clearly, the movement angle of pedestrian i is calculated from Δx_i and Δy_i, specifically θ_i = arctan(Δy_i / Δx_i), and likewise the movement direction angle of pedestrian j is θ_j = arctan(Δy_j / Δx_j).
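As a small illustration of the centroid, displacement, and angle computations above, the Python sketch below processes one pedestrian frame; atan2 is used so the angle remains defined when Δx_i = 0, an implementation choice this sketch assumes:

```python
import math

def centroid(box):
    """box: ((x1, y1), (x2, y2)), two opposite corners of a pedestrian frame."""
    (x1, y1), (x2, y2) = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def direction_angle(box_prev, box_curr):
    """Movement angle theta of one pedestrian frame from frame k-1 to frame k."""
    x_prev, y_prev = centroid(box_prev)
    x_curr, y_curr = centroid(box_curr)
    dx = x_curr - x_prev          # positive: left-to-right movement
    dy = y_curr - y_prev          # positive: top-to-bottom (image coordinates)
    return math.atan2(dy, dx)     # atan2 avoids division by zero when dx == 0
```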
Further, in step e), the distance between pedestrian i and pedestrian j in the k-th frame pedestrian image is calculated by the formula d_{i,j} = √((x_{i,k} − x_{j,k})² + (y_{i,k} − y_{j,k})²), where (x_{i,k}, y_{i,k}) is the centroid coordinate C_i of pedestrian i and (x_{j,k}, y_{j,k}) is the centroid coordinate C_j of pedestrian j.
(1) For the target tracking errors that can occur when tracking pedestrians that cross and overlap: as shown in FIG. 4(b), if the value of d_{i,j} is smaller than or equal to the preset threshold th1, crossing and overlapping occlusion exists between pedestrian i and pedestrian j, and as shown in FIG. 4(c), a tracking error between pedestrians i and j may occur in this case. First, the crossing overlap region is removed according to the pedestrian frames to obtain the non-overlapping regions of the frames of pedestrian i and pedestrian j, and these non-overlapping regions are taken as the pedestrian feature regions; the feature regions selected for pedestrian i and pedestrian j in the k-th frame image are shown in FIG. 5. Then the gray-level co-occurrence matrix and color moments are used to represent the texture and color features of the feature regions, the movement directions θ_i and θ_j of pedestrian i and pedestrian j from the (k-1)-th frame to the k-th frame are calculated, and pedestrian tracking is realized by the method combining the feature-region color and texture with the pedestrian movement. Specifically, the feature-region color, texture, and movement direction of pedestrians i and j in the k-th frame image are matched against the color, texture, and movement direction of each pedestrian in the (k-1)-th frame; when the similarity exceeds the preset threshold th2, the match succeeds, the ID number IDi of pedestrian i in the (k-1)-th frame is taken as the ID number of pedestrian i in the k-th frame and the ID number IDj of pedestrian j as the ID number of pedestrian j in the k-th frame, and the centroids of the same ID number in two adjacent frames are connected, completing the tracking of each pedestrian.
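A sketch of the feature extraction and matching used in this occlusion branch is given below, with texture taken from a gray-level co-occurrence matrix (via scikit-image, an assumed dependency) and color from per-channel color moments; the equal weighting of the color, texture, and direction similarities against th2 is an assumption, since the patent only states that the three cues are matched jointly:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def region_features(region_bgr, region_gray):
    """Color moments and GLCM texture of a pedestrian feature region.
    region_bgr: (h, w, 3) uint8 color patch; region_gray: (h, w) uint8 patch."""
    # Color moments: per-channel mean and standard deviation.
    pixels = region_bgr.reshape(-1, 3).astype(np.float64)
    color = np.concatenate([pixels.mean(axis=0), pixels.std(axis=0)])
    # Gray-level co-occurrence matrix and standard texture statistics.
    glcm = graycomatrix(region_gray, distances=[1], angles=[0],
                        levels=256, symmetric=True, normed=True)
    texture = np.array([graycoprops(glcm, prop)[0, 0] for prop in
                        ("contrast", "homogeneity", "energy", "correlation")])
    return color, texture

def similarity(feat_a, feat_b, theta_a, theta_b):
    """Combined color/texture/direction similarity in [0, 1] (weights assumed equal)."""
    (col_a, tex_a), (col_b, tex_b) = feat_a, feat_b
    s_col = 1.0 / (1.0 + np.linalg.norm(col_a - col_b))
    s_tex = 1.0 / (1.0 + np.linalg.norm(tex_a - tex_b))
    d = abs(theta_a - theta_b)
    d = min(d, 2.0 * np.pi - d)          # wrap the angle difference
    s_dir = 1.0 - d / np.pi
    return (s_col + s_tex + s_dir) / 3.0
```

In such a sketch, a current-frame pedestrian would keep the previous-frame ID with the highest similarity, provided that similarity exceeds th2.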
(2) For pedestrian tracking without crossing overlap: as shown in FIG. 6(b), if the distance d_{i,j} between pedestrian i and pedestrian j in the k-th frame image is larger than the preset threshold th1, no overlapping occlusion exists between pedestrians i and j, and as shown in FIG. 6(c), pedestrians i and j can be tracked correctly in this case. Only the movement directions θ_i and θ_j of pedestrian i and pedestrian j from the (k-1)-th frame to the k-th frame need to be calculated; the movement directions of pedestrians i and j in the k-th frame image are matched against the movement directions of the pedestrians in the (k-1)-th frame, the ID number IDi of pedestrian i in the (k-1)-th frame is taken as the ID number of pedestrian i in the k-th frame and the ID number IDj of pedestrian j as the ID number of pedestrian j in the k-th frame, and the centroids of the same ID number in two adjacent frames are connected, completing the tracking of each pedestrian.
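For this unoccluded branch, a corresponding sketch of direction-only ID matching follows; the greedy closest-direction assignment is an assumption, as the patent does not specify how competing matches are resolved:

```python
import math

def match_by_direction(prev_ids, prev_thetas, curr_thetas):
    """Assign each current-frame pedestrian the ID of the unused previous-frame
    pedestrian whose movement direction is closest (greedy, assumed)."""
    assigned, used = {}, set()
    for ci, theta_c in enumerate(curr_thetas):
        best, best_d = None, math.inf
        for pi, theta_p in enumerate(prev_thetas):
            if pi in used:
                continue
            d = abs(theta_c - theta_p)
            d = min(d, 2.0 * math.pi - d)   # wrap the angle difference
            if d < best_d:
                best, best_d = pi, d
        if best is not None:
            assigned[ci] = prev_ids[best]   # previous-frame ID carries over
            used.add(best)
    return assigned
```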
The invention also relates to a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the deep learning based multi-pedestrian detection and tracking method described in any of the above steps.
The invention also relates to a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor, when executing the program, implementing the steps of the deep learning based multi-pedestrian detection and tracking method described in any of the above steps.
Finally, it should be noted that the foregoing description is only a preferred embodiment of the present invention, and the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of the technical features. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.

Claims (8)

1. A multi-pedestrian detection and tracking method based on deep learning, comprising:
a) The computer collects the monitoring video in real time, preprocesses the monitoring video, and inputs the preprocessed monitoring video to the CenterNet pedestrian detection model;
b) Pedestrian detection is carried out on the preprocessed monitoring video through a deep learning technology, and a pedestrian area in an image is detected by using a CenterNet pedestrian detection model;
c) Judging whether pedestrians appear, if so, establishing a tracking model based on the first frame of pedestrian image and marking the pedestrian ID of each pedestrian frame, and if not, detecting the next frame of pedestrian image;
d) Calculating the centroid coordinates of each pedestrian from the coordinates of each pedestrian frame, and calculating the pedestrian movement direction from the centroid coordinates;
e) If the distance between the centroids of pedestrians in different pedestrian frames is smaller than or equal to a threshold th1, overlapping occlusion exists between the pedestrians; if the distance between the centroids of pedestrians in different pedestrian frames is larger than the threshold th1, no overlapping occlusion exists between the pedestrians;
f) If overlapping occlusion occurs between the pedestrian targets of the current frame, selecting the non-overlapping region of each pedestrian frame as a feature region according to the pedestrian frames of the current-frame pedestrian image obtained in step c), extracting the color and texture features of the feature regions, matching the color and texture features and movement direction of each current-frame pedestrian with the color and texture features and movement directions of the pedestrians in the previous frame, taking the matched pedestrian ID of the previous frame as the pedestrian ID of the current frame, and connecting the centroids of the same ID number in two adjacent frames, realizing pedestrian tracking;
g) If no overlapping occlusion exists between the pedestrian targets of the current frame, matching the movement direction of each current-frame pedestrian with the movement directions of the pedestrians in the previous frame, taking the matched pedestrian ID of the previous frame as the pedestrian ID of the current frame, and connecting the centroids of the same ID number in two adjacent frames, realizing pedestrian tracking.
2. The deep learning based multi-pedestrian detection and tracking method according to claim 1, wherein the step of preprocessing the monitoring video in step a) is as follows: converting the monitoring video into images and, starting from the first frame image that contains a pedestrian, selecting one frame image every k frames as the input of the pedestrian detection model.
3. The deep learning based multi-pedestrian detection and tracking method according to claim 1, wherein the step of performing pedestrian detection on the preprocessed monitoring video by the deep learning technique in step b) is as follows: scaling the input frame images to a uniform size and inputting them into the CenterNet pedestrian detection model, modeling the pedestrian target as a single point, finding the center point of the pedestrian frame through a keypoint heatmap, and regressing the coordinates and size information of the pedestrian frame from the image features at the center point.
4. The deep learning based multi-pedestrian detection and tracking method of claim 1, wherein the step of marking the ID of each pedestrian frame in step c) is: if a pedestrian frame exists in the first frame image of the monitoring video, a tracking model is built based on the first-frame pedestrian image, and the pedestrian IDs of the n pedestrian frames are marked as ID1, ID2, …, IDn.
5. The deep learning based multi-pedestrian detection and tracking method of claim 4, wherein step d) comprises the steps of:
d-1) the coordinates of two opposite corners of the i-th rectangular pedestrian frame IDi in the k-th frame pedestrian image are (x_{i1,k}, y_{i1,k}) and (x_{i2,k}, y_{i2,k}); the coordinates of two opposite corners of the j-th rectangular pedestrian frame IDj in the k-th frame pedestrian image are (x_{j1,k}, y_{j1,k}) and (x_{j2,k}, y_{j2,k}); the coordinates of two opposite corners of pedestrian frame IDi in the (k-1)-th frame pedestrian image are (x_{i1,k-1}, y_{i1,k-1}) and (x_{i2,k-1}, y_{i2,k-1}); and the coordinates of two opposite corners of pedestrian frame IDj in the (k-1)-th frame pedestrian image are (x_{j1,k-1}, y_{j1,k-1}) and (x_{j2,k-1}, y_{j2,k-1});
d-2) the centroid C_i = (x_{i,k}, y_{i,k}), i ∈ 1,…,n, of pedestrian frame IDi in the k-th frame pedestrian image is established with x_{i,k} = (x_{i1,k} + x_{i2,k})/2 and y_{i,k} = (y_{i1,k} + y_{i2,k})/2, and the centroid C_i = (x_{i,k-1}, y_{i,k-1}), i ∈ 1,…,n, of pedestrian frame IDi in the (k-1)-th frame pedestrian image is established with x_{i,k-1} = (x_{i1,k-1} + x_{i2,k-1})/2 and y_{i,k-1} = (y_{i1,k-1} + y_{i2,k-1})/2;
d-3) likewise, the centroid C_j = (x_{j,k}, y_{j,k}), j ∈ 1,…,n, of pedestrian frame IDj in the k-th frame pedestrian image and the centroid C_j = (x_{j,k-1}, y_{j,k-1}), j ∈ 1,…,n, of pedestrian frame IDj in the (k-1)-th frame pedestrian image are established;
d-4) the change Δx_i = x_{i,k} − x_{i,k-1} of the centroid abscissa and the change Δy_i = y_{i,k} − y_{i,k-1} of the centroid ordinate of pedestrian frame IDi from the (k-1)-th frame to the k-th frame are calculated, i ∈ 1,…,n; likewise, the changes Δx_j = x_{j,k} − x_{j,k-1} and Δy_j = y_{j,k} − y_{j,k-1} of pedestrian frame IDj are calculated, j ∈ 1,…,n;
d-5) the movement direction angle of pedestrian i is calculated as θ_i = arctan(Δy_i / Δx_i), and the movement direction angle of pedestrian j as θ_j = arctan(Δy_j / Δx_j).
6. The deep learning based multi-pedestrian detection and tracking method of claim 5, wherein in step e) the distance d_{i,j} between pedestrian i and pedestrian j in the k-th frame pedestrian image is calculated by the formula d_{i,j} = √((x_{i,k} − x_{j,k})² + (y_{i,k} − y_{j,k})²).
7. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the deep learning based multi-pedestrian detection and tracking method of any one of claims 1-6.
8. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the deep learning based multi-pedestrian detection and tracking method of any one of claims 1-6 when executing the program.
CN202110917108.7A 2021-08-11 2021-08-11 Multi-pedestrian detection and tracking method and system based on deep learning Active CN113658223B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110917108.7A CN113658223B (en) 2021-08-11 2021-08-11 Multi-pedestrian detection and tracking method and system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110917108.7A CN113658223B (en) 2021-08-11 2021-08-11 Multi-pedestrian detection and tracking method and system based on deep learning

Publications (2)

Publication Number Publication Date
CN113658223A CN113658223A (en) 2021-11-16
CN113658223B (en) 2023-08-04

Family

ID=78491350

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110917108.7A Active CN113658223B (en) 2021-08-11 2021-08-11 Multi-pedestrian detection and tracking method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN113658223B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115695818B (en) * 2023-01-05 2023-04-07 广东瑞恩科技有限公司 Efficient management method for intelligent park monitoring data based on Internet of things

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20030073879A (en) * 2002-03-13 2003-09-19 주식회사 엘지이아이 Realtime face detection and moving tracing method
CN102104771A (en) * 2010-12-14 2011-06-22 浙江工业大学 Multi-channel people stream rate monitoring system based on wireless monitoring
CN104537647A (en) * 2014-12-12 2015-04-22 中安消技术有限公司 Target detection method and device
CN104715238A (en) * 2015-03-11 2015-06-17 南京邮电大学 Pedestrian detection method based on multi-feature fusion
CN107564034A (en) * 2017-07-27 2018-01-09 华南理工大学 The pedestrian detection and tracking of multiple target in a kind of monitor video
CN110688987A (en) * 2019-10-16 2020-01-14 山东建筑大学 Pedestrian position detection and tracking method and system
WO2020155873A1 (en) * 2019-02-02 2020-08-06 福州大学 Deep apparent features and adaptive aggregation network-based multi-face tracking method
CN111783576A (en) * 2020-06-18 2020-10-16 西安电子科技大学 Pedestrian re-identification method based on improved YOLOv3 network and feature fusion

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20030073879A (en) * 2002-03-13 2003-09-19 주식회사 엘지이아이 Realtime face detection and moving tracing method
CN102104771A (en) * 2010-12-14 2011-06-22 浙江工业大学 Multi-channel people stream rate monitoring system based on wireless monitoring
CN104537647A (en) * 2014-12-12 2015-04-22 中安消技术有限公司 Target detection method and device
CN104715238A (en) * 2015-03-11 2015-06-17 南京邮电大学 Pedestrian detection method based on multi-feature fusion
CN107564034A (en) * 2017-07-27 2018-01-09 华南理工大学 The pedestrian detection and tracking of multiple target in a kind of monitor video
WO2020155873A1 (en) * 2019-02-02 2020-08-06 福州大学 Deep apparent features and adaptive aggregation network-based multi-face tracking method
CN110688987A (en) * 2019-10-16 2020-01-14 山东建筑大学 Pedestrian position detection and tracking method and system
CN111783576A (en) * 2020-06-18 2020-10-16 西安电子科技大学 Pedestrian re-identification method based on improved YOLOv3 network and feature fusion

Also Published As

Publication number Publication date
CN113658223A (en) 2021-11-16

Similar Documents

Publication Publication Date Title
Sun et al. Research on the hand gesture recognition based on deep learning
CN110059558B (en) Orchard obstacle real-time detection method based on improved SSD network
CN108564097B (en) Multi-scale target detection method based on deep convolutional neural network
CN110334762B (en) Feature matching method based on quad tree combined with ORB and SIFT
CN107480585B (en) Target detection method based on DPM algorithm
CN103310194A (en) Method for detecting head and shoulders of pedestrian in video based on overhead pixel gradient direction
CN110929593A (en) Real-time significance pedestrian detection method based on detail distinguishing and distinguishing
CN109145756A (en) Object detection method based on machine vision and deep learning
CN105184265A (en) Self-learning-based handwritten form numeric character string rapid recognition method
CN109886159B (en) Face detection method under non-limited condition
CN111753682B (en) Hoisting area dynamic monitoring method based on target detection algorithm
CN107146219B (en) Image significance detection method based on manifold regularization support vector machine
CN105405138A (en) Water surface target tracking method based on saliency detection
Yang et al. An improved algorithm for the detection of fastening targets based on machine vision
CN113658223B (en) Multi-pedestrian detection and tracking method and system based on deep learning
CN112784722B (en) Behavior identification method based on YOLOv3 and bag-of-words model
Zhu et al. Human detection under UAV: an improved faster R-CNN approach
CN107679467B (en) Pedestrian re-identification algorithm implementation method based on HSV and SDALF
CN110334703B (en) Ship detection and identification method in day and night image
CN117079125A (en) Kiwi fruit pollination flower identification method based on improved YOLOv5
CN108985294B (en) Method, device and equipment for positioning tire mold picture and storage medium
CN108985216B (en) Pedestrian head detection method based on multivariate logistic regression feature fusion
CN110826575A (en) Underwater target identification method based on machine learning
CN103020631A (en) Human movement identification method based on star model
CN116912670A (en) Deep sea fish identification method based on improved YOLO model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant