CN111160101A - Video personnel tracking and counting method based on artificial intelligence - Google Patents

Video personnel tracking and counting method based on artificial intelligence

Info

Publication number: CN111160101A
Authority: CN (China)
Prior art keywords: pedestrian, pedestrians, samples, video, matching
Legal status: Granted
Application number: CN201911200873.6A
Other languages: Chinese (zh)
Other versions: CN111160101B (en)
Inventor
邹建红
高元荣
陈雯珊
王辉
陈哲
张兴
王宇奇
陈彬
陈凡千
孙建锋
Current Assignee: Fujian Nebula Big Data Application Service Co ltd
Original Assignee: Fujian Nebula Big Data Application Service Co ltd
Application filed by Fujian Nebula Big Data Application Service Co ltd
Priority to CN201911200873.6A
Publication of CN111160101A
Application granted
Publication of CN111160101B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V 10/754 Organisation of the matching processes involving a deformation of the sample pattern or of the reference pattern; Elastic matching

Abstract

The invention discloses a video personnel tracking and counting method based on artificial intelligence. The method combines learned features extracted by a convolutional neural network with handcrafted features obtained by geometric computation, performs multi-target matching between video frames with a tracker whose network parameters can be updated online, and computes the change in the number of people from the change in the inside/outside-door position label of the same pedestrian across adjacent frames. A set of features learned from a large public video dataset with a sparse autoencoder serves as the filters of the convolutional neural network, which improves the efficiency of online updating. Common person-occlusion patterns are also modeled, and the counting errors they cause are compensated. The method is robust, runs in real time, achieves relatively high accuracy and resists occlusion; it is suitable for people counting over large volumes of video data and can be integrated into a video surveillance software system.

Description

Video personnel tracking and counting method based on artificial intelligence
[ technical field ]
The invention belongs to the technical field of intelligent video monitoring and analysis, and particularly relates to a video personnel tracking and counting method based on artificial intelligence.
[ background of the invention ]
Generally, there are two approaches to detecting and counting the number of people inside a building. The first is to sum the numbers of people detected in the surveillance video of every area on every floor and take the sum as the total number of people in the building. This approach requires that video surveillance cover the entire building; moreover, because the count from each camera carries some error, the summed error is large. The second is to subtract the accumulated number of people detected leaving from the accumulated number detected entering in the surveillance videos of all building entrances and exits, which yields the total number of people inside. This approach involves far fewer network cameras, accumulates relatively little error, and is more practical.
In practice the second approach, analyzing surveillance video at the entrances and exits of public buildings in real time to produce passenger-flow statistics, has attracted much attention and has gradually come into use in recent years. It usually requires installing a network camera with a vertical, downward-looking view at the top of each entrance to capture people entering and leaving, and the passenger flow is obtained by detecting and counting human heads at an intelligent front end or in the back end. In many cases, however, building owners do not want to deploy an additional network video surveillance system dedicated to passenger-flow statistics; they prefer to add a video-analysis software module on top of the security surveillance system already deployed, which both simplifies deployment and avoids extra hardware cost.
To obtain a large monitoring range, however, the cameras of a security surveillance system are usually mounted near the ceiling and look down obliquely at the monitored area. In such a scene, people cannot be detected and counted simply by detecting their heads. In video captured from a vertical top-down view, head features are simple and consistent, mutual occlusion is rare, and the analysis algorithm is comparatively simple. In video observed from an oblique view, head features are complex and people frequently occlude or cover one another, which makes the video-analysis task considerably harder.
[ summary of the invention ]
The technical problem the invention aims to solve is to provide a video personnel tracking and counting method based on artificial intelligence that selects suitable features and establishes a reasonable pedestrian occlusion model to improve the accuracy of people detection and counting, and that uses a pedestrian tracking-matching algorithm to achieve continuous, robust tracking and matching, meeting the real-time, long-duration counting requirements of video surveillance.
The invention is realized by the following technical scheme:
A video personnel tracking and counting method based on artificial intelligence comprises the following steps:
Step 1: initialize the video frame number n = 1 and segment the video objects of the nth frame to obtain the pedestrian connected-component set P^(n); compute the feature vector v_j^(n) and the motion vector m_j^(n) of the jth pedestrian, and set the longest untracked-match count of the jth pedestrian λ_j^(n) = 0;
the feature vector and the motion vector of a pedestrian are computed as follows:
the feature vector of the jth pedestrian is v_j = (x_j, y_j, S_j), where (x_j, y_j) is the centroid coordinate of p_j and S_j is the area of p_j:
[equation image: centroid coordinates and area computed from the binary image of p_j]
where y_h is the height of the surveillance video image, N_j and M_j are the numbers of pixels of the circumscribed rectangle of p_j in the length and width directions, and f_j(x, y) is the binary image of p_j:
[equation image: definition of the binary image f_j(x, y)]
the motion vector of the jth pedestrian is m_j = (l_j, λ_j), where l_j = l(p_j) is the door-side label of the pedestrian, l_j = 0 denoting the inside of the door (in the building) and l_j = 1 denoting the outside of the door (outside the building); λ_j is the longest untracked-match count of the jth pedestrian, i.e., λ_j = λ(p_j);
Step 2: segment the video objects of the (n+1)th frame to obtain P^(n+1); compute v_j^(n+1) and m_j^(n+1), j = 1, ..., k;
Step 3: search P^(n) for the pedestrians that match those in P^(n+1): for each pedestrian p_j^(n+1) in P^(n+1), find the tracking-matched pedestrian p_i^(n) in P^(n), i = 1, ..., k;
if the matching succeeds, compute the people-count increment in: (1) if the pedestrian moves from inside the building to outside, the increment is -1; (2) if the pedestrian moves from outside the building to inside, the increment is 1; (3) if the pedestrian stays inside the building, the increment is 0; (4) if the pedestrian stays outside the building, the increment is 0;
the longest untracked-match count λ_i of every successfully matched p_i is reset;
if the matching succeeds, it is also necessary to check whether p_j^(n+1) satisfies the judgment condition of merged-type occlusion; if so, the detected increment in needs to be compensated;
if the matching fails, it is necessary to judge whether p_j^(n+1) is a pedestrian occluded in the nth frame; if p_j^(n+1) satisfies the judgment condition of distributed occlusion, compensate in; otherwise, regard p_j^(n+1) as a pedestrian newly appearing in the monitored area and set λ_i = 0;
Step 4: check the pedestrians in P^(n) that were not successfully matched with P^(n+1), add them to P^(n+1), and increase their longest untracked-match count by 1; if such a pedestrian is matched in the (n+2)th frame, judge that intermittent occlusion has occurred and change in accordingly; otherwise, add 1 to the longest untracked-match count again, and once the threshold is reached judge that the pedestrian has left the monitored area; if p_i^(n) satisfies the judgment condition of convergent-type occlusion, compensate in;
Step 5: remove pedestrians that have left the monitored area and falsely detected pedestrians: for each pedestrian in P^(n+1), check whether its longest untracked-match count exceeds the threshold;
if it does, the pedestrian is considered to have left the monitored area and should be discarded;
otherwise, the pedestrian is considered to be temporarily occluded and should be kept;
at the same time, check whether the pedestrian's area is within the valid range; if it is not, the detection is considered false and the pedestrian should be discarded;
update P^(n+1);
Step 6: let n = n + 1 and jump to Step 2 until the whole video image sequence has been analyzed.
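As a rough illustration of how Steps 1 to 6 fit together, the following sketch outlines the per-frame counting loop. segment_frame, track_match, the pedestrian fields and the numeric thresholds are hypothetical placeholders, and the occlusion-mode compensation of Steps 3 and 4 is omitted for brevity.

# Minimal sketch of the per-frame counting loop (Steps 1-6); all helper
# functions, object fields and numeric values are illustrative assumptions.
LAMBDA_MAX = 5             # threshold on the longest untracked-match count (assumed value)
AREA_RANGE = (800, 20000)  # valid pedestrian area range in pixels (assumed value)

def count_people(frames, segment_frame, track_match):
    P_prev = segment_frame(frames[0])        # Step 1: pedestrian set of frame 1
    total_increment = 0
    for frame in frames[1:]:                 # Steps 2-6
        P_curr = segment_frame(frame)        # Step 2
        matches = track_match(P_prev, P_curr)   # Step 3: dict p_curr -> matched p_prev or None
        for p_curr, p_prev in matches.items():
            if p_prev is not None:
                # door-side label l: 0 = inside the building, 1 = outside
                total_increment += p_prev.l - p_curr.l   # +1 entering, -1 leaving
                p_curr.lam = 0               # reset the longest untracked-match count
        # Step 4: carry over unmatched pedestrians as (possibly) occluded
        for p_prev in P_prev:
            if p_prev not in matches.values():
                p_prev.lam += 1
                P_curr.append(p_prev)
        # Step 5: prune pedestrians that left the area or were falsely detected
        P_curr = [p for p in P_curr
                  if p.lam < LAMBDA_MAX and AREA_RANGE[0] <= p.S <= AREA_RANGE[1]]
        P_prev = P_curr                      # Step 6
    return total_increment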
Further, the tracking matching in step 3 specifically comprises the following steps:
step 31: initialize the video frame number n = 1 and the tracker T(W);
step 32: translate the centroid (x_i, y_i) of p_i^(1) to new coordinates; the translation takes the centroid as the center and moves it to the pixel points whose D_8 (chessboard) distance from the centroid equals d along each of the eight directions indexed by r = 0, ±1, ±2, ±3, 4, with d = 5, 10; together with p_i^(1) itself, 17 samples of the ith class (all labeled i) are obtained;
[equation image: coordinates of the translated samples]
step 33: form the sample set C^(1) from the obtained samples and train the tracker T(W), determining its parameters as W_1;
step 34: detect the (n+1)th frame to obtain P^(n+1) and set C^(n+1) = C^(n);
step 35: input p_j ∈ P^(n+1) into the tracker T(W_n) and obtain its output; take the maximum output value o_m and compare it with the upper threshold σ_1 and the lower threshold σ_2 (σ_1 ≥ σ_2):
(1) if o_m is less than the lower threshold σ_2, p_j is regarded as a pedestrian newly appearing in the (n+1)th frame and the tracking match fails; translate the centroid of p_j in the same way (r = 0, ±1, ±2, ±3, 4; d = 5, 10); together with p_j, 17 samples are obtained in total and added to C^(n+1) as a new class of samples;
(2) if o_m is greater than the upper threshold σ_1, p_m ∈ P^(n) and p_j ∈ P^(n+1) are regarded as highly matched;
(3) if o_m is greater than the lower threshold σ_2 but less than the upper threshold σ_1, p_m ∈ P^(n) and p_j ∈ P^(n+1) are regarded as matched; translate the centroid of p_j in the same way to obtain, together with p_j, 17 samples in total, and add them to the sample class labeled m; if the number of samples labeled m then exceeds the per-class sample-pool capacity V, the 17 samples labeled m that entered the pool earliest are removed;
step 36: update the sample set, removing the samples of pedestrians that have left the monitored area or were falsely detected; the update covers three cases:
(1) for a newly appearing pedestrian, create a new pedestrian class;
(2) for a pedestrian whose appearance has changed, collect and add new samples; when the number of samples exceeds the per-class sample-pool capacity V during expansion, update the sample set by a first-in-first-out rule, replacing the sample that entered the pool earliest with the newly added one; V = 34 was determined by experiment;
(3) for a pedestrian that has left the monitored area or was falsely detected, remove the samples of the class it belongs to; after updating, the new sample set C^(n+1) is obtained;
step 37: update the parameters of the tracker T(W): train T(W) with C^(n+1) and determine its parameters as W_(n+1); when training T(W), the initial value of the network parameters is W_n.
Further, the tracker comprises a filter, a convolutional neural network, a discriminant classifier and online parameter updating;
the nth frame image is segmented into video objects to obtain a pedestrian set containing the moving targets; each pedestrian's rectangular region is resized to 50 × 110 and input into the convolutional neural network;
the convolutional neural network feeds the extracted features into the discriminant classifier, which outputs a tracking-result vector giving the probability that the pedestrian in the current frame belongs to each class;
if the tracking result indicates a newly appearing pedestrian, a change in a pedestrian's appearance features, a pedestrian leaving the monitored area, or a false detection, the sample set is updated, the hidden layer and the classifier are retrained to determine new network parameters, and tracking then proceeds to the (n+1)th frame.
Furthermore, the filter in the tracker is a set of features pre-trained by a sparse autoencoder on a massive unsupervised auxiliary training set, which gives it good generality and completeness; the feature pre-training is an offline process, and the trained features are not updated while the target tracking algorithm is executed.
Further, the convolution kernel used by the convolutional neural network in the tracker is a filter composed of 100 pre-training features with the size of 10 × 10.
Further, a mathematical model of a discriminant classifier in the tracker employs a SoftMax function.
The invention has the advantages that the method is robust, real-time, relatively accurate and strongly resistant to occlusion, and can meet the requirement of long-time uninterrupted operation in video surveillance. The method is suitable for people counting over large volumes of video data and can be integrated into a video surveillance software system.
[ description of the drawings ]
The invention will be further described with reference to the following embodiments and the accompanying drawings.
FIG. 1 is a flow chart of a video people tracking and counting method based on artificial intelligence of the present invention.
FIG. 2 is a schematic diagram of same-side occlusion according to the present invention.
FIG. 3 is a schematic diagram of distributed occlusion according to the present invention.
FIG. 4 is a schematic diagram of the convergent occlusion of the present invention.
FIG. 5 is a schematic view of the intermittent occlusion of the present invention.
FIG. 6 is a merged occlusion diagram of the present invention.
FIG. 7 is a table of occlusion mode determination and compensation according to the present invention.
FIG. 8 is a flow chart of the tracking matching algorithm of the present invention.
FIG. 9 is a block diagram of a convolutional neural network-based tracker of the present invention.
Fig. 10 is a network structure diagram of the sparse autoencoder of the present invention.
FIG. 11 is a sparse self-encoder training result visualization diagram of the present invention.
[ detailed description of the embodiments ]
Fig. 1 shows the flow of the video personnel tracking and counting method based on artificial intelligence. The method computes the people-count increment from the change in the door-side label of the same pedestrian across adjacent frames, matches multiple pedestrians between adjacent frames with a tracker based on convolutional-neural-network feature extraction and online parameter updating, and detects common occlusion patterns to compensate the people-count increment. The method comprises the following steps:
Step 1: initialize the video frame number n = 1 and segment the video objects of the nth frame to obtain the pedestrian connected-component set P^(n); compute the feature vector v_j^(n) and the motion vector m_j^(n) of the jth pedestrian, and set the longest untracked-match count of the jth pedestrian λ_j^(n) = 0.
The feature vector and the motion vector of a pedestrian are calculated as follows.
The feature vector of the jth pedestrian is v_j = (x_j, y_j, S_j), where (x_j, y_j) is the centroid coordinate of p_j and S_j is the area of p_j:
[equation image: centroid coordinates and area computed from the binary image of p_j]
where y_h is the height of the surveillance video image, N_j and M_j are the numbers of pixels of the circumscribed rectangle of p_j in the length and width directions, and f_j(x, y) is the binary image of p_j:
[equation image: definition of the binary image f_j(x, y)]
The motion vector of the jth pedestrian is m_j = (l_j, λ_j), where l_j = l(p_j) is the door-side label of the pedestrian, l_j = 0 denoting the inside of the door (in the building) and l_j = 1 denoting the outside of the door (outside the building); λ_j is the longest untracked-match count of the jth pedestrian, i.e., λ_j = λ(p_j).
Step 2: segment the video objects of the (n+1)th frame to obtain P^(n+1); compute v_j^(n+1) and m_j^(n+1), j = 1, ..., k.
Step 3: search P^(n) for the pedestrians that match those in P^(n+1): for each pedestrian p_j^(n+1) in P^(n+1), find the tracking-matched pedestrian p_i^(n) in P^(n), i = 1, ..., k.
If the matching succeeds, compute the people-count increment in, as sketched below: (1) if the pedestrian moves from inside the building to outside, the increment is -1; (2) if the pedestrian moves from outside the building to inside, the increment is 1; (3) if the pedestrian stays inside the building, the increment is 0; (4) if the pedestrian stays outside the building, the increment is 0.
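The four cases reduce to comparing the door-side label l of the matched pedestrian in frames n and n+1; a minimal sketch of that rule (variable names are illustrative):

def people_increment(l_prev: int, l_curr: int) -> int:
    """Door-side label l: 0 = inside the door (in the building), 1 = outside.
    Returns the people-count increment 'in' for one matched pedestrian."""
    if l_prev == 0 and l_curr == 1:   # moved from inside the building to outside
        return -1
    if l_prev == 1 and l_curr == 0:   # moved from outside the building to inside
        return 1
    return 0                          # stayed inside, or stayed outside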
In any of these cases, the longest untracked-match count λ_i of the successfully matched p_i is cleared. If the matching succeeds, it is also necessary to check whether p_j^(n+1) satisfies the judgment condition of merged-type occlusion; if so, the detected increment in is compensated.
If the matching fails, it is necessary to judge whether p_j^(n+1) is a pedestrian occluded in the nth frame. If p_j^(n+1) satisfies the judgment condition of distributed occlusion, compensate in; otherwise, regard p_j^(n+1) as a pedestrian newly appearing in the monitored area and set λ_i = 0.
Step 4: check the pedestrians in P^(n) that were not successfully matched with P^(n+1), add them to P^(n+1), and increase their longest untracked-match count by 1. If such a pedestrian is matched in the (n+2)th frame, judge that intermittent occlusion has occurred and change in accordingly; otherwise, add 1 to the longest untracked-match count again, and once the threshold is reached judge that the pedestrian has left the monitored area. If p_i^(n) satisfies the judgment condition of convergent-type occlusion, compensate in.
Step 5: remove pedestrians that have left the monitored area and falsely detected pedestrians. For each pedestrian in P^(n+1), check whether its longest untracked-match count exceeds the threshold. If it does, the pedestrian is considered to have left the monitored area and should be discarded; otherwise, the pedestrian is considered to be temporarily occluded and should be kept. At the same time, check whether the pedestrian's area is within the valid range; if it is not, the detection is considered false and the pedestrian is discarded. Update P^(n+1).
Step 6: let n = n + 1 and jump to Step 2 until the whole video image sequence has been analyzed.
FIGS. 2, 3, 4, 5 and 6 are schematic diagrams of same-side occlusion, distributed occlusion, convergent occlusion, intermittent occlusion and merged occlusion, respectively. FIG. 7 tabulates the judgment conditions and the people-count error compensation formulas for these five common occlusion modes.
The anti-occlusion design works as follows: a pedestrian that appeared in the previous frame but does not appear in the current frame is by default regarded as occluded, is added to the pedestrian set of the current frame, and still takes part in the matching against the pedestrian set of the next frame; the number of missed frames is recorded by the longest untracked-match count. If such a pedestrian is detected again within the next few frames, the longest untracked-match count is cleared; otherwise, the pedestrian is considered not occluded but to have left the monitored area. Thus, if the longest untracked-match count λ_i of a pedestrian p_i exceeds a threshold λ_0, the pedestrian is considered to have left the monitored area (including entering the building interior from the inside of the door and moving away from the outside of the door); if λ_i does not exceed λ_0 and is not zero, the pedestrian is considered occluded in the nth frame. The relationship between λ_i and the state of p_i is: if λ_i = 0, p_i is inside the monitored area and successfully detected; if 0 < λ_i < λ_0, p_i is inside the monitored area and occluded; if λ_i ≥ λ_0, p_i has left the monitored area.
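The relationship between λ_i and the state of p_i can be written as a small state function; the threshold value used below is only an assumed example, not a value fixed by the text.

LAMBDA_0 = 5   # threshold on the longest untracked-match count (assumed example value)

def pedestrian_state(lam: int, lam_0: int = LAMBDA_0) -> str:
    """Classify pedestrian p_i from its longest untracked-match count lambda_i."""
    if lam == 0:
        return "in monitored area, successfully detected"
    if lam < lam_0:
        return "in monitored area, occluded"
    return "left the monitored area"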
Fig. 8 is a flow chart of the tracking matching algorithm used in step 3 of the method of the present invention. The implementation of each step of the tracking matching algorithm is detailed below.
Step 31: initialize the video frame number n = 1 and the tracker T(W).
Step 32: translate the centroid (x_i, y_i) of p_i^(1) to new coordinates. The translation takes the centroid as the center and moves it to the pixel points whose D_8 (chessboard) distance from the centroid equals d along each of the eight directions indexed by r = 0, ±1, ±2, ±3, 4, with d = 5, 10. Together with p_i^(1) itself, 17 samples of the ith class (all labeled i) are obtained, as sketched below.
[equation image: coordinates of the translated samples]
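A minimal sketch of the 17-sample generation, assuming the eight translation directions are the eight D_8 (chessboard) neighbourhood directions and that crop_fn is a hypothetical helper that extracts an image patch centred at given coordinates; the exact formula sits in the patent's equation image.

def generate_samples(image, centroid, crop_fn, distances=(5, 10)):
    """Return 17 samples for one pedestrian class: the original crop plus crops
    whose centroid is shifted by distance d along each of the 8 D_8 directions.
    crop_fn(image, (x, y)) extracts a patch centred at (x, y) (illustrative)."""
    x, y = centroid
    samples = [crop_fn(image, (x, y))]               # the original sample
    directions = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                  if (dx, dy) != (0, 0)]             # 8 chessboard directions
    for d in distances:                              # d = 5 and d = 10 pixels
        for dx, dy in directions:
            samples.append(crop_fn(image, (x + dx * d, y + dy * d)))
    return samples                                   # 1 + 8*2 = 17 samples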
Step 33: form the sample set C^(1) from the obtained samples and train the tracker T(W), determining its parameters as W_1.
Step 34: detect the (n+1)th frame to obtain P^(n+1) and set C^(n+1) = C^(n).
Step 35: input p_j ∈ P^(n+1) into the tracker T(W_n) and obtain its output. Take the maximum output value o_m and compare it with the upper threshold σ_1 and the lower threshold σ_2 (σ_1 ≥ σ_2):
(1) if o_m is less than the lower threshold σ_2, p_j is regarded as a pedestrian newly appearing in the (n+1)th frame and the tracking match fails. Translate the centroid of p_j in the same way (r = 0, ±1, ±2, ±3, 4; d = 5, 10); together with p_j, 17 samples are obtained in total and added to C^(n+1) as a new class of samples.
(2) if o_m is greater than the upper threshold σ_1, p_m ∈ P^(n) and p_j ∈ P^(n+1) are regarded as highly matched.
(3) if o_m is greater than the lower threshold σ_2 but less than the upper threshold σ_1, p_m ∈ P^(n) and p_j ∈ P^(n+1) are regarded as matched. Translate the centroid of p_j in the same way (r = 0, ±1, ±2, ±3, 4; d = 5, 10) to obtain, together with p_j, 17 samples in total, and add them to the sample class labeled m. At this point, if the number of samples labeled m is greater than the per-class sample-pool capacity V, the 17 samples labeled m that entered the pool earliest are removed first.
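The three-way threshold decision of Step 35 and the fixed-capacity, first-in-first-out sample pool can be sketched as follows; the threshold values σ_1 and σ_2 and the pool layout are assumptions for illustration, while V = 34 comes from Step 36.

from collections import deque

V = 34                        # per-class sample-pool capacity (stated in Step 36)
SIGMA_1, SIGMA_2 = 0.8, 0.5   # upper / lower thresholds (illustrative values only)

def match_and_update(tracker_outputs, p_j_samples, sample_pools):
    """tracker_outputs: class-probability vector for pedestrian p_j from T(W_n);
    p_j_samples: the 17 samples generated for p_j;
    sample_pools: dict class_label -> deque of samples (FIFO, max length V)."""
    m = max(range(len(tracker_outputs)), key=lambda c: tracker_outputs[c])
    o_m = tracker_outputs[m]
    if o_m < SIGMA_2:                      # (1) new pedestrian: start a new class
        new_label = max(sample_pools, default=-1) + 1
        sample_pools[new_label] = deque(p_j_samples, maxlen=V)
        return None
    if o_m >= SIGMA_1:                     # (2) highly matched: no pool update needed
        return m
    # (3) matched but appearance changed: add the 17 samples to class m;
    # deque(maxlen=V) drops the oldest samples first (first-in first-out)
    sample_pools[m].extend(p_j_samples)
    return m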
Step 36: update the sample set, removing the samples of pedestrians that have left the monitored area or were falsely detected. The update covers three cases:
(1) for a newly appearing pedestrian, create a new pedestrian class;
(2) for a pedestrian whose appearance has changed, collect and add new samples; when the number of samples exceeds the per-class sample-pool capacity V during expansion, update the sample set by a first-in-first-out rule, replacing the sample that entered the pool earliest with the newly added one; V = 34 was determined by experiment;
(3) for a pedestrian that has left the monitored area or was falsely detected, remove the samples of the class it belongs to.
After updating, the new sample set C^(n+1) is obtained.
Step 37: update the parameters of the tracker T(W): train T(W) with C^(n+1) and determine its parameters as W_(n+1). When training T(W), the initial value of the network parameters is W_n.
Fig. 9 is a structure diagram of the tracker used in the tracking matching algorithm. The tracker T(W) mainly comprises a filter, a convolutional neural network, a discriminant classifier and online parameter updating.
The nth frame image is segmented into video objects to obtain a pedestrian set containing the moving targets; each pedestrian's rectangular region is resized to 50 × 110 and input into the convolutional neural network. The convolutional neural network feeds the extracted features into the discriminant classifier, which outputs a tracking-result vector giving the probability that each pedestrian in the current frame belongs to each class. If the tracking result indicates a newly appearing pedestrian, a change in appearance features, a pedestrian leaving the monitored area, or a false detection, the sample set is updated, the hidden layer and the classifier are retrained to determine new network parameters, and tracking then proceeds to the (n+1)th frame.
The design and training methods of the filter, the convolutional neural network, the discriminant classifier, and the like are described below.
1. Filter
The filter is a set of features pre-trained by a sparse autoencoder to serve as convolution kernels. It is obtained by training on a massive unsupervised auxiliary training set, so the feature set has good generality and completeness. The feature pre-training is an offline process, and the trained features are not updated while the target tracking algorithm runs. Fig. 10 is the network structure diagram of the sparse autoencoder. L_1 is the input layer, which takes a 10 × 10 image x; L_2 is the hidden layer, containing 100 hidden neurons; L_3 is the output layer, which outputs h_{W,b}(x). Let W_ij^(l) be the connection weight between the jth unit of the lth layer and the ith unit of the (l+1)th layer, and b_i^(l) the bias term of the ith unit of the (l+1)th layer. The parameters of the sparse autoencoder are (W, b) = (W^(1), b^(1), W^(2), b^(2)), where W^(l) (l = 1, 2) is a 100 × 100 matrix with elements W_ij^(l), and b^(l) (l = 1, 2) is a 100-dimensional vector with elements b_i^(l).
The training process of the sparse autoencoder is as follows: (1) initialize the gradients of the weights and bias terms to 0, and use random values drawn from the normal distribution N(0, 0.01^2) as the initial values of the network parameters (W, b); (2) compute the partial derivatives: compute and accumulate the partial derivatives with the back-propagation algorithm; (3) update the weight parameters; (4) repeat steps (1)-(3) until convergence.
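A minimal NumPy sketch of this training loop, assuming a squared-error reconstruction objective with sigmoid activations; the sparsity penalty and weight-decay terms of a full sparse autoencoder, and the learning rate and epoch count chosen here, are simplifications and assumptions rather than the patent's settings.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, hidden=100, lr=0.1, epochs=200, rng=np.random.default_rng(0)):
    """X: training patches as rows of a (num_samples, 100) matrix (10x10 images flattened).
    Returns (W1, b1, W2, b2). Sparsity and weight-decay terms are omitted in this sketch."""
    n = X.shape[1]
    W1 = rng.normal(0.0, 0.01, (hidden, n)); b1 = np.zeros(hidden)   # (1) init from N(0, 0.01^2)
    W2 = rng.normal(0.0, 0.01, (n, hidden)); b2 = np.zeros(n)
    for _ in range(epochs):
        a1 = sigmoid(X @ W1.T + b1)              # hidden activations
        out = sigmoid(a1 @ W2.T + b2)            # reconstruction of the input
        d_out = (out - X) * out * (1 - out)      # (2) back-propagate the reconstruction error
        d_hid = (d_out @ W2) * a1 * (1 - a1)
        W2 -= lr * d_out.T @ a1 / len(X); b2 -= lr * d_out.mean(axis=0)   # (3) update
        W1 -= lr * d_hid.T @ X / len(X); b1 -= lr * d_hid.mean(axis=0)
    return W1, b1, W2, b2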
The method randomly selects one million images from the public Tiny Images Dataset, which contains a large number of real-life pictures of objects, pedestrians, backgrounds and so on, as auxiliary unsupervised training data, and computes the parameters (W, b).
If the input g (a 100-dimensional vector) is constrained so that its squared norm does not exceed 1, i.e. Σ_j g_j^2 ≤ 1, then the input that gives the ith hidden unit its maximum excitation is
g_j^(i) = W_ij^(1) / sqrt(Σ_j (W_ij^(1))^2),  j = 1, ..., 100.
Taking the maximum excitation of each hidden unit i (i = 1, 2, ..., 100) in turn and computing the corresponding g^(i) yields 100 input images of size 10 × 10, as shown in Fig. 11. These 100 images can be regarded as "bases" of the training sample set: any given image sample can be approximately represented by a combination of these bases. Using these bases as convolution kernels in the convolutional neural network allows the features of an input picture to be extracted effectively.
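Since the maximally exciting input of each hidden unit is just its normalized weight row, the 100 basis images follow directly from the trained weights; a short sketch, assuming W1 is the 100 × 100 hidden-layer weight matrix from pre-training:

import numpy as np

def basis_images(W1):
    """W1: hidden-layer weight matrix of shape (100, 100), one row per hidden unit.
    Returns 100 images of size 10x10, each the input that maximally excites one unit."""
    norms = np.linalg.norm(W1, axis=1, keepdims=True)   # ||W_i^(1)|| for each unit i
    G = W1 / norms                                      # g^(i) = W_i^(1) / ||W_i^(1)||
    return G.reshape(100, 10, 10)                       # 100 basis images of 10 x 10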
2. Convolutional neural network
The convolution kernels form a filter bank consisting of the 100 pre-trained features of size 10 × 10, which extract features from the input image. With the stride set to 5, each filter convolves the input image to produce a feature map of size 9 × 21. Average pooling is then applied to each 3 × 3 region of the feature map, giving a feature map of size 3 × 7. All 2100 nodes of the pooled feature maps are fed into a neural network (the hidden layer) containing 350 nodes, which reduces the dimensionality while extracting higher-level features, making it easier for the classifier to discriminate.
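The layer sizes quoted above can be verified with a few lines; this is only a dimension check under the stated kernel size, stride and pooling, not the patent's implementation.

def conv_out(size, kernel=10, stride=5):
    return (size - kernel) // stride + 1

w, h = conv_out(50), conv_out(110)   # 9, 21: feature map after one stride-5 convolution
pw, ph = w // 3, h // 3              # 3, 7: after non-overlapping 3x3 average pooling
n_features = 100 * pw * ph           # 100 filters -> 2100 nodes fed to the 350-node hidden layer
assert (w, h, pw, ph, n_features) == (9, 21, 3, 7, 2100)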
3. Discriminant classifier
The mathematical model of the discriminant classifier is a SoftMax function. The minimum value of the cost function of the SoftMax regression algorithm can be solved by a gradient descent method, and a unique optimal solution is obtained.
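For reference, the SoftMax function used by the discriminant classifier maps a vector of class scores to the per-class probabilities; a numerically stable sketch:

import numpy as np

def softmax(z):
    """Class scores z -> probabilities that the current pedestrian belongs to each class."""
    z = z - np.max(z)          # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()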
4. Training of hidden layer and discriminant classifier cascade network
When the parameters need updating, the tracker is retrained. The filter parameters are not updated; only the parameters of the hidden layer and the discriminant classifier are. The network formed by cascading the hidden layer and the discriminant classifier is trained as a whole with gradient descent. The training algorithm is: (1) feed-forward pass, computing the convolved and pooled feature maps, the hidden-layer weighted sums, the activation vector and the classification probability vector; (2) compute the residuals; (3) compute the partial derivatives; (4) update the parameters; (5) repeat steps (1)-(4) until convergence.
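A compact sketch of retraining the hidden layer and SoftMax cascade by gradient descent while keeping the convolution filters fixed; the cross-entropy objective, sigmoid hidden activation, learning rate and epoch count are assumptions not specified in the text.

import numpy as np

def train_cascade(F, y, n_classes, hidden=350, lr=0.1, epochs=100, rng=np.random.default_rng(0)):
    """F: pooled CNN features, shape (num_samples, 2100); y: integer class labels.
    Trains only the hidden layer and the SoftMax layer; the filters stay fixed."""
    Wh = rng.normal(0, 0.01, (hidden, F.shape[1])); bh = np.zeros(hidden)
    Ws = rng.normal(0, 0.01, (n_classes, hidden)); bs = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                        # one-hot labels
    for _ in range(epochs):
        A = 1 / (1 + np.exp(-(F @ Wh.T + bh)))      # (1) feed-forward: hidden activations
        Z = A @ Ws.T + bs
        P = np.exp(Z - Z.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)           # classification probability vector
        dZ = (P - Y) / len(F)                       # (2) residual at the SoftMax layer
        dA = (dZ @ Ws) * A * (1 - A)                # (2) residual at the hidden layer
        Ws -= lr * dZ.T @ A; bs -= lr * dZ.sum(axis=0)   # (3)-(4) gradients and update
        Wh -= lr * dA.T @ F; bh -= lr * dA.sum(axis=0)
    return Wh, bh, Ws, bs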
The relevant parameter settings for the process of the invention are shown in table 1.
TABLE 1 parameter settings
[table image: Table 1, parameter settings]
The method of the present invention was compared with the IVT (Incremental Visual Tracking), SCM (Sparse Collaborative Model) and MIL (Multiple Instance Learning) methods on a self-built building-doorway surveillance video dataset; the performance is shown in Table 2. The results show that the tracking accuracy of the method is close to that of the other algorithms, its execution efficiency is slightly higher, and it outperforms the others in robustness and counting accuracy.
TABLE 2 Performance and comparison of people number increment detection methods based on motion tracking
[table image: Table 2, performance comparison of people-count increment detection methods based on motion tracking]
The method designs occlusion-pattern detection and compensation for the common occlusion patterns and therefore has strong resistance to occlusion. Because the convolution filters are obtained by offline training in advance, the convolutional neural network structure is simple, and online parameter updating only retrains the hidden layer and the regression (classification) layer, the method can meet the requirement of long-time uninterrupted operation in video surveillance. The method is robust, real-time and relatively accurate, is suitable for people counting over large volumes of video data, and can be integrated into a video surveillance software system.
The above description is only an example of the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. A video personnel tracking and counting method based on artificial intelligence, characterized in that the method comprises the following steps:
step 1: initialize the video frame number n = 1 and segment the video objects of the nth frame to obtain the pedestrian connected-component set P^(n); compute the feature vector v_j^(n) and the motion vector m_j^(n) of the jth pedestrian, and set the longest untracked-match count of the jth pedestrian λ_j^(n) = 0;
the feature vector and the motion vector of a pedestrian are computed as follows:
the feature vector of the jth pedestrian is v_j = (x_j, y_j, S_j), where (x_j, y_j) is the centroid coordinate of p_j and S_j is the area of p_j:
[equation image: centroid coordinates and area computed from the binary image of p_j]
where y_h is the height of the surveillance video image, N_j and M_j are the numbers of pixels of the circumscribed rectangle of p_j in the length and width directions, and f_j(x, y) is the binary image of p_j:
[equation image: definition of the binary image f_j(x, y)]
the motion vector of the jth pedestrian is m_j = (l_j, λ_j), where l_j = l(p_j) is the door-side label of the pedestrian, l_j = 0 denoting the inside of the door (in the building) and l_j = 1 denoting the outside of the door (outside the building); λ_j is the longest untracked-match count of the jth pedestrian, i.e., λ_j = λ(p_j);
step 2: segment the video objects of the (n+1)th frame to obtain P^(n+1); compute v_j^(n+1) and m_j^(n+1), j = 1, ..., k;
step 3: search P^(n) for the pedestrians that match those in P^(n+1): for each pedestrian p_j^(n+1) in P^(n+1), find the tracking-matched pedestrian p_i^(n) in P^(n), i = 1, ..., k;
if the matching succeeds, compute the people-count increment in: (1) if the pedestrian moves from inside the building to outside, the increment is -1; (2) if the pedestrian moves from outside the building to inside, the increment is 1; (3) if the pedestrian stays inside the building, the increment is 0; (4) if the pedestrian stays outside the building, the increment is 0;
the longest untracked-match count λ_i of every successfully matched p_i is reset;
if the matching succeeds, it is also necessary to check whether p_j^(n+1) satisfies the judgment condition of merged-type occlusion; if so, the detected increment in needs to be compensated;
if the matching fails, it is necessary to judge whether p_j^(n+1) is a pedestrian occluded in the nth frame; if p_j^(n+1) satisfies the judgment condition of distributed occlusion, compensate in; otherwise, regard p_j^(n+1) as a pedestrian newly appearing in the monitored area and set λ_i = 0;
step 4: check the pedestrians in P^(n) that were not successfully matched with P^(n+1), add them to P^(n+1), and increase their longest untracked-match count by 1; if such a pedestrian is matched in the (n+2)th frame, judge that intermittent occlusion has occurred and change in accordingly; otherwise, add 1 to the longest untracked-match count again, and once the threshold is reached judge that the pedestrian has left the monitored area; if p_i^(n) satisfies the judgment condition of convergent-type occlusion, compensate in;
step 5: remove pedestrians that have left the monitored area and falsely detected pedestrians: for each pedestrian in P^(n+1), check whether its longest untracked-match count exceeds the threshold;
if it does, the pedestrian is considered to have left the monitored area and should be discarded;
otherwise, the pedestrian is considered to be temporarily occluded and should be kept;
at the same time, check whether the pedestrian's area is within the valid range; if it is not, the detection is considered false and the pedestrian should be discarded;
update P^(n+1);
step 6: let n = n + 1 and jump to step 2 until the whole video image sequence has been analyzed.
2. The artificial intelligence based video personnel tracking and counting method according to claim 1, wherein the tracking matching in step 3 specifically comprises the following steps:
step 31: initialize the video frame number n = 1 and the tracker T(W);
step 32: translate the centroid (x_i, y_i) of p_i^(1) to new coordinates; the translation takes the centroid as the center and moves it to the pixel points whose D_8 (chessboard) distance from the centroid equals d along each of the eight directions indexed by r = 0, ±1, ±2, ±3, 4, with d = 5, 10; together with p_i^(1) itself, 17 samples of the ith class, all labeled i, are obtained;
[equation image: coordinates of the translated samples]
step 33: form the sample set C^(1) from the obtained samples and train the tracker T(W), determining its parameters as W_1;
step 34: detect the (n+1)th frame to obtain P^(n+1) and set C^(n+1) = C^(n);
step 35: input p_j ∈ P^(n+1) into the tracker T(W_n) and obtain its output; take the maximum output value o_m and compare it with the upper threshold σ_1 and the lower threshold σ_2 (σ_1 ≥ σ_2):
(1) if o_m is less than the lower threshold σ_2, p_j is regarded as a pedestrian newly appearing in the (n+1)th frame and the tracking match fails; translate the centroid of p_j in the same way (r = 0, ±1, ±2, ±3, 4; d = 5, 10); together with p_j, 17 samples are obtained in total and added to C^(n+1) as a new class of samples;
(2) if o_m is greater than the upper threshold σ_1, p_m ∈ P^(n) and p_j ∈ P^(n+1) are regarded as highly matched;
(3) if o_m is greater than the lower threshold σ_2 but less than the upper threshold σ_1, p_m ∈ P^(n) and p_j ∈ P^(n+1) are regarded as matched; translate the centroid of p_j in the same way to obtain, together with p_j, 17 samples in total, and add them to the sample class labeled m; if the number of samples labeled m then exceeds the per-class sample-pool capacity V, the 17 samples labeled m that entered the pool earliest are removed;
step 36: update the sample set, removing the samples of pedestrians that have left the monitored area or were falsely detected; the update covers three cases:
(1) for a newly appearing pedestrian, create a new pedestrian class;
(2) for a pedestrian whose appearance has changed, collect and add new samples; when the number of samples exceeds the per-class sample-pool capacity V during expansion, update the sample set by a first-in-first-out rule, replacing the sample that entered the pool earliest with the newly added one; V = 34 was determined by experiment;
(3) for a pedestrian that has left the monitored area or was falsely detected, remove the samples of the class it belongs to;
after updating, the new sample set C^(n+1) is obtained;
step 37: update the parameters of the tracker T(W): train T(W) with C^(n+1) and determine its parameters as W_(n+1); when training T(W), the initial value of the network parameters is W_n.
3. The artificial intelligence based video personnel tracking and counting method according to claim 2, characterized in that the tracker comprises a filter, a convolutional neural network, a discriminant classifier and online parameter updating;
the nth frame image is segmented into video objects to obtain a pedestrian set containing the moving targets; each pedestrian's rectangular region is resized to 50 × 110 and input into the convolutional neural network;
the convolutional neural network feeds the extracted features into the discriminant classifier, which outputs a tracking-result vector giving the probability that the pedestrian in the current frame belongs to each class;
if the tracking result indicates a newly appearing pedestrian, a change in a pedestrian's appearance features, a pedestrian leaving the monitored area, or a false detection, the sample set is updated, the hidden layer and the classifier are retrained to determine new network parameters, and tracking then proceeds to the (n+1)th frame.
4. The artificial intelligence based video personnel tracking and counting method according to claim 3, wherein: the filter in the tracker is a group of feature sets which are pre-trained by a sparse self-encoder, is obtained by training in a massive unsupervised auxiliary training set, and has good generality and completeness.
5. The artificial intelligence based video personnel tracking and counting method according to claim 3, wherein: the convolution neural network in the tracker uses a convolution kernel which is a filter composed of 100 pre-training features with the size of 10 x 10.
6. The artificial intelligence based video personnel tracking and counting method according to claim 3, wherein: and a mathematical model of a discriminant classifier in the tracker adopts a SoftMax function.
CN201911200873.6A 2019-11-29 2019-11-29 Video personnel tracking and counting method based on artificial intelligence Active CN111160101B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911200873.6A CN111160101B (en) 2019-11-29 2019-11-29 Video personnel tracking and counting method based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911200873.6A CN111160101B (en) 2019-11-29 2019-11-29 Video personnel tracking and counting method based on artificial intelligence

Publications (2)

Publication Number Publication Date
CN111160101A true CN111160101A (en) 2020-05-15
CN111160101B CN111160101B (en) 2023-04-18

Family

ID=70556257

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911200873.6A Active CN111160101B (en) 2019-11-29 2019-11-29 Video personnel tracking and counting method based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN111160101B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112906590A (en) * 2021-03-02 2021-06-04 东北农业大学 FairMOT-based multi-target tracking pedestrian flow monitoring method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013189464A2 (en) * 2012-11-28 2013-12-27 中兴通讯股份有限公司 Pedestrian tracking and counting method and device for near-front top-view monitoring video
CN104112282A (en) * 2014-07-14 2014-10-22 华中科技大学 A method for tracking a plurality of moving objects in a monitor video based on on-line study
CN105224912A (en) * 2015-08-31 2016-01-06 电子科技大学 Based on the video pedestrian detection and tracking method of movable information and Track association
CN105989615A (en) * 2015-03-04 2016-10-05 江苏慧眼数据科技股份有限公司 Pedestrian tracking method based on multi-feature fusion
CN109146921A (en) * 2018-07-02 2019-01-04 华中科技大学 A kind of pedestrian target tracking based on deep learning


Also Published As

Publication number Publication date
CN111160101B (en) 2023-04-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant