CN111160101A - Video personnel tracking and counting method based on artificial intelligence - Google Patents

Video personnel tracking and counting method based on artificial intelligence

Info

Publication number: CN111160101A
Authority: CN (China)
Prior art keywords: pedestrian, pedestrians, samples, video, matching
Legal status: Granted
Application number: CN201911200873.6A
Other languages: Chinese (zh)
Other versions: CN111160101B (en)
Inventor
邹建红
高元荣
陈雯珊
王辉
陈哲
张兴
王宇奇
陈彬
陈凡千
孙建锋
Current Assignee: Fujian Nebula Big Data Application Service Co ltd
Original Assignee: Fujian Nebula Big Data Application Service Co ltd
Application filed by Fujian Nebula Big Data Application Service Co ltd
Priority to CN201911200873.6A
Publication of CN111160101A
Application granted
Publication of CN111160101B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V 10/754 Organisation of the matching processes involving a deformation of the sample pattern or of the reference pattern; Elastic matching

Abstract

The invention discloses a video personnel tracking and counting method based on artificial intelligence. The method combines learned features extracted by a convolutional neural network with handcrafted features obtained by geometric computation, performs multi-target matching between video frames with a tracker whose network parameters can be updated online, and computes the change in the number of people from the change in the inside/outside-door position label of the same pedestrian across adjacent frames. A set of features learned from a large public video dataset with a sparse autoencoder serves as the filters of the convolutional neural network, which improves the efficiency of online updating. Common person-occlusion patterns are also modeled, and the counting errors they cause are compensated. The method is robust, runs in real time, achieves relatively high accuracy and resists occlusion; it is suitable for people counting over large volumes of video data and can be integrated into a video surveillance software system.

Description

Video personnel tracking and counting method based on artificial intelligence
[ technical field ]
The invention belongs to the technical field of intelligent video monitoring and analysis, and particularly relates to a video personnel tracking and counting method based on artificial intelligence.
[ background of the invention ]
Generally, there are two approaches to detecting and counting the number of people inside a building. The first is to sum the numbers of people detected in the surveillance video of every area on every floor and take the sum as the total number of people in the building. This approach requires that video surveillance cover the entire building; moreover, because the count from each camera carries some error, the summed error is large. The second is to subtract the accumulated number of people detected leaving from the accumulated number detected entering in the surveillance videos of all building entrances and exits, which yields the total number of people inside. This approach involves far fewer network cameras, accumulates relatively little error, and is more practical.
In practice the second approach, analyzing surveillance video at the entrances and exits of public buildings in real time to produce passenger-flow statistics, has attracted much attention and has gradually come into use in recent years. It usually requires installing a network camera with a vertical, downward-looking view at the top of each entrance to capture people entering and leaving, and the passenger flow is obtained by detecting and counting human heads at an intelligent front end or in the back end. In many cases, however, building owners do not want to deploy an additional network video surveillance system dedicated to passenger-flow statistics; they prefer to add a video-analysis software module on top of the security surveillance system already deployed, which both simplifies deployment and avoids extra hardware cost.
To obtain a large monitoring range, however, the cameras of a security surveillance system are usually mounted near the ceiling and look down obliquely at the monitored area. In such a scene, people cannot be detected and counted simply by detecting their heads. In video captured from a vertical top-down view, head features are simple and consistent, mutual occlusion is rare, and the analysis algorithm is comparatively simple. In video observed from an oblique view, head features are complex and people frequently occlude or cover one another, which makes the video-analysis task considerably harder.
[ summary of the invention ]
The technical problem the invention aims to solve is to provide a video personnel tracking and counting method based on artificial intelligence that selects suitable features and establishes a reasonable pedestrian occlusion model to improve the accuracy of people detection and counting, and that uses a pedestrian tracking-matching algorithm to achieve continuous, robust tracking and matching, meeting the real-time, long-duration counting requirements of video surveillance.
The invention is realized by the following technical scheme:
A video personnel tracking and counting method based on artificial intelligence comprises the following steps:
Step 1: initialize the video frame number n = 1 and segment the video objects of the nth frame to obtain the pedestrian connected-component set P^(n); compute the feature vector v_j^(n) and the motion vector m_j^(n) of the jth pedestrian, and set the longest untracked-match count of the jth pedestrian λ_j^(n) = 0;
the feature vector and the motion vector of a pedestrian are computed as follows:
the feature vector of the jth pedestrian is v_j = (x_j, y_j, S_j), where (x_j, y_j) is the centroid coordinate of p_j and S_j is the area of p_j:
[equation image: centroid coordinates and area computed from the binary image of p_j]
where y_h is the height of the surveillance video image, N_j and M_j are the numbers of pixels of the circumscribed rectangle of p_j in the length and width directions, and f_j(x, y) is the binary image of p_j:
[equation image: definition of the binary image f_j(x, y)]
the motion vector of the jth pedestrian is m_j = (l_j, λ_j), where l_j = l(p_j) is the door-side label of the pedestrian, l_j = 0 denoting the inside of the door (in the building) and l_j = 1 denoting the outside of the door (outside the building); λ_j is the longest untracked-match count of the jth pedestrian, i.e., λ_j = λ(p_j);
Step 2: segment the video objects of the (n+1)th frame to obtain P^(n+1); compute v_j^(n+1) and m_j^(n+1), j = 1, ..., k;
Step 3: search P^(n) for the pedestrians that match those in P^(n+1): for each pedestrian p_j^(n+1) in P^(n+1), find the tracking-matched pedestrian p_i^(n) in P^(n), i = 1, ..., k;
if the matching succeeds, compute the people-count increment in: (1) if the pedestrian moves from inside the building to outside, the increment is -1; (2) if the pedestrian moves from outside the building to inside, the increment is 1; (3) if the pedestrian stays inside the building, the increment is 0; (4) if the pedestrian stays outside the building, the increment is 0;
the longest untracked-match count λ_i of every successfully matched p_i is reset;
if the matching succeeds, it is also necessary to check whether p_j^(n+1) satisfies the judgment condition of merged-type occlusion; if so, the detected increment in needs to be compensated;
if the matching fails, it is necessary to judge whether p_j^(n+1) is a pedestrian occluded in the nth frame; if p_j^(n+1) satisfies the judgment condition of distributed occlusion, compensate in; otherwise, regard p_j^(n+1) as a pedestrian newly appearing in the monitored area and set λ_i = 0;
Step 4: check the pedestrians in P^(n) that were not successfully matched with P^(n+1), add them to P^(n+1), and increase their longest untracked-match count by 1; if such a pedestrian is matched in the (n+2)th frame, judge that intermittent occlusion has occurred and change in accordingly; otherwise, add 1 to the longest untracked-match count again, and once the threshold is reached judge that the pedestrian has left the monitored area; if p_i^(n) satisfies the judgment condition of convergent-type occlusion, compensate in;
Step 5: remove pedestrians that have left the monitored area and falsely detected pedestrians: for each pedestrian in P^(n+1), check whether its longest untracked-match count exceeds the threshold;
if it does, the pedestrian is considered to have left the monitored area and should be discarded;
otherwise, the pedestrian is considered to be temporarily occluded and should be kept;
at the same time, check whether the pedestrian's area is within the valid range; if it is not, the detection is considered false and the pedestrian should be discarded;
update P^(n+1);
Step 6: let n = n + 1 and jump to Step 2 until the whole video image sequence has been analyzed.
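As a rough illustration of how Steps 1 to 6 fit together, the following sketch outlines the per-frame counting loop. segment_frame, track_match, the pedestrian fields and the numeric thresholds are hypothetical placeholders, and the occlusion-mode compensation of Steps 3 and 4 is omitted for brevity.

# Minimal sketch of the per-frame counting loop (Steps 1-6); all helper
# functions, object fields and numeric values are illustrative assumptions.
LAMBDA_MAX = 5             # threshold on the longest untracked-match count (assumed value)
AREA_RANGE = (800, 20000)  # valid pedestrian area range in pixels (assumed value)

def count_people(frames, segment_frame, track_match):
    P_prev = segment_frame(frames[0])        # Step 1: pedestrian set of frame 1
    total_increment = 0
    for frame in frames[1:]:                 # Steps 2-6
        P_curr = segment_frame(frame)        # Step 2
        matches = track_match(P_prev, P_curr)   # Step 3: dict p_curr -> matched p_prev or None
        for p_curr, p_prev in matches.items():
            if p_prev is not None:
                # door-side label l: 0 = inside the building, 1 = outside
                total_increment += p_prev.l - p_curr.l   # +1 entering, -1 leaving
                p_curr.lam = 0               # reset the longest untracked-match count
        # Step 4: carry over unmatched pedestrians as (possibly) occluded
        for p_prev in P_prev:
            if p_prev not in matches.values():
                p_prev.lam += 1
                P_curr.append(p_prev)
        # Step 5: prune pedestrians that left the area or were falsely detected
        P_curr = [p for p in P_curr
                  if p.lam < LAMBDA_MAX and AREA_RANGE[0] <= p.S <= AREA_RANGE[1]]
        P_prev = P_curr                      # Step 6
    return total_increment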
Further, the tracking matching in step 3 specifically comprises the following steps:
step 31: initialize the video frame number n = 1 and the tracker T(W);
step 32: translate the centroid (x_i, y_i) of p_i^(1) to new coordinates; the translation takes the centroid as the center and moves it to the pixel points whose D_8 (chessboard) distance from the centroid equals d along each of the eight directions indexed by r = 0, ±1, ±2, ±3, 4, with d = 5, 10; together with p_i^(1) itself, 17 samples of the ith class (all labeled i) are obtained;
[equation image: coordinates of the translated samples]
step 33: form the sample set C^(1) from the obtained samples and train the tracker T(W), determining its parameters as W_1;
step 34: detect the (n+1)th frame to obtain P^(n+1) and set C^(n+1) = C^(n);
step 35: input p_j ∈ P^(n+1) into the tracker T(W_n) and obtain its output; take the maximum output value o_m and compare it with the upper threshold σ_1 and the lower threshold σ_2 (σ_1 ≥ σ_2):
(1) if o_m is less than the lower threshold σ_2, p_j is regarded as a pedestrian newly appearing in the (n+1)th frame and the tracking match fails; translate the centroid of p_j in the same way (r = 0, ±1, ±2, ±3, 4; d = 5, 10); together with p_j, 17 samples are obtained in total and added to C^(n+1) as a new class of samples;
(2) if o_m is greater than the upper threshold σ_1, p_m ∈ P^(n) and p_j ∈ P^(n+1) are regarded as highly matched;
(3) if o_m is greater than the lower threshold σ_2 but less than the upper threshold σ_1, p_m ∈ P^(n) and p_j ∈ P^(n+1) are regarded as matched; translate the centroid of p_j in the same way to obtain, together with p_j, 17 samples in total, and add them to the sample class labeled m; if the number of samples labeled m then exceeds the per-class sample-pool capacity V, the 17 samples labeled m that entered the pool earliest are removed;
step 36: update the sample set, removing the samples of pedestrians that have left the monitored area or were falsely detected; the update covers three cases:
(1) for a newly appearing pedestrian, create a new pedestrian class;
(2) for a pedestrian whose appearance has changed, collect and add new samples; when the number of samples exceeds the per-class sample-pool capacity V during expansion, update the sample set by a first-in-first-out rule, replacing the sample that entered the pool earliest with the newly added one; V = 34 was determined by experiment;
(3) for a pedestrian that has left the monitored area or was falsely detected, remove the samples of the class it belongs to; after updating, the new sample set C^(n+1) is obtained;
step 37: update the parameters of the tracker T(W): train T(W) with C^(n+1) and determine its parameters as W_(n+1); when training T(W), the initial value of the network parameters is W_n.
Further, the tracker comprises a filter, a convolutional neural network, a discriminant classifier and online parameter updating;
the nth frame image is segmented into video objects to obtain a pedestrian set containing the moving targets; each pedestrian's rectangular region is resized to 50 × 110 and input into the convolutional neural network;
the convolutional neural network feeds the extracted features into the discriminant classifier, which outputs a tracking-result vector giving the probability that the pedestrian in the current frame belongs to each class;
if the tracking result indicates a newly appearing pedestrian, a change in a pedestrian's appearance features, a pedestrian leaving the monitored area, or a false detection, the sample set is updated, the hidden layer and the classifier are retrained to determine new network parameters, and tracking then proceeds to the (n+1)th frame.
Furthermore, the filter in the tracker is a set of features pre-trained by a sparse autoencoder on a massive unsupervised auxiliary training set, which gives it good generality and completeness; the feature pre-training is an offline process, and the trained features are not updated while the target tracking algorithm is executed.
Further, the convolution kernel used by the convolutional neural network in the tracker is a filter composed of 100 pre-training features with the size of 10 × 10.
Further, a mathematical model of a discriminant classifier in the tracker employs a SoftMax function.
The invention has the advantages that the method is robust, real-time, relatively accurate and strongly resistant to occlusion, and can meet the requirement of long-time uninterrupted operation in video surveillance. The method is suitable for people counting over large volumes of video data and can be integrated into a video surveillance software system.
[ description of the drawings ]
The invention will be further described with reference to the following embodiments and the accompanying drawings.
FIG. 1 is a flow chart of a video people tracking and counting method based on artificial intelligence of the present invention.
FIG. 2 is a schematic diagram of same-side occlusion according to the present invention.
FIG. 3 is a schematic diagram of distributed occlusion according to the present invention.
FIG. 4 is a schematic diagram of the convergent occlusion of the present invention.
FIG. 5 is a schematic view of the intermittent occlusion of the present invention.
FIG. 6 is a merged occlusion diagram of the present invention.
FIG. 7 is a table of occlusion mode determination and compensation according to the present invention.
FIG. 8 is a flow chart of the tracking matching algorithm of the present invention.
FIG. 9 is a block diagram of a convolutional neural network-based tracker of the present invention.
Fig. 10 is a network structure diagram of the sparse autoencoder of the present invention.
FIG. 11 is a sparse self-encoder training result visualization diagram of the present invention.
[ detailed description of the embodiments ]
Fig. 1 shows the flow of the video personnel tracking and counting method based on artificial intelligence. The method computes the people-count increment from the change in the door-side label of the same pedestrian across adjacent frames, matches multiple pedestrians between adjacent frames with a tracker based on convolutional-neural-network feature extraction and online parameter updating, and detects common occlusion patterns to compensate the people-count increment. The method comprises the following steps:
Step 1: initialize the video frame number n = 1 and segment the video objects of the nth frame to obtain the pedestrian connected-component set P^(n); compute the feature vector v_j^(n) and the motion vector m_j^(n) of the jth pedestrian, and set the longest untracked-match count of the jth pedestrian λ_j^(n) = 0.
The feature vector and the motion vector of a pedestrian are calculated as follows.
The feature vector of the jth pedestrian is v_j = (x_j, y_j, S_j), where (x_j, y_j) is the centroid coordinate of p_j and S_j is the area of p_j:
[equation image: centroid coordinates and area computed from the binary image of p_j]
where y_h is the height of the surveillance video image, N_j and M_j are the numbers of pixels of the circumscribed rectangle of p_j in the length and width directions, and f_j(x, y) is the binary image of p_j:
[equation image: definition of the binary image f_j(x, y)]
The motion vector of the jth pedestrian is m_j = (l_j, λ_j), where l_j = l(p_j) is the door-side label of the pedestrian, l_j = 0 denoting the inside of the door (in the building) and l_j = 1 denoting the outside of the door (outside the building); λ_j is the longest untracked-match count of the jth pedestrian, i.e., λ_j = λ(p_j).
Step 2: segment the video objects of the (n+1)th frame to obtain P^(n+1); compute v_j^(n+1) and m_j^(n+1), j = 1, ..., k.
Step 3: search P^(n) for the pedestrians that match those in P^(n+1): for each pedestrian p_j^(n+1) in P^(n+1), find the tracking-matched pedestrian p_i^(n) in P^(n), i = 1, ..., k.
If the matching succeeds, compute the people-count increment in, as sketched below: (1) if the pedestrian moves from inside the building to outside, the increment is -1; (2) if the pedestrian moves from outside the building to inside, the increment is 1; (3) if the pedestrian stays inside the building, the increment is 0; (4) if the pedestrian stays outside the building, the increment is 0.
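The four cases reduce to comparing the door-side label l of the matched pedestrian in frames n and n+1; a minimal sketch of that rule (variable names are illustrative):

def people_increment(l_prev: int, l_curr: int) -> int:
    """Door-side label l: 0 = inside the door (in the building), 1 = outside.
    Returns the people-count increment 'in' for one matched pedestrian."""
    if l_prev == 0 and l_curr == 1:   # moved from inside the building to outside
        return -1
    if l_prev == 1 and l_curr == 0:   # moved from outside the building to inside
        return 1
    return 0                          # stayed inside, or stayed outside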
In any of these cases, the longest untracked-match count λ_i of the successfully matched p_i is cleared. If the matching succeeds, it is also necessary to check whether p_j^(n+1) satisfies the judgment condition of merged-type occlusion; if so, the detected increment in is compensated.
If the matching fails, it is necessary to judge whether p_j^(n+1) is a pedestrian occluded in the nth frame. If p_j^(n+1) satisfies the judgment condition of distributed occlusion, compensate in; otherwise, regard p_j^(n+1) as a pedestrian newly appearing in the monitored area and set λ_i = 0.
Step 4: check the pedestrians in P^(n) that were not successfully matched with P^(n+1), add them to P^(n+1), and increase their longest untracked-match count by 1. If such a pedestrian is matched in the (n+2)th frame, judge that intermittent occlusion has occurred and change in accordingly; otherwise, add 1 to the longest untracked-match count again, and once the threshold is reached judge that the pedestrian has left the monitored area. If p_i^(n) satisfies the judgment condition of convergent-type occlusion, compensate in.
Step 5: remove pedestrians that have left the monitored area and falsely detected pedestrians. For each pedestrian in P^(n+1), check whether its longest untracked-match count exceeds the threshold. If it does, the pedestrian is considered to have left the monitored area and should be discarded; otherwise, the pedestrian is considered to be temporarily occluded and should be kept. At the same time, check whether the pedestrian's area is within the valid range; if it is not, the detection is considered false and the pedestrian is discarded. Update P^(n+1).
Step 6: let n = n + 1 and jump to Step 2 until the whole video image sequence has been analyzed.
FIGS. 2, 3, 4, 5 and 6 are schematic diagrams of same-side occlusion, distributed occlusion, convergent occlusion, intermittent occlusion and merged occlusion, respectively. FIG. 7 tabulates the judgment conditions and the people-count error compensation formulas for these five common occlusion modes.
The anti-occlusion design works as follows: a pedestrian that appeared in the previous frame but does not appear in the current frame is by default regarded as occluded, is added to the pedestrian set of the current frame, and still takes part in the matching against the pedestrian set of the next frame; the number of missed frames is recorded by the longest untracked-match count. If such a pedestrian is detected again within the next few frames, the longest untracked-match count is cleared; otherwise, the pedestrian is considered not occluded but to have left the monitored area. Thus, if the longest untracked-match count λ_i of a pedestrian p_i exceeds a threshold λ_0, the pedestrian is considered to have left the monitored area (including entering the building interior from the inside of the door and moving away from the outside of the door); if λ_i does not exceed λ_0 and is not zero, the pedestrian is considered occluded in the nth frame. The relationship between λ_i and the state of p_i is: if λ_i = 0, p_i is inside the monitored area and successfully detected; if 0 < λ_i < λ_0, p_i is inside the monitored area and occluded; if λ_i ≥ λ_0, p_i has left the monitored area.
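The relationship between λ_i and the state of p_i can be written as a small state function; the threshold value used below is only an assumed example, not a value fixed by the text.

LAMBDA_0 = 5   # threshold on the longest untracked-match count (assumed example value)

def pedestrian_state(lam: int, lam_0: int = LAMBDA_0) -> str:
    """Classify pedestrian p_i from its longest untracked-match count lambda_i."""
    if lam == 0:
        return "in monitored area, successfully detected"
    if lam < lam_0:
        return "in monitored area, occluded"
    return "left the monitored area"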
Fig. 8 is a flow chart of the tracking matching algorithm used in step 3 of the method of the present invention. The implementation of each step of the tracking matching algorithm is detailed below.
Step 31: initialize the video frame number n = 1 and the tracker T(W).
Step 32: translate the centroid (x_i, y_i) of p_i^(1) to new coordinates. The translation takes the centroid as the center and moves it to the pixel points whose D_8 (chessboard) distance from the centroid equals d along each of the eight directions indexed by r = 0, ±1, ±2, ±3, 4, with d = 5, 10. Together with p_i^(1) itself, 17 samples of the ith class (all labeled i) are obtained, as sketched below.
[equation image: coordinates of the translated samples]
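A minimal sketch of the 17-sample generation, assuming the eight translation directions are the eight D_8 (chessboard) neighbourhood directions and that crop_fn is a hypothetical helper that extracts an image patch centred at given coordinates; the exact formula sits in the patent's equation image.

def generate_samples(image, centroid, crop_fn, distances=(5, 10)):
    """Return 17 samples for one pedestrian class: the original crop plus crops
    whose centroid is shifted by distance d along each of the 8 D_8 directions.
    crop_fn(image, (x, y)) extracts a patch centred at (x, y) (illustrative)."""
    x, y = centroid
    samples = [crop_fn(image, (x, y))]               # the original sample
    directions = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                  if (dx, dy) != (0, 0)]             # 8 chessboard directions
    for d in distances:                              # d = 5 and d = 10 pixels
        for dx, dy in directions:
            samples.append(crop_fn(image, (x + dx * d, y + dy * d)))
    return samples                                   # 1 + 8*2 = 17 samples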
Step 33: form the sample set C^(1) from the obtained samples and train the tracker T(W), determining its parameters as W_1.
Step 34: detect the (n+1)th frame to obtain P^(n+1) and set C^(n+1) = C^(n).
Step 35: input p_j ∈ P^(n+1) into the tracker T(W_n) and obtain its output. Take the maximum output value o_m and compare it with the upper threshold σ_1 and the lower threshold σ_2 (σ_1 ≥ σ_2):
(1) if o_m is less than the lower threshold σ_2, p_j is regarded as a pedestrian newly appearing in the (n+1)th frame and the tracking match fails. Translate the centroid of p_j in the same way (r = 0, ±1, ±2, ±3, 4; d = 5, 10); together with p_j, 17 samples are obtained in total and added to C^(n+1) as a new class of samples.
(2) if o_m is greater than the upper threshold σ_1, p_m ∈ P^(n) and p_j ∈ P^(n+1) are regarded as highly matched.
(3) if o_m is greater than the lower threshold σ_2 but less than the upper threshold σ_1, p_m ∈ P^(n) and p_j ∈ P^(n+1) are regarded as matched. Translate the centroid of p_j in the same way (r = 0, ±1, ±2, ±3, 4; d = 5, 10) to obtain, together with p_j, 17 samples in total, and add them to the sample class labeled m. At this point, if the number of samples labeled m is greater than the per-class sample-pool capacity V, the 17 samples labeled m that entered the pool earliest are removed first.
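The three-way threshold decision of Step 35 and the fixed-capacity, first-in-first-out sample pool can be sketched as follows; the threshold values σ_1 and σ_2 and the pool layout are assumptions for illustration, while V = 34 comes from Step 36.

from collections import deque

V = 34                        # per-class sample-pool capacity (stated in Step 36)
SIGMA_1, SIGMA_2 = 0.8, 0.5   # upper / lower thresholds (illustrative values only)

def match_and_update(tracker_outputs, p_j_samples, sample_pools):
    """tracker_outputs: class-probability vector for pedestrian p_j from T(W_n);
    p_j_samples: the 17 samples generated for p_j;
    sample_pools: dict class_label -> deque of samples (FIFO, max length V)."""
    m = max(range(len(tracker_outputs)), key=lambda c: tracker_outputs[c])
    o_m = tracker_outputs[m]
    if o_m < SIGMA_2:                      # (1) new pedestrian: start a new class
        new_label = max(sample_pools, default=-1) + 1
        sample_pools[new_label] = deque(p_j_samples, maxlen=V)
        return None
    if o_m >= SIGMA_1:                     # (2) highly matched: no pool update needed
        return m
    # (3) matched but appearance changed: add the 17 samples to class m;
    # deque(maxlen=V) drops the oldest samples first (first-in first-out)
    sample_pools[m].extend(p_j_samples)
    return m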
Step 36: update the sample set, removing the samples of pedestrians that have left the monitored area or were falsely detected. The update covers three cases:
(1) for a newly appearing pedestrian, create a new pedestrian class;
(2) for a pedestrian whose appearance has changed, collect and add new samples; when the number of samples exceeds the per-class sample-pool capacity V during expansion, update the sample set by a first-in-first-out rule, replacing the sample that entered the pool earliest with the newly added one; V = 34 was determined by experiment;
(3) for a pedestrian that has left the monitored area or was falsely detected, remove the samples of the class it belongs to.
After updating, the new sample set C^(n+1) is obtained.
Step 37: update the parameters of the tracker T(W): train T(W) with C^(n+1) and determine its parameters as W_(n+1). When training T(W), the initial value of the network parameters is W_n.
Fig. 9 is a structure diagram of the tracker used in the tracking matching algorithm. The tracker T(W) mainly comprises a filter, a convolutional neural network, a discriminant classifier and online parameter updating.
The nth frame image is segmented into video objects to obtain a pedestrian set containing the moving targets; each pedestrian's rectangular region is resized to 50 × 110 and input into the convolutional neural network. The convolutional neural network feeds the extracted features into the discriminant classifier, which outputs a tracking-result vector giving the probability that each pedestrian in the current frame belongs to each class. If the tracking result indicates a newly appearing pedestrian, a change in appearance features, a pedestrian leaving the monitored area, or a false detection, the sample set is updated, the hidden layer and the classifier are retrained to determine new network parameters, and tracking then proceeds to the (n+1)th frame.
The design and training methods of the filter, the convolutional neural network, the discriminant classifier, and the like are described below.
1. Filter
The filter is a set of features pre-trained by a sparse autoencoder to serve as convolution kernels. It is obtained by training on a massive unsupervised auxiliary training set, so the feature set has good generality and completeness. The feature pre-training is an offline process, and the trained features are not updated while the target tracking algorithm runs. Fig. 10 is the network structure diagram of the sparse autoencoder. L_1 is the input layer, which takes a 10 × 10 image x; L_2 is the hidden layer, containing 100 hidden neurons; L_3 is the output layer, which outputs h_{W,b}(x). Let W_ij^(l) be the connection weight between the jth unit of the lth layer and the ith unit of the (l+1)th layer, and b_i^(l) the bias term of the ith unit of the (l+1)th layer. The parameters of the sparse autoencoder are (W, b) = (W^(1), b^(1), W^(2), b^(2)), where W^(l) (l = 1, 2) is a 100 × 100 matrix with elements W_ij^(l), and b^(l) (l = 1, 2) is a 100-dimensional vector with elements b_i^(l).
The training process of the sparse autoencoder is as follows: (1) initialize the gradients of the weights and bias terms to 0, and use random values drawn from the normal distribution N(0, 0.01^2) as the initial values of the network parameters (W, b); (2) compute the partial derivatives: compute and accumulate the partial derivatives with the back-propagation algorithm; (3) update the weight parameters; (4) repeat steps (1)-(3) until convergence.
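A minimal NumPy sketch of this training loop, assuming a squared-error reconstruction objective with sigmoid activations; the sparsity penalty and weight-decay terms of a full sparse autoencoder, and the learning rate and epoch count chosen here, are simplifications and assumptions rather than the patent's settings.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, hidden=100, lr=0.1, epochs=200, rng=np.random.default_rng(0)):
    """X: training patches as rows of a (num_samples, 100) matrix (10x10 images flattened).
    Returns (W1, b1, W2, b2). Sparsity and weight-decay terms are omitted in this sketch."""
    n = X.shape[1]
    W1 = rng.normal(0.0, 0.01, (hidden, n)); b1 = np.zeros(hidden)   # (1) init from N(0, 0.01^2)
    W2 = rng.normal(0.0, 0.01, (n, hidden)); b2 = np.zeros(n)
    for _ in range(epochs):
        a1 = sigmoid(X @ W1.T + b1)              # hidden activations
        out = sigmoid(a1 @ W2.T + b2)            # reconstruction of the input
        d_out = (out - X) * out * (1 - out)      # (2) back-propagate the reconstruction error
        d_hid = (d_out @ W2) * a1 * (1 - a1)
        W2 -= lr * d_out.T @ a1 / len(X); b2 -= lr * d_out.mean(axis=0)   # (3) update
        W1 -= lr * d_hid.T @ X / len(X); b1 -= lr * d_hid.mean(axis=0)
    return W1, b1, W2, b2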
The method randomly selects one million images from the public Tiny Images Dataset, which contains a large number of real-life pictures of objects, pedestrians, backgrounds and so on, as auxiliary unsupervised training data, and computes the parameters (W, b).
If the input g (a 100-dimensional vector) is constrained so that its squared norm does not exceed 1, i.e. Σ_j g_j^2 ≤ 1, then the input that gives the ith hidden unit its maximum excitation is
g_j^(i) = W_ij^(1) / sqrt(Σ_j (W_ij^(1))^2),  j = 1, ..., 100.
Taking the maximum excitation of each hidden unit i (i = 1, 2, ..., 100) in turn and computing the corresponding g^(i) yields 100 input images of size 10 × 10, as shown in Fig. 11. These 100 images can be regarded as "bases" of the training sample set: any given image sample can be approximately represented by a combination of these bases. Using these bases as convolution kernels in the convolutional neural network allows the features of an input picture to be extracted effectively.
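Since the maximally exciting input of each hidden unit is just its normalized weight row, the 100 basis images follow directly from the trained weights; a short sketch, assuming W1 is the 100 × 100 hidden-layer weight matrix from pre-training:

import numpy as np

def basis_images(W1):
    """W1: hidden-layer weight matrix of shape (100, 100), one row per hidden unit.
    Returns 100 images of size 10x10, each the input that maximally excites one unit."""
    norms = np.linalg.norm(W1, axis=1, keepdims=True)   # ||W_i^(1)|| for each unit i
    G = W1 / norms                                      # g^(i) = W_i^(1) / ||W_i^(1)||
    return G.reshape(100, 10, 10)                       # 100 basis images of 10 x 10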
2. Convolutional neural network
The convolution kernels form a filter bank consisting of the 100 pre-trained features of size 10 × 10, which extract features from the input image. With the stride set to 5, each filter convolves the input image to produce a feature map of size 9 × 21. Average pooling is then applied to each 3 × 3 region of the feature map, giving a feature map of size 3 × 7. All 2100 nodes of the pooled feature maps are fed into a neural network (the hidden layer) containing 350 nodes, which reduces the dimensionality while extracting higher-level features, making it easier for the classifier to discriminate.
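The layer sizes quoted above can be verified with a few lines; this is only a dimension check under the stated kernel size, stride and pooling, not the patent's implementation.

def conv_out(size, kernel=10, stride=5):
    return (size - kernel) // stride + 1

w, h = conv_out(50), conv_out(110)   # 9, 21: feature map after one stride-5 convolution
pw, ph = w // 3, h // 3              # 3, 7: after non-overlapping 3x3 average pooling
n_features = 100 * pw * ph           # 100 filters -> 2100 nodes fed to the 350-node hidden layer
assert (w, h, pw, ph, n_features) == (9, 21, 3, 7, 2100)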
3. Discriminant classifier
The mathematical model of the discriminant classifier is a SoftMax function. The minimum value of the cost function of the SoftMax regression algorithm can be solved by a gradient descent method, and a unique optimal solution is obtained.
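For reference, the SoftMax function used by the discriminant classifier maps a vector of class scores to the per-class probabilities; a numerically stable sketch:

import numpy as np

def softmax(z):
    """Class scores z -> probabilities that the current pedestrian belongs to each class."""
    z = z - np.max(z)          # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()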
4. Training of hidden layer and discriminant classifier cascade network
When the parameters need updating, the tracker is retrained. The filter parameters are not updated; only the parameters of the hidden layer and the discriminant classifier are. The network formed by cascading the hidden layer and the discriminant classifier is trained as a whole with gradient descent. The training algorithm is: (1) feed-forward pass, computing the convolved and pooled feature maps, the hidden-layer weighted sums, the activation vector and the classification probability vector; (2) compute the residuals; (3) compute the partial derivatives; (4) update the parameters; (5) repeat steps (1)-(4) until convergence.
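A compact sketch of retraining the hidden layer and SoftMax cascade by gradient descent while keeping the convolution filters fixed; the cross-entropy objective, sigmoid hidden activation, learning rate and epoch count are assumptions not specified in the text.

import numpy as np

def train_cascade(F, y, n_classes, hidden=350, lr=0.1, epochs=100, rng=np.random.default_rng(0)):
    """F: pooled CNN features, shape (num_samples, 2100); y: integer class labels.
    Trains only the hidden layer and the SoftMax layer; the filters stay fixed."""
    Wh = rng.normal(0, 0.01, (hidden, F.shape[1])); bh = np.zeros(hidden)
    Ws = rng.normal(0, 0.01, (n_classes, hidden)); bs = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                        # one-hot labels
    for _ in range(epochs):
        A = 1 / (1 + np.exp(-(F @ Wh.T + bh)))      # (1) feed-forward: hidden activations
        Z = A @ Ws.T + bs
        P = np.exp(Z - Z.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)           # classification probability vector
        dZ = (P - Y) / len(F)                       # (2) residual at the SoftMax layer
        dA = (dZ @ Ws) * A * (1 - A)                # (2) residual at the hidden layer
        Ws -= lr * dZ.T @ A; bs -= lr * dZ.sum(axis=0)   # (3)-(4) gradients and update
        Wh -= lr * dA.T @ F; bh -= lr * dA.sum(axis=0)
    return Wh, bh, Ws, bs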
The relevant parameter settings for the process of the invention are shown in table 1.
TABLE 1 parameter settings
[table image: Table 1, parameter settings]
The method of the present invention was compared with the IVT (Incremental Visual Tracking), SCM (Sparse Collaborative Model) and MIL (Multiple Instance Learning) methods on a self-built building-doorway surveillance video dataset; the performance is shown in Table 2. The results show that the tracking accuracy of the method is close to that of the other algorithms, its execution efficiency is slightly higher, and it outperforms the others in robustness and counting accuracy.
TABLE 2 Performance and comparison of people number increment detection methods based on motion tracking
[table image: Table 2, performance comparison of people-count increment detection methods based on motion tracking]
The method designs occlusion-pattern detection and compensation for the common occlusion patterns and therefore has strong resistance to occlusion. Because the convolution filters are obtained by offline training in advance, the convolutional neural network structure is simple, and online parameter updating only retrains the hidden layer and the regression (classification) layer, the method can meet the requirement of long-time uninterrupted operation in video surveillance. The method is robust, real-time and relatively accurate, is suitable for people counting over large volumes of video data, and can be integrated into a video surveillance software system.
The above description is only an example of the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. A video personnel tracking and counting method based on artificial intelligence, characterized in that the method comprises the following steps:
step 1: initialize the video frame number n = 1 and segment the video objects of the nth frame to obtain the pedestrian connected-component set P^(n); compute the feature vector v_j^(n) and the motion vector m_j^(n) of the jth pedestrian, and set the longest untracked-match count of the jth pedestrian λ_j^(n) = 0;
the feature vector and the motion vector of a pedestrian are computed as follows:
the feature vector of the jth pedestrian is v_j = (x_j, y_j, S_j), where (x_j, y_j) is the centroid coordinate of p_j and S_j is the area of p_j:
[equation image: centroid coordinates and area computed from the binary image of p_j]
where y_h is the height of the surveillance video image, N_j and M_j are the numbers of pixels of the circumscribed rectangle of p_j in the length and width directions, and f_j(x, y) is the binary image of p_j:
[equation image: definition of the binary image f_j(x, y)]
the motion vector of the jth pedestrian is m_j = (l_j, λ_j), where l_j = l(p_j) is the door-side label of the pedestrian, l_j = 0 denoting the inside of the door (in the building) and l_j = 1 denoting the outside of the door (outside the building); λ_j is the longest untracked-match count of the jth pedestrian, i.e., λ_j = λ(p_j);
step 2: segment the video objects of the (n+1)th frame to obtain P^(n+1); compute v_j^(n+1) and m_j^(n+1), j = 1, ..., k;
step 3: search P^(n) for the pedestrians that match those in P^(n+1): for each pedestrian p_j^(n+1) in P^(n+1), find the tracking-matched pedestrian p_i^(n) in P^(n), i = 1, ..., k;
if the matching succeeds, compute the people-count increment in: (1) if the pedestrian moves from inside the building to outside, the increment is -1; (2) if the pedestrian moves from outside the building to inside, the increment is 1; (3) if the pedestrian stays inside the building, the increment is 0; (4) if the pedestrian stays outside the building, the increment is 0;
the longest untracked-match count λ_i of every successfully matched p_i is reset;
if the matching succeeds, it is also necessary to check whether p_j^(n+1) satisfies the judgment condition of merged-type occlusion; if so, the detected increment in needs to be compensated;
if the matching fails, it is necessary to judge whether p_j^(n+1) is a pedestrian occluded in the nth frame; if p_j^(n+1) satisfies the judgment condition of distributed occlusion, compensate in; otherwise, regard p_j^(n+1) as a pedestrian newly appearing in the monitored area and set λ_i = 0;
step 4: check the pedestrians in P^(n) that were not successfully matched with P^(n+1), add them to P^(n+1), and increase their longest untracked-match count by 1; if such a pedestrian is matched in the (n+2)th frame, judge that intermittent occlusion has occurred and change in accordingly; otherwise, add 1 to the longest untracked-match count again, and once the threshold is reached judge that the pedestrian has left the monitored area; if p_i^(n) satisfies the judgment condition of convergent-type occlusion, compensate in;
step 5: remove pedestrians that have left the monitored area and falsely detected pedestrians: for each pedestrian in P^(n+1), check whether its longest untracked-match count exceeds the threshold;
if it does, the pedestrian is considered to have left the monitored area and should be discarded;
otherwise, the pedestrian is considered to be temporarily occluded and should be kept;
at the same time, check whether the pedestrian's area is within the valid range; if it is not, the detection is considered false and the pedestrian should be discarded;
update P^(n+1);
step 6: let n = n + 1 and jump to step 2 until the whole video image sequence has been analyzed.
2. The artificial intelligence based video personnel tracking and counting method according to claim 1, wherein the tracking matching in step 3 specifically comprises the following steps:
step 31: initialize the video frame number n = 1 and the tracker T(W);
step 32: translate the centroid (x_i, y_i) of p_i^(1) to new coordinates; the translation takes the centroid as the center and moves it to the pixel points whose D_8 (chessboard) distance from the centroid equals d along each of the eight directions indexed by r = 0, ±1, ±2, ±3, 4, with d = 5, 10; together with p_i^(1) itself, 17 samples of the ith class, all labeled i, are obtained;
[equation image: coordinates of the translated samples]
step 33: form the sample set C^(1) from the obtained samples and train the tracker T(W), determining its parameters as W_1;
step 34: detect the (n+1)th frame to obtain P^(n+1) and set C^(n+1) = C^(n);
step 35: input p_j ∈ P^(n+1) into the tracker T(W_n) and obtain its output; take the maximum output value o_m and compare it with the upper threshold σ_1 and the lower threshold σ_2 (σ_1 ≥ σ_2):
(1) if o_m is less than the lower threshold σ_2, p_j is regarded as a pedestrian newly appearing in the (n+1)th frame and the tracking match fails; translate the centroid of p_j in the same way (r = 0, ±1, ±2, ±3, 4; d = 5, 10); together with p_j, 17 samples are obtained in total and added to C^(n+1) as a new class of samples;
(2) if o_m is greater than the upper threshold σ_1, p_m ∈ P^(n) and p_j ∈ P^(n+1) are regarded as highly matched;
(3) if o_m is greater than the lower threshold σ_2 but less than the upper threshold σ_1, p_m ∈ P^(n) and p_j ∈ P^(n+1) are regarded as matched; translate the centroid of p_j in the same way to obtain, together with p_j, 17 samples in total, and add them to the sample class labeled m; if the number of samples labeled m then exceeds the per-class sample-pool capacity V, the 17 samples labeled m that entered the pool earliest are removed;
step 36: update the sample set, removing the samples of pedestrians that have left the monitored area or were falsely detected; the update covers three cases:
(1) for a newly appearing pedestrian, create a new pedestrian class;
(2) for a pedestrian whose appearance has changed, collect and add new samples; when the number of samples exceeds the per-class sample-pool capacity V during expansion, update the sample set by a first-in-first-out rule, replacing the sample that entered the pool earliest with the newly added one; V = 34 was determined by experiment;
(3) for a pedestrian that has left the monitored area or was falsely detected, remove the samples of the class it belongs to;
after updating, the new sample set C^(n+1) is obtained;
step 37: update the parameters of the tracker T(W): train T(W) with C^(n+1) and determine its parameters as W_(n+1); when training T(W), the initial value of the network parameters is W_n.
3. The artificial intelligence based video personnel tracking and counting method according to claim 2, characterized in that the tracker comprises a filter, a convolutional neural network, a discriminant classifier and online parameter updating;
the nth frame image is segmented into video objects to obtain a pedestrian set containing the moving targets; each pedestrian's rectangular region is resized to 50 × 110 and input into the convolutional neural network;
the convolutional neural network feeds the extracted features into the discriminant classifier, which outputs a tracking-result vector giving the probability that the pedestrian in the current frame belongs to each class;
if the tracking result indicates a newly appearing pedestrian, a change in a pedestrian's appearance features, a pedestrian leaving the monitored area, or a false detection, the sample set is updated, the hidden layer and the classifier are retrained to determine new network parameters, and tracking then proceeds to the (n+1)th frame.
4. The artificial intelligence based video personnel tracking and counting method according to claim 3, wherein: the filter in the tracker is a group of feature sets which are pre-trained by a sparse self-encoder, is obtained by training in a massive unsupervised auxiliary training set, and has good generality and completeness.
5. The artificial intelligence based video personnel tracking and counting method according to claim 3, wherein: the convolution neural network in the tracker uses a convolution kernel which is a filter composed of 100 pre-training features with the size of 10 x 10.
6. The artificial intelligence based video personnel tracking and counting method according to claim 3, wherein: and a mathematical model of a discriminant classifier in the tracker adopts a SoftMax function.
CN201911200873.6A 2019-11-29 2019-11-29 Video personnel tracking and counting method based on artificial intelligence Active CN111160101B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911200873.6A CN111160101B (en) 2019-11-29 2019-11-29 Video personnel tracking and counting method based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911200873.6A CN111160101B (en) 2019-11-29 2019-11-29 Video personnel tracking and counting method based on artificial intelligence

Publications (2)

Publication Number Publication Date
CN111160101A true CN111160101A (en) 2020-05-15
CN111160101B CN111160101B (en) 2023-04-18

Family

ID=70556257

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911200873.6A Active CN111160101B (en) 2019-11-29 2019-11-29 Video personnel tracking and counting method based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN111160101B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112906590A (en) * 2021-03-02 2021-06-04 东北农业大学 FairMOT-based multi-target tracking pedestrian flow monitoring method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013189464A2 (en) * 2012-11-28 2013-12-27 中兴通讯股份有限公司 Pedestrian tracking and counting method and device for near-front top-view monitoring video
CN104112282A (en) * 2014-07-14 2014-10-22 华中科技大学 A method for tracking a plurality of moving objects in a monitor video based on on-line study
CN105224912A (en) * 2015-08-31 2016-01-06 电子科技大学 Based on the video pedestrian detection and tracking method of movable information and Track association
CN105989615A (en) * 2015-03-04 2016-10-05 江苏慧眼数据科技股份有限公司 Pedestrian tracking method based on multi-feature fusion
CN109146921A (en) * 2018-07-02 2019-01-04 华中科技大学 A kind of pedestrian target tracking based on deep learning


Also Published As

Publication number Publication date
CN111160101B (en) 2023-04-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant