CN112396587B

CN112396587B - Method for detecting congestion degree in bus compartment based on collaborative training and density map

Info

Publication number: CN112396587B
Application number: CN202011315096.2A
Authority: CN
Inventors: 周尚波; 张力丹; 齐颖; 朱淑芳
Original assignee: Chongqing University
Current assignee: Chongqing University
Priority date: 2020-11-20
Filing date: 2020-11-20
Publication date: 2024-01-30
Anticipated expiration: 2040-11-20
Also published as: CN112396587A

Abstract

The invention discloses a method for detecting the degree of congestion in a bus compartment based on collaborative training and a density map, comprising the following steps: acquiring an image of the bus compartment to be detected; inputting the image into a trained in-bus congestion degree detection network, where the network is trained separately on bus compartment images that have congestion classification labels and on bus compartment images that do not; and obtaining a detection result. During training of the neural network, different strategies are applied to data with and without classification labels: the data without classification labels are annotated with density maps, and the labeled training set and the density maps are each used to train the neural network. This solves the problem that some bus compartment congestion images cannot be classified accurately by hand, and improves classification accuracy.

Description

Method for detecting congestion degree in bus compartment based on collaborative training and density map

Technical Field

The invention relates to the field of image recognition, in particular to a method for detecting the degree of congestion in a bus compartment based on collaborative training and a density map.

Background

Intelligent video surveillance lets a computer act like a human brain and a camera act like human eyes: working together, they analyze the images captured by the camera to understand the content of the monitored scene. The degree of congestion inside buses is a research hotspot in intelligent video surveillance, and extracting target information from on-board video with deep learning supports the intelligent management of urban public transport.

Detecting the degree of congestion in a bus compartment is a crowd density estimation problem: features are extracted from a given image, the crowd density in the image is predicted, and the congestion degree of the scene is judged. Crowd density estimation methods fall into traditional video and image density estimation algorithms and deep-learning-based density estimation algorithms.

Traditional crowd density estimation algorithms fall into two categories: detection-based and regression-based methods. Detection-based methods detect people in a scene with a sliding window and count them; they are divided into whole-body detection and body-part detection. Whole-body detection uses features such as wavelets, HOG, and edges extracted from the whole body, with classifiers such as SVM, boosting, and random forests. It works for sparse crowds, but as crowd density rises, occlusion between people becomes more and more severe and the method is no longer suitable. To mitigate occlusion, detection can instead target a body part such as the head, which improves slightly on whole-body detection. Whatever the detection method, occlusion between people remains hard to handle, so regression-based methods were gradually adopted for crowd density estimation. The regression idea is to learn a mapping from features to the number of people: shallow scene features are extracted and a regression model, such as linear regression, ridge regression, or Gaussian process regression, is learned. Although regression-based methods alleviate occlusion to some extent, they regress on features of the whole image and ignore its spatial information. Density-map-based crowd density estimation was then proposed: by learning a mapping between local image features and the corresponding density map, it incorporates the image's spatial information into density estimation and counting.

Compared with traditional methods, deep learning extracts high-level features more conveniently and efficiently. By building a multi-layer convolutional neural network and fitting its many parameters on large-scale image data, the trained model gains very strong representational capability. The shallow layers of a convolutional neural network extract low-level features such as textures and edges, while the deep layers extract high-level features carrying semantic information. Using this learning capability, the corresponding density map or crowd count can be obtained directly from the original image.

Therefore, how to accurately classify images whose degree of congestion is difficult to determine has become a problem that those skilled in the art need to solve.

Disclosure of Invention

In view of the above shortcomings of the prior art, the problem actually solved by the present invention is how to accurately classify images whose degree of congestion is difficult to determine.

In order to solve the technical problems, the invention adopts the following technical scheme:

A method for detecting the degree of congestion in a bus compartment based on collaborative training and a density map comprises the following steps:

acquiring an image of the bus compartment to be detected;

inputting the image to be detected into a trained in-bus congestion degree detection network, where the network is trained separately on bus compartment images that have congestion classification labels and on bus compartment images that do not;

and obtaining a detection result.

Preferably, the method for training the congestion degree detection network in the bus compartment comprises the following steps:

acquiring a set of bus compartment images; labeling the images that meet a preset condition with their congestion degree class to obtain a congestion classification label data set, and dividing the images meeting the preset condition together with their classification labels into a training set, a validation set, and a test set; carrying out head annotation on the images that do not meet the preset condition to obtain a density map label data set;

constructing the in-bus congestion degree detection network, which comprises a residual network branch and a dilated convolution (also called "hole" or atrous convolution) branch;

pre-training the residual network branch with the first training set, first validation set, and first test set, fine-tuning the network weights, extracting features of the congestion classification label data set, and training the softmax classifier of the residual network branch to classify the congestion level;

using the images that do not meet the preset condition as the input of the in-bus congestion degree detection network and the corresponding density map labels as the target output of the dilated convolution branch, and co-training the dilated convolution branch on top of the residual network branch;

after training of the dilated convolution branch is complete, inputting the images that do not meet the preset condition into the in-bus congestion detection network to obtain predicted congestion class labels, and adding the confident predicted labels, together with their images, to the training set;

continuing training until all images that do not meet the preset condition have been added to the training set.

Preferably, obtaining the density map labels by head annotation comprises the following steps:

annotating the heads in the bus compartment images;

generating a single-channel image of the same size as the bus compartment image, in which pixels at annotated head points have value 1 and all other pixels have value 0;

and processing the single-channel picture through Gaussian filtering to generate a corresponding density map label.
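The labeling steps above can be sketched in NumPy as follows. This is a minimal illustration, not the patent's implementation: the helper names (`gaussian_kernel`, `density_map_fixed`), the fixed spread `sigma=4.0`, the 3σ kernel radius, and the renormalization of kernels clipped at the image border are all assumptions. Because each head contributes a kernel that sums to 1, the resulting density map sums to the number of annotated heads.

```python
import numpy as np


def gaussian_kernel(sigma, radius=None):
    """Normalized 2-D Gaussian kernel whose entries sum to 1."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))  # assumed truncation at 3 sigma
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def density_map_fixed(head_points, height, width, sigma=4.0):
    """Turn (row, col) head annotations into a density map with a fixed Gaussian."""
    # Step 1: single-channel label image, 1 at each annotated head, 0 elsewhere
    # (+= keeps the count correct if two heads fall on the same pixel).
    label = np.zeros((height, width), dtype=np.float64)
    for r, c in head_points:
        label[int(r), int(c)] += 1.0

    # Step 2: Gaussian filtering, done as a sum of one kernel per labeled pixel.
    k = gaussian_kernel(sigma)
    radius = k.shape[0] // 2
    density = np.zeros_like(label)
    for r, c in zip(*np.nonzero(label)):
        r0, r1 = max(r - radius, 0), min(r + radius + 1, height)
        c0, c1 = max(c - radius, 0), min(c + radius + 1, width)
        patch = k[r0 - r + radius:r1 - r + radius, c0 - c + radius:c1 - c + radius]
        # Renormalize kernels clipped by the image border so each head still sums to 1.
        density[r0:r1, c0:c1] += label[r, c] * patch / patch.sum()
    return density
```

For example, the map produced for two annotated heads sums to 2 up to floating-point rounding, regardless of where the heads lie in the image.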

Preferably, the residual network branch comprises conv1, conv2, conv3, conv4, conv5, a 1×1 convolution, and a softmax classifier connected in sequence; pre-training the residual network branch, fine-tuning the network weights, extracting features of the congestion classification label data set, and training the softmax classifier of the residual network branch to classify congestion levels comprises:

initializing weights of residual network branches;

freezing the weights of conv1, conv2, conv3, and conv4 of the residual network branch so that back-propagation does not update them, and updating only the weights of conv5 and the softmax classifier; whenever the validation accuracy stops increasing, unfreezing the weights of the preceding convolution group; repeating training until the accuracies on the first training set and first validation set no longer change, and then ending training. During training, softmax cross-entropy is used as the loss function: a softmax operation is applied to the output tensor of convolution layer Conv5_3, and the cross entropy between the resulting vector y′ and the one-hot encoding y of the sample's classification label gives the loss value H_{y′}(y):

H_{y'}(y) = -\sum_{i=1}^{k} y_i \log y'_i

where y′_i denotes the i-th component of the vector y′ = [y′_1, y′_2, y′_3, …, y′_k]; y′_1, y′_2, y′_3, …, y′_k are the probabilities that the sample belongs to each congestion level, and k is the number of congestion level classes.
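A minimal NumPy sketch of the softmax cross-entropy loss described above, using the sign convention −Σ y_i log y′_i so that the loss is non-negative. The function names and the small `eps` guard against log(0) are assumptions. In TensorFlow, the framework the description names, `tf.nn.softmax_cross_entropy_with_logits` computes the same per-sample quantity from raw logits.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def softmax_cross_entropy(logits, labels, num_classes, eps=1e-12):
    """Batch mean of -sum_i y_i * log(y'_i).

    logits: (batch, k) raw scores, e.g. derived from the conv5_3 output (assumed shape).
    labels: (batch,) integer congestion levels in [0, k).
    """
    y_pred = softmax(logits)               # y': predicted level probabilities
    y_true = np.eye(num_classes)[labels]   # y: one-hot labels
    per_sample = -np.sum(y_true * np.log(y_pred + eps), axis=-1)
    return float(per_sample.mean())
```

With k = 4 levels and all-zero logits, every level gets probability 0.25 and the loss is log 4 ≈ 1.386.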

Preferably, the dilated convolution branch comprises dilated convolutions connected to conv5, and when the dilated convolution branch is co-trained on top of the residual network branch, the loss function is as follows:

L(\Theta) = \frac{1}{2N} \sum_{i=1}^{N} \lVert Z(X_i; \Theta) - Z_i^{GT} \rVert_2^2

where Θ denotes the weights of the dilated convolution branch, N is the number of samples in one training batch, Z(X_i; Θ) is the density map predicted for the i-th sample X_i, and Z_i^{GT} is the density map label of the i-th sample.
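A minimal sketch of the density-map loss, assuming the batch Euclidean form L(Θ) = (1/2N) Σ ‖Z(X_i; Θ) − Z_i^GT‖₂² common in density-map crowd counting (the 1/(2N) factor is an assumption). Arrays of shape (N, H, W) stand in for the predicted and label density maps, and the function name is hypothetical.

```python
import numpy as np


def density_map_loss(pred, target):
    """(1 / 2N) * sum over the batch of squared L2 distances between density maps.

    pred:   (N, H, W) predicted density maps Z(X_i; theta)
    target: (N, H, W) density map labels Z_i^GT
    """
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    n = pred.shape[0]
    diff = (pred - target).reshape(n, -1)
    return float(np.sum(diff ** 2) / (2.0 * n))
```

Two all-zero 3×3 predictions against all-one labels give 18 squared errors, so the loss is 18 / (2 × 2) = 4.5.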

Compared with the prior art, the invention has the beneficial effects that:

(1) When annotating the bus compartment data, different strategies are applied to data with and without classification labels: data without classification labels are annotated with density maps, and the labeled training set and the density maps are each used to train the neural network. This solves the problem that some bus compartment congestion images cannot be classified accurately by hand, and improves classification accuracy.

(2) Because data with and without classification labels are trained separately, density map annotation is not needed for the whole data set, which greatly reduces annotation cost.

Drawings

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings, in which:

FIG. 1 is a flow chart of one embodiment of a method for detecting congestion in a bus compartment based on co-training and density maps in accordance with the present disclosure;

FIG. 2 is a schematic structural diagram of an embodiment of the in-bus congestion degree detection network according to the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings.

Fig. 1 is a flowchart of an embodiment of a method for detecting congestion in a bus compartment based on collaborative training and a density map, where the embodiment includes:

acquiring an image of the bus compartment to be detected;

inputting the image to be detected into the trained in-bus congestion degree detection network;

and obtaining a detection result.

In specific implementation, the method for training the congestion degree detection network in the bus compartment comprises the following steps:

(1) Acquiring a set of bus compartment images and labeling, by congestion degree, the images that meet the preset condition. An image that can be accurately assigned to one of the congestion levels is considered to meet the preset condition and needs only a classification label; an image that cannot be accurately classified is considered not to meet the preset condition and is given head annotations instead, so that the network decides which class it belongs to. Labeling the images that meet the preset condition yields the congestion classification label data set, and these images and their labels are divided into a training set, a validation set, and a test set. The training set is used to fit the neural network (training stage). The validation set is used to check the training effect during training: for example, plotting the training and validation loss against the epoch shows whether the model is overfitting, in which case training can be stopped in time and the model structure and hyperparameters adjusted accordingly, which saves considerable time. The test set is used to evaluate the actual learning ability of the network, that is, its final performance, and is not used to select features or tune the training algorithm. Head annotation is carried out on the images that do not meet the preset condition to obtain the density map label data set;

In the present invention, congestion may be divided into several levels, for example open, comfortable, relatively crowded, and severely crowded, with the following criteria:

open: more than 50% of the seats in the compartment are unoccupied;

comfortable: fewer than 50% of the seats are unoccupied, but some empty seats remain;

relatively crowded: all seats are occupied, standing passengers in the aisle fill no more than 50% of the aisle's capacity, and passengers can move around the compartment easily;

severely crowded: all seats are occupied, passengers are densely packed in the aisle, and passengers gather at the front and rear doors.
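The four criteria above can be written as a labeling rule. This is a hypothetical sketch. It assumes an annotator measures two inputs: the fraction of empty seats and the fraction of aisle capacity occupied by standing passengers. It also treats aisle occupancy above 50% as "severely crowded", which is an assumption, since the text describes that level only qualitatively. Scenes close to these thresholds are the kind that are hard to label by hand.

```python
def congestion_level(empty_seat_fraction, aisle_occupancy):
    """Map seat and aisle occupancy (both in [0, 1]) to one of four levels.

    Hypothetical rule; the patent states the criteria in words only.
    """
    if empty_seat_fraction > 0.5:
        return "open"                   # more than half the seats are empty
    if empty_seat_fraction > 0.0:
        return "comfortable"            # some, but at most half, of the seats are empty
    if aisle_occupancy <= 0.5:
        return "relatively crowded"     # seats full, aisle at most half full
    return "severely crowded"           # seats full, aisle densely packed (assumed > 50%)
```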

The density map is derived from the head annotations that the data set must provide for each image. These form a sparse matrix, which a Gaussian filter converts into a 2D density map, and the sum of all cells of the density map equals the number of people in the image. In practice, many image scenes lie between the four states, the boundaries between levels are fuzzy, and manual classification is subjective. Scenes that can be classified accurately are therefore given classification labels, while scenes that cannot are annotated with head positions, from which density map labels are generated.

Obtaining the density map label by head annotation comprises the following steps:

annotating the heads in the bus compartment images;

If a head is annotated at pixel x_i, it is represented as δ(x − x_i), where δ(·) is the impulse (Dirac delta) function. An annotated image containing N heads is therefore expressed as:

H(x) = \sum_{i=1}^{N} \delta(x - x_i)

that is, an image with N heads in which the pixels at the annotated head positions x_i have value 1 and all other pixels have value 0.

The density map constructed this way treats each head as an independent sample in the image plane. However, it is very sparse: when the loss is computed over the network's output grid, the overall output tends toward 0, which is unfavorable for scenes with high crowd density. The label image is therefore convolved with a Gaussian function, turning each annotated head position into a density function over a small region. This alleviates the sparsity problem to some extent without changing how the number of people is counted.

The choice of Gaussian kernel is very important. In real scenes, especially when crowd density is high, the heads x_i are not independent samples. Because of perspective distortion, pixels and their surrounding samples correspond to different scales in different regions of the image, so perspective must be taken into account to estimate the crowd density function accurately. Assuming the crowd around each head is roughly uniformly distributed, the average image distance between a head and its K nearest neighbors is a reasonable estimate of the geometric distortion. In other words, the parameter σ is determined from the distances to the K nearest heads.

For each head x_i, let the distances to its K nearest neighboring heads be {d_1^i, d_2^i, …, d_K^i}, with mean \bar{d}^i = \frac{1}{K}\sum_{j=1}^{K} d_j^i. The size of the region that the head at x_i occupies in the image is roughly proportional to \bar{d}^i. An adaptive Gaussian kernel is therefore used for the convolution: its standard deviation σ_i varies from head to head and is proportional to \bar{d}^i:

F(x) = \sum_{i=1}^{N} \delta(x - x_i) * G_{\sigma_i}(x), \qquad \sigma_i = \beta \bar{d}^i

where β = 0.3 and K = 5, because in the training-set scenes each head region has about 5 neighboring heads. F(x) is the density function obtained by convolving the head-annotated label image with the Gaussian kernel, and G_{σ_i}(x) denotes the Gaussian kernel.
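The geometry-adaptive kernel can be sketched in NumPy as below, using the stated β = 0.3 and K = 5. Several details are assumptions not in the text:

- head coordinates are (row, col) pixels;
- nearest-neighbor distances are computed by brute force;
- a head with fewer than K neighbors uses all the neighbors it has;
- a lone head falls back to a hypothetical `default_sigma`;
- σ is floored at 1 pixel;
- each Gaussian is normalized over the image so every head contributes exactly 1 to the count.

```python
import numpy as np


def adaptive_sigmas(head_points, k=5, beta=0.3, default_sigma=15.0, min_sigma=1.0):
    """sigma_i = beta * mean distance from head i to its k nearest other heads."""
    pts = np.asarray(head_points, dtype=np.float64).reshape(-1, 2)
    n = len(pts)
    if n == 0:
        return np.zeros(0)
    # Brute-force pairwise Euclidean distances between heads.
    dists = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
    sigmas = np.empty(n)
    for i in range(n):
        others = np.sort(np.delete(dists[i], i))  # distances to the other heads
        if others.size == 0:
            sigmas[i] = default_sigma              # hypothetical fallback for a lone head
        else:
            sigmas[i] = beta * others[:k].mean()   # uses all neighbors if fewer than k
    return np.maximum(sigmas, min_sigma)


def adaptive_density_map(head_points, height, width, k=5, beta=0.3):
    """F(x) = sum_i delta(x - x_i) * G_{sigma_i}(x), one normalized Gaussian per head."""
    pts = np.asarray(head_points, dtype=np.float64).reshape(-1, 2)
    sigmas = adaptive_sigmas(pts, k=k, beta=beta)
    rows, cols = np.mgrid[0:height, 0:width]
    density = np.zeros((height, width))
    for (r, c), s in zip(pts, sigmas):
        g = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * s ** 2))
        density += g / g.sum()   # each head integrates to 1 within the image
    return density
```

Heads in tight clusters get small kernels and isolated heads get wide ones, which matches the perspective argument above. The brute-force distance matrix is O(N²) in the number of heads; a k-d tree would be the usual choice for very dense images.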

Because the collected data are spread evenly over all time periods, most images are in the open or comfortable state, with relatively few in the relatively crowded and severely crowded states. To obtain more data in the crowded states, data augmentation can be used to balance the number of images per class. The final bus compartment image data set contains four classes of 20,000 images each, divided into a training set, a validation set, and a test set. Because camera angles differ between vehicles, a degree of data augmentation also improves the robustness of the model.
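Balancing the minority classes by augmentation can be sketched as below. The text does not say which transforms were used. The horizontal flip, the ±20% brightness jitter, and the function names are assumptions, chosen only to illustrate oversampling a class up to a target size such as the 20,000 images per class mentioned above.

```python
import numpy as np


def augment(image, rng):
    """One randomly augmented copy of an (H, W, C) uint8 image (assumed transforms)."""
    out = image[:, ::-1] if rng.random() < 0.5 else image      # horizontal flip
    scale = rng.uniform(0.8, 1.2)                               # brightness jitter
    return np.clip(out.astype(np.float64) * scale, 0, 255).astype(np.uint8)


def balance_class(images, target, seed=0):
    """Oversample a class with augmented copies until it holds `target` images."""
    if not images:
        raise ValueError("cannot augment an empty class")
    rng = np.random.default_rng(seed)
    out = list(images)
    i = 0
    while len(out) < target:
        out.append(augment(images[i % len(images)], rng))  # cycle through the originals
        i += 1
    return out
```

A class that already meets the target is returned unchanged. Only the under-represented crowded classes grow.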

(2) Constructing the in-bus congestion degree detection network, which comprises a residual network branch and a dilated convolution branch;

The in-bus congestion degree detection network of the invention is shown in FIG. 2.

The network model can be built with the TensorFlow open-source deep learning framework. The front end of the network uses the resnet_v2 structure as the feature extraction module of the whole model. Compared with the original ResNet, resnet_v2 moves the activation function before each convolution (pre-activation) and gives more stable results. ResNet has five groups of convolutions: the input to the first group is 224×224 and the output of the fifth group is 7×7, a 32-fold reduction, and batch normalization (BN) layers are included throughout the network. To balance accuracy against resource cost, the residual network built here has 50 layers. Its fully connected layer is removed, so the network is fully convolutional and accepts input images of any size.
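A little arithmetic shows the 32-fold reduction, and why a fully convolutional backbone accepts any input size. This sketch assumes five stride-2 reductions with "same"-style padding, where each halving rounds up. The function name is hypothetical, and exact sizes for inputs not divisible by 32 depend on the padding the real implementation uses.

```python
import math


def backbone_output_size(height, width, num_stride2_stages=5):
    """Feature-map size after five stride-2 reductions (total stride 2**5 = 32)."""
    for _ in range(num_stride2_stages):
        height, width = math.ceil(height / 2), math.ceil(width / 2)
    return height, width
```

For a 224×224 input this gives a 7×7 map, as stated above. A 480×640 frame gives a 15×20 map, and the fully convolutional back end simply operates on that larger map.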

The back end uses a dilated convolution module, which enlarges the receptive field while generating the density map of the corresponding image. A two-dimensional dilated convolution is defined as follows:

y(m, n) = \sum_{i=1}^{M} \sum_{j=1}^{N} x(m + r \times i,\ n + r \times j)\, w(i, j)

where x(m, n) is the input, w(i, j) is a convolution kernel of length M and width N, y(m, n) is the output of the dilated convolution, and r is the dilation rate. When r = 1, this is an ordinary convolution. With dilation rate r, a K×K kernel is enlarged to an effective size of K + (K − 1)(r − 1). Dilated convolution uses sparse kernels in place of alternating convolution and pooling. It enlarges the receptive field without increasing the number of network parameters or the amount of computation, which makes it well suited to crowd density estimation. The dilated convolution network built here has 6 layers, all with the same dilation rate, followed by a 1×1 convolution layer that outputs the result. All convolution layers are padded to preserve the spatial size.
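Below is a direct NumPy sketch of the dilated convolution formula. It uses "same" zero padding so the output keeps the input size, as the text requires. Index offsets are 0-based and the kernel is centered, which differs cosmetically from the 1-based sum above. As in deep-learning frameworks, the operation is written as cross-correlation. The function names are hypothetical. In TensorFlow the corresponding layer option is `dilation_rate` on `tf.keras.layers.Conv2D`.

```python
import numpy as np


def effective_kernel_size(k, r):
    """Extent of a K x K kernel with dilation rate r: K + (K - 1)(r - 1)."""
    return k + (k - 1) * (r - 1)


def dilated_conv2d(x, w, r=1):
    """'Same'-padded 2-D dilated convolution of image x with kernel w (odd sizes)."""
    kh, kw = w.shape
    eh, ew = effective_kernel_size(kh, r), effective_kernel_size(kw, r)
    ph, pw = eh // 2, ew // 2
    xp = np.pad(x, ((ph, eh - 1 - ph), (pw, ew - 1 - pw)))  # zero padding
    y = np.zeros_like(x, dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            # Kernel tap (i, j) reads the input r*i rows and r*j columns away.
            y += w[i, j] * xp[r * i:r * i + x.shape[0], r * j:r * j + x.shape[1]]
    return y
```

Applying a 3×3 all-ones kernel with r = 2 to a single central impulse spreads it to a 3×3 grid of points two pixels apart. The kernel still has 9 weights but covers a 5×5 area.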

To understand the effect of the dilation rate on the quality of the generated density map, comparative experiments were run with r = 1, 2, and 4, plus a further configuration mixing r = 2 and r = 4. The results show that r = 2 works best.
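The effect of the dilation rate can be quantified by the receptive field of the six stacked dilated layers. The text does not give the kernel size, so this sketch assumes 3×3 kernels with stride 1. It also reads "r = 2, 4 mixed" as alternating rates, which is likewise an assumption. Receptive fields are measured in cells of the conv5 feature map, each of which already covers a large region of the input image. The trailing 1×1 layer adds nothing.

```python
def stacked_receptive_field(kernel_sizes, dilation_rates):
    """Receptive field (in feature-map cells) of stride-1 conv layers applied in sequence."""
    rf = 1
    for k, r in zip(kernel_sizes, dilation_rates):
        # Each layer widens the field by (effective kernel size - 1).
        rf += (k + (k - 1) * (r - 1)) - 1
    return rf


configs = {
    "r=1": [1] * 6,
    "r=2": [2] * 6,
    "r=4": [4] * 6,
    "r=2,4 alternating": [2, 4] * 3,
}
for name, rates in configs.items():
    print(name, stacked_receptive_field([3] * 6, rates))
```

Under these assumptions the receptive fields are 13, 25, 49, and 37 cells respectively. A wider field is not automatically better; the text reports r = 2 as the best setting.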

(3) Pre-training the residual network branch with the first training set, first validation set, and first test set, fine-tuning the network weights, extracting features of the congestion classification label data set, and training the softmax classifier of the residual network branch to classify the congestion level;

the residual network branch comprises conv1, conv2, conv3, conv4, conv5, a 1×1 convolution, and a softmax classifier connected in sequence; pre-training the residual network branch, fine-tuning the network weights, extracting features of the congestion classification label data set, and training the softmax classifier of the residual network branch to classify congestion levels comprises:

initializing weights of residual network branches;

The invention can use residual network weights pre-trained on the ImageNet data set as the initial weights. ImageNet is a large-scale labeled image data set organized according to the WordNet hierarchy, with about 15 million images in about 22,000 categories, and is mainly used for tasks such as image classification and object recognition. Transfer learning from this data set both speeds up model convergence and improves model robustness.

Freezing the weights of conv1, conv2, conv3, and conv4 of the residual network branch so that back-propagation does not update them, and updating only the weights of conv5 and the softmax classifier; whenever the validation accuracy stops increasing, unfreezing the weights of the preceding convolution group; repeating training until the accuracies on the first training set and first validation set no longer change, and then ending training. Because training starts from pre-trained weights, the learning rate can be set relatively small. The batch size was set to 32 according to the GPU's computing capacity; the model was trained for 10 epochs, after which its loss no longer changed.
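The freeze-then-unfreeze schedule can be sketched as a small controller that replays a sequence of validation accuracies. It only decides which convolution groups are trainable in each epoch. In a Keras model, this decision would be applied by setting each layer's `trainable` attribute. The `patience` parameter and the function name are assumptions. The stop criterion (training and validation accuracy no longer changing) is left to the caller.

```python
STAGES = ["conv1", "conv2", "conv3", "conv4", "conv5"]


def unfreeze_schedule(val_accuracies, patience=1):
    """Return, per epoch, the convolution groups that are trainable.

    Training starts with only conv5 (plus the softmax classifier) trainable;
    each time validation accuracy fails to improve for `patience` epochs,
    the preceding group is unfrozen.
    """
    first_trainable = len(STAGES) - 1   # index of the earliest trainable group
    best, stale = float("-inf"), 0
    schedule = []
    for acc in val_accuracies:
        schedule.append(STAGES[first_trainable:])
        if acc > best:
            best, stale = acc, 0
        else:
            stale += 1
            if stale >= patience and first_trainable > 0:
                first_trainable -= 1     # unfreeze the preceding convolution group
                stale = 0
    return schedule
```

With accuracies 0.60, 0.70, 0.70, 0.75, only conv5 is trained for the first three epochs. The plateau at epoch three then unfreezes conv4 for the fourth.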

During training, softmax cross-entropy is used as the loss function: a softmax operation is applied to the output tensor of convolution layer conv5_3 (the last convolution of conv5), and the cross entropy between the resulting vector y′ and the one-hot encoding y of the sample's classification label gives the loss value H_{y′}(y):

H_{y'}(y) = -\sum_{i=1}^{k} y_i \log y'_i

where y′_i denotes the i-th component of the vector y′ = [y′_1, y′_2, y′_3, …, y′_k]; y′_1, y′_2, y′_3, …, y′_k are the probabilities that the sample belongs to each congestion level, and k is the number of congestion level classes.

(4) Using the images that do not meet the preset condition as the input of the in-bus congestion degree detection network and the corresponding density map labels as the target output of the dilated convolution branch, and co-training the dilated convolution branch on top of the residual network branch;

the dilated convolution branch comprises dilated convolutions connected to conv5, and when it is co-trained on top of the residual network branch, the loss function is as follows:

L(\Theta) = \frac{1}{2N} \sum_{i=1}^{N} \lVert Z(X_i; \Theta) - Z_i^{GT} \rVert_2^2

where Θ denotes the weights of the dilated convolution branch, N is the number of samples in one training batch, Z(X_i; Θ) is the density map predicted for the i-th sample X_i, and Z_i^{GT} is the density map label of the i-th sample;

The dilated convolution branch shares the features of the residual branch. The weights of the residual branch are frozen at this stage and are not updated during back-propagation; only the weights of the dilated convolution branch are updated. Once trained, the dilated convolution branch can predict the density map features of an input image.
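A toy NumPy model can illustrate the co-training step, in which the shared residual features stay frozen and only the dilated convolution branch learns. Everything here is a stand-in:

- a fixed random projection followed by ReLU plays the frozen backbone;
- a linear map plays the density branch;
- flattened vectors play images and density maps;
- the loss is the assumed (1/2N) squared-L2 density loss.

The function name is hypothetical. The point is only that gradients are computed for, and applied to, the head alone.

```python
import numpy as np


def train_head_on_frozen_backbone(inputs, targets, feat_dim=16, steps=200, lr=0.1, seed=0):
    """Fit a linear head on frozen random ReLU features by gradient descent.

    inputs:  (N, D) flattened inputs (stand-in for images)
    targets: (N, P) flattened density-map labels
    Returns the (unchanged) backbone weights, the trained head and the loss history.
    """
    rng = np.random.default_rng(seed)
    backbone = rng.normal(size=(inputs.shape[1], feat_dim)) / np.sqrt(inputs.shape[1])
    feats = np.maximum(inputs @ backbone, 0.0)      # shared features; backbone never updated
    head = np.zeros((feat_dim, targets.shape[1]))   # trainable "density branch"
    n = inputs.shape[0]
    losses = []
    for _ in range(steps):
        diff = feats @ head - targets
        losses.append(float(np.sum(diff ** 2) / (2.0 * n)))
        head -= lr * (feats.T @ diff) / n           # gradient of the loss w.r.t. the head only
    return backbone, head, losses
```

Because the frozen features are computed once, each step costs only the head's matrix products. This is also why freezing the backbone makes the co-training stage cheap in the real network.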

(5) After training of the dilated convolution branch is complete, inputting the images that do not meet the preset condition into the in-bus congestion detection network to obtain predicted congestion class labels, and adding the confident predicted labels, together with their images, to the training set;

(6) Continuing training (training the residual network branch and the dilated convolution branch again) until all images that do not meet the preset condition have been added to the training set.
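Steps (5) and (6) form a self-training (pseudo-labeling) loop, sketched below in model-agnostic form. `fit` and `predict_proba` are hypothetical callables standing in for training the two branches and for the network's softmax output. The text does not define what makes a prediction "confident", so a maximum class probability of at least 0.9 is an assumed threshold. The loop also stops if a round adds nothing. This is an assumption that keeps it from running forever when some images never reach the threshold.

```python
import numpy as np


def self_training(labeled, unlabeled, fit, predict_proba, threshold=0.9, max_rounds=10):
    """Move confidently pseudo-labeled images from `unlabeled` into the training set.

    labeled:   list of (image, class_index) pairs
    unlabeled: list of images without classification labels
    fit(labeled) -> model;  predict_proba(model, images) -> (n, k) probabilities
    """
    labeled, pool = list(labeled), list(unlabeled)
    model = fit(labeled)
    for _ in range(max_rounds):
        if not pool:
            break                                   # every image has been added
        probs = np.asarray(predict_proba(model, pool))
        keep = probs.max(axis=1) >= threshold       # assumed confidence rule
        if not keep.any():
            break                                   # nothing confident left to add
        labeled += [(img, int(p.argmax())) for img, p, k in zip(pool, probs, keep) if k]
        pool = [img for img, k in zip(pool, keep) if not k]
        model = fit(labeled)                        # retrain on the enlarged set
    return model, labeled, pool
```

Images the model never becomes confident about remain in the returned pool, where they could be inspected or annotated further.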

Finally, it should be noted that the above embodiments illustrate rather than limit the invention. Those skilled in the art will understand that various changes in form and detail may be made without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A method for detecting the degree of congestion in a bus compartment based on collaborative training and a density map, characterized by comprising the following steps:

acquiring an image of the bus compartment to be detected;

obtaining a detection result;

the method for training the congestion degree detection network in the bus compartment comprises the following steps:

acquiring a set of bus compartment images; labeling the images that meet a preset condition with their congestion degree class to obtain a congestion classification label data set, and dividing the images meeting the preset condition together with their classification labels into a training set, a validation set, and a test set; carrying out head annotation on the images that do not meet the preset condition to obtain a density map label data set; wherein an image that can be accurately assigned to a congestion level is considered to meet the preset condition, and an image that cannot be accurately assigned is considered not to meet the preset condition;

2. The method for detecting the degree of congestion in a bus compartment based on collaborative training and a density map according to claim 1, wherein obtaining the density map label by head annotation comprises the following steps:

annotating the heads in the bus compartment images;

3. The method for detecting the degree of congestion in a bus compartment based on collaborative training and a density map according to claim 1, wherein the residual network branch comprises conv1, conv2, conv3, conv4, conv5, a 1×1 convolution, and a softmax classifier connected in sequence; pre-training the residual network branch, fine-tuning the network weights, extracting features of the congestion classification label data set, and training the softmax classifier of the residual network branch to classify congestion levels comprises:

initializing weights of residual network branches;

H_{y'}(y) = -\sum_{i=1}^{k} y_i \log y'_i

where y′_i denotes the i-th component of the vector y′ = [y′_1, y′_2, y′_3, …, y′_k]; y′_1, y′_2, y′_3, …, y′_k are the probabilities that the sample belongs to each congestion level, and k is the number of congestion level classes.

4. The method for detecting the degree of congestion in a bus compartment based on collaborative training and a density map according to claim 3, wherein the dilated convolution branch comprises dilated convolutions connected to conv5, and the loss function when the dilated convolution branch is co-trained on top of the residual network branch is as follows: