CN111767799A

CN111767799A - Improved pedestrian target detection algorithm based on Faster R-CNN for tunnel environments

Info

Publication number: CN111767799A
Application number: CN202010484802.XA
Authority: CN
Inventors: 赵敏; 唐毅; 王卫平; 孙棣华; 王世森; 陈星州; 李莹英; 杨国峰; 何雪宁
Original assignee: Chongqing Expressway Group Co ltd; Chongqing University
Current assignee: Chongqing Expressway Group Co ltd; Chongqing University
Priority date: 2020-06-01
Filing date: 2020-06-01
Publication date: 2020-10-13

Abstract

The invention discloses an improved Faster R-CNN pedestrian target detection algorithm for tunnel environments, which comprises the following steps: establishing a pedestrian target data set for the expressway tunnel environment and randomly dividing it into a training set and a test set; optimizing the Anchor in the Faster R-CNN network with an unsupervised learning algorithm, based on the training set obtained in the previous step, to obtain the Anchor setting; establishing a dilated convolution pyramid structure; designing an attention mechanism that processes the feature information and enhances the expressive power of the features; and establishing a pedestrian detection framework for the expressway tunnel environment. The method improves pedestrian feature extraction under conditions such as dark images, relatively small targets, and vehicle-lamp interference, and thereby improves the pedestrian detection rate in the tunnel environment.

Description

Improved pedestrian target detection algorithm based on Faster R-CNN for tunnel environments

Technical Field

The invention relates to the technical field of pedestrian detection in computer vision, and in particular to an improved Faster R-CNN pedestrian target detection algorithm for tunnel environments.

Background

Laws and regulations prohibit pedestrians from entering expressways, yet some people still take the chance and walk along them, which creates serious hazards for safe expressway operation. Automatic detection of pedestrian targets therefore has important practical significance. Analysis of surveillance video from expressway tunnels shows that the pictures are generally dim, pedestrian targets are relatively small, and vehicle lamps cause interference. These factors make pedestrian features hard to extract and lead to poor pedestrian detection in the tunnel environment. A pedestrian detection technique for the tunnel environment therefore has both theoretical and practical significance.

A review of existing patents and papers shows that the pedestrian detection algorithms with good detection performance mostly use deep learning. For example, the multi-scale pedestrian target detection neural network based on feature fusion (CN110490174A), applied for by the University of Electronic Technology, builds a feature-fused multi-scale detection network; by fusing the feature information of different layers, it makes full use of the features in each layer, so that the network extracts target features effectively and the pedestrian detection rate improves. The method for detecting small-scale dense pedestrians (CN110414464A), applied for by Beijing Deep-Waken Technology Co., Ltd., starts from the observation that the lower layers of a deep network contain more position information while the higher layers contain more semantic information; it fuses the lower-layer information into the higher-layer semantic information to detect pedestrian targets and extracts a mask at the same time, which effectively improves the detection rate for small pedestrian targets. The pedestrian detection method based on inter-frame information in video sequences (CN110348329A), applied for by the University of Electronic Science and Technology, uses the Faster R-CNN network as the basic framework, adds the detection result of the previous video frame to the detection frames of the current frame, and then processes the candidate frames with soft non-maximum suppression. A pedestrian feature extraction module with up-sampling is built, in which the high-level semantic information is up-sampled and then fused with the low-level position information, so that the two are fully fused, the feature extraction capability of the neural network is enhanced, and pedestrian detection precision improves. The pedestrian detection method and system based on a two-stage attention mechanism (CN110135243A), applied for by Shanghai Jiao Tong University, adds two different attention mechanisms to the RPN of the Faster R-CNN network, one weighting object localization and the other weighting feature information, which effectively improves both the localization accuracy and the detection accuracy for pedestrian targets.

To address the dim video images, small pedestrian targets, and interference from factors such as vehicle lamps in the tunnel environment, the invention improves on the Faster R-CNN algorithm, which has high target detection precision. First, because the data sets differ, the anchors of Faster R-CNN are redesigned so that they suit the extraction of pedestrian candidate frames in the tunnel environment. Second, because pedestrian targets in the image are small, the invention designs a dilated convolution pyramid module that uses dilated convolution to collect feature information from different sub-regions of the feature map and strengthens the connections between the network's feature information. Third, because the feature information becomes dispersed once the dilated convolution pyramid structure is added to the Faster R-CNN network, the invention designs an attention mechanism that processes the feature information and enhances the expressive power of the features.

In summary, the method starts from the actual environment of the expressway tunnel. It first optimizes the anchors of the Faster R-CNN network with the K-means clustering algorithm to account for the different sample data set, and then designs a dilated convolution pyramid module that collects and fuses the information of different sub-regions of the feature map, strengthening the connections between the feature information. To counter the dispersion of the feature information, the features are enhanced with an attention mechanism. Together these form a pedestrian target detection algorithm suited to the tunnel environment. The method improves pedestrian feature extraction under conditions such as dark images, relatively small targets, and vehicle-lamp interference, and improves the pedestrian detection rate in the tunnel environment.

Disclosure of Invention

In view of the above, the present invention provides a detection algorithm for improving a pedestrian target detection rate in a tunnel environment.

The purpose of the invention is realized by the following technical scheme:

an improved Faster R-CNN pedestrian target detection algorithm for tunnel environments comprises the following steps:

step one: establishing a pedestrian target data set for the expressway tunnel environment, and randomly dividing the pedestrian target data set into a training set and a test set;

step two: optimizing the Anchor in the Faster R-CNN network by adopting an unsupervised learning algorithm based on the training set obtained in the first step to obtain the Anchor setting;

step three: establishing a dilated convolution pyramid structure;

step four: designing an attention mechanism for processing feature information and enhancing the expression capability of the features;

step five: a pedestrian detection framework under an expressway tunnel environment is established, and the specific process is as follows:

1) adding a dilated convolution pyramid module behind the Faster R-CNN feature extraction layer,

2) processing the output of the framework with the dilated convolution pyramid module by convolution, dimensionality reduction, and an activation function,

3) adding an attention mechanism module for further processing.

Further, the specific acquisition process of the pedestrian target data set in the step one is as follows:

1) acquiring video images containing pedestrian targets in the tunnel environment from the expressway monitoring center, and converting the video images into picture format;

2) making a pedestrian data set in PASCAL VOC format with the LabelImg tool, and randomly dividing the resulting data set into a training set and a test set at a ratio of 9:1.

Further, the specific process of the second step is as follows:

1) inputting the data of the training set from step one into K-means for clustering,

2) further processing the data obtained from the clustering to obtain the Anchor setting.

Further, the specific process of the third step is as follows:

1) first, the feature map is convolved, and the convolved feature map is then processed by dilated convolution layers with four different dilation rates;

2) the feature maps output by the dilated convolution layers undergo convolution and dimensionality reduction and are then merged with the feature map from the preceding convolution.

Further, the specific process of the step four is as follows:

1) compressing the feature map along its height and width directions to obtain, for each channel, a real number with a global receptive field,

2) performing a convolution operation on the feature map and multiplying the result by the channel real numbers obtained in the previous step to obtain the final feature map.

Due to the adoption of the technical scheme, the invention has the following beneficial effects:

the invention optimizes the Anchor in the Faster R-CNN network with an unsupervised learning method, so that the Anchor suits the extraction of pedestrian candidate frames in the tunnel environment; it designs a dilated convolution pyramid module that uses dilated convolution to collect feature information from different sub-regions of the feature map, strengthening the connections between the network's feature information; and it designs an attention mechanism that processes the feature information and enhances the expressive power of the features. Through these three designs, pedestrian feature extraction under conditions such as dark images, relatively small targets, and vehicle-lamp interference is improved, and the pedestrian detection rate in the tunnel environment is raised.

Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.

Drawings

The drawings of the present invention are described below.

FIG. 1 is a schematic diagram of the dilated convolution pyramid structure;

FIG. 2 is a schematic illustration of an attention mechanism;

FIG. 3 is a schematic diagram of a partial pedestrian object detection framework.

Detailed Description

The invention is further illustrated by the following figures and examples.

Example 1

As shown in FIGS. 1-3, the improved Faster R-CNN pedestrian target detection algorithm for tunnel environments provided by this embodiment includes the following steps:

step three: establishing a dilated convolution pyramid structure;

3) adding an attention mechanism module for further processing.

In this embodiment, the specific acquisition process of the pedestrian target data set in step one is as follows:

1) Video images containing pedestrian targets in the tunnel environment are obtained from the expressway monitoring center, and one frame out of every 30 is saved as a picture in jpg format with a size of 704 x 576.

2) The jpg pictures are labeled with the LabelImg tool: each pedestrian target is completely enclosed by a rectangular frame and given the label person, and the labeled picture is saved as PASCAL VOC format data. This process is repeated until all pictures are labeled, forming the pedestrian target data set for the expressway tunnel environment. The labeled pedestrian data set is then randomly divided into a training set and a test set at a ratio of 9:1.
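The frame sampling and the 9:1 random split described above can be sketched as follows. This is a minimal illustration, not the inventors' tooling: the function names, the file-name pattern, the fixed random seed, and the choice to start sampling at frame 0 are all assumptions.

```python
import random

def sampled_frame_indices(total_frames, step=30):
    """Indices of the frames kept when one frame is saved every `step` frames.
    Starting at frame 0 is an assumption; the text only gives the interval."""
    return list(range(0, total_frames, step))

def split_dataset(items, train_ratio=0.9, seed=0):
    """Shuffle `items` and split them into (training set, test set)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical file names: a 300-frame clip sampled every 30 frames.
pictures = [f"frame_{i:06d}.jpg" for i in sampled_frame_indices(300)]
train_set, test_set = split_dataset(pictures)
```

A fixed seed keeps the split reproducible between runs, which is convenient when comparing detector variants on the same test set.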

In this embodiment, the specific process of step two is as follows:

1) The data of the training set from step one are input into K-means for clustering. Analysis of the VOC07 data set used by the original Faster R-CNN network and of the self-made data set shows that the two differ greatly, so the default Anchor setting of the Faster R-CNN network cannot be used directly to detect pedestrian targets in the expressway tunnel environment. First, the rectangular frames in the labeled pedestrian data set are scaled according to the picture size processed by the Faster R-CNN algorithm. The width-to-height ratio data of the scaled labeling frames are then input into K-means as the sample set, which yields clustering results for the height and the width-to-height ratio of the labeling frames.

2) The clustering result is then processed. According to the clustering result, the pedestrian target frames have heights of 30 to 200 and a width-to-height ratio of 0.42. The height is increased by a ratio of 2 from one frame to the next, and 4 detection frames are set in total. The clustering result thus gives the new Anchor setting. Compared with the original detection frames, it produces fewer candidate frames, which speeds up the algorithm to some extent, and it is better suited to pedestrian target detection in the tunnel environment.
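As a rough sketch of this step, the code below runs a plain one-dimensional K-means on width-to-height ratios and then builds four anchors whose heights double from one to the next. The sample ratios, the base height of 30, and the helper names are assumptions made for illustration; the text gives only the height range, the ratio 0.42, the factor 2, and the count of 4.

```python
import random

def kmeans_1d(values, k, iters=50, seed=0):
    """Plain Lloyd's k-means on scalar values; returns the sorted cluster centres."""
    rng = random.Random(seed)
    centres = rng.sample(list(values), k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Keep the old centre if a cluster ends up empty.
        new_centres = [sum(c) / len(c) if c else centres[i]
                       for i, c in enumerate(clusters)]
        if new_centres == centres:
            break
        centres = new_centres
    return sorted(centres)

def make_anchors(base_height, aspect_ratio=0.42, scale=2, count=4):
    """Anchors as (width, height); each height is `scale` times the previous one."""
    anchors = []
    for i in range(count):
        h = base_height * scale ** i
        anchors.append((round(h * aspect_ratio, 2), h))
    return anchors

# Made-up width/height ratios of labeled frames, for illustration only.
ratios = [0.40, 0.41, 0.43, 0.44, 0.42, 0.39, 0.45]
centre = kmeans_1d(ratios, k=1)[0]
# base_height=30 is a placeholder; the text does not list the four heights.
anchors = make_anchors(base_height=30)
```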

In this embodiment, the specific process of the third step is as follows:

1) The structure of the dilated convolution pyramid is described in detail with reference to FIG. 1. First, the feature map from the feature extraction layer of the Faster R-CNN network is processed by a convolution layer with a 3 × 3 kernel, followed by a ReLU activation function that further strengthens the nonlinearity of the network. The resulting feature map then passes through dilated convolution layers with dilation rates of 1, 6, 12 and 18. Dilated convolution extracts features without changing the resolution of the feature map, and the layers with different dilation rates collect information from different regions of the feature map, which strengthens the connections between the feature information in the network and improves the localization accuracy of the target.

2) To preserve the weight of the original feature map within the dilated convolution pyramid structure, the feature maps output by the dilated convolution layers are reduced in dimension by a convolution layer with a 1 × 1 kernel, which changes the number of channels to one quarter of the original number, and the reduced feature maps then pass through an activation function that further strengthens the nonlinearity of the network. Finally, the resulting feature maps are merged with the feature map that passed through the convolution and activation function in the previous step to form a new feature map.
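The claim that dilated convolution leaves the resolution unchanged, and the channel count after merging, can be checked with a little arithmetic. The sketch below assumes 3 × 3 kernels in the dilated branches, stride 1, zero padding equal to the dilation rate, and a 38-pixel feature map side; none of these values is stated in the text.

```python
def effective_kernel(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def same_padding(kernel_size, dilation):
    """Zero padding per side that keeps the resolution at stride 1 (odd kernels)."""
    return dilation * (kernel_size - 1) // 2

def output_size(size, kernel_size, dilation, padding, stride=1):
    """Standard convolution output-size formula."""
    return (size + 2 * padding - effective_kernel(kernel_size, dilation)) // stride + 1

rates = (1, 6, 12, 18)
in_channels = 512
branch_channels = in_channels // 4  # each branch reduced by a 1 x 1 convolution
merged_channels = in_channels + len(rates) * branch_channels

for d in rates:
    p = same_padding(3, d)
    # Prints the rate, the effective kernel span, and the output side length.
    print(d, effective_kernel(3, d), output_size(38, 3, d, p))
```

Under this reading, four quarter-width branches merged with the 512-channel input give the 1024 channels that step five starts from.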

In this embodiment, the specific process of the step four is as follows:

1) The design of the attention mechanism is described in detail with reference to FIG. 2. Let the input feature map be X. The feature map X is first compressed along its width and height directions to obtain one real number per channel, which has, to some extent, a global receptive field. The compression is performed according to the following formula:

In the formula, F_sq(·) represents the compression of the feature map, X represents the feature map, W represents the width of the feature map, H represents the height of the feature map, and (m, n) indexes the pixel value at a given point.
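The formula itself is not reproduced in this text. Assuming the compression is a global average over the spatial positions of each channel, which is consistent with the symbols defined above but is not confirmed by the source, it might be written as follows, where z (a symbol introduced here) is the real number obtained for one channel:

```latex
z = F_{sq}(X) = \frac{1}{W \times H} \sum_{m=1}^{W} \sum_{n=1}^{H} X(m, n)
```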

2) In parallel with the compression, the feature map X is also convolved to extract the target features further, and a suitable activation function is added after the convolution to improve how well the network fits the feature information. Finally, the convolved feature information is multiplied by the channel information obtained in the previous step to give the final feature map.
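The two branches and the final multiplication can be sketched on nested lists as below. Several points are assumptions: the compression is taken to be a per-channel average, ReLU stands in for the unspecified "suitable activation function", and the convolution is passed in as a caller-supplied function (an identity function in the example) rather than a trained layer.

```python
def squeeze(feature_map):
    """Average each channel over its height and width: one real number per channel."""
    weights = []
    for channel in feature_map:
        total = sum(sum(row) for row in channel)
        weights.append(total / (len(channel) * len(channel[0])))
    return weights

def relu(feature_map):
    """Element-wise ReLU, assumed here as the activation after the convolution."""
    return [[[max(0.0, v) for v in row] for row in channel] for channel in feature_map]

def reweight(feature_map, weights):
    """Multiply every value of channel c by weights[c]."""
    return [[[v * w for v in row] for row in channel]
            for channel, w in zip(feature_map, weights)]

def attention(feature_map, conv):
    """Channel branch (squeeze) and feature branch (conv + activation), then multiply."""
    weights = squeeze(feature_map)
    features = relu(conv(feature_map))
    return reweight(features, weights)

# Two 2 x 2 channels; an identity function stands in for the convolution layer.
x = [[[1.0, 3.0], [5.0, 7.0]],
     [[0.0, 2.0], [2.0, 0.0]]]
y = attention(x, conv=lambda fm: fm)
```

In this toy input the first channel has the larger average, so its values are scaled up relative to the second channel.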

In this embodiment, the fifth step is as follows:

1) The pedestrian detection framework for the expressway tunnel environment is described in detail with reference to FIG. 3. First, the dilated convolution pyramid module is added behind the Faster R-CNN feature extraction layer, which strengthens the association between the feature information and improves the feature extraction capability of the network.

2) After the dilated convolution pyramid is added behind the feature extraction layer of the Faster R-CNN network, the number of channels of the feature map changes from 512 to 1024 and has to be brought back to 512. The feature information of the target is first extracted by a 1 × 1 convolution layer, and dimensionality reduction is then performed by a 3 × 3 convolution layer, which changes the number of channels to 512. A ReLU activation function is added as well, to strengthen the nonlinearity of the network and improve its feature extraction capability.

3) The dimensionality reduction performed by the convolution operation can disperse the network's feature information. The designed attention mechanism is therefore added after the convolution operation to process the feature information further and strengthen the feature extraction capability of the network, which finally forms the pedestrian target detection algorithm for the expressway tunnel environment.

Example 2

This embodiment differs from Example 1 in that, in step two, a mean shift algorithm is used to process the width-to-height ratios of the pedestrian labeling frames to obtain the clustering result.
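A minimal one-dimensional mean shift over width-to-height ratios might look like the sketch below. The flat (uniform) kernel, the bandwidth of 0.1, the merging threshold, and the sample ratios are all assumptions; the text names the algorithm but gives no parameters.

```python
def mean_shift_1d(values, bandwidth, iters=100, tol=1e-6):
    """Flat-kernel mean shift on scalars; returns the distinct modes found."""
    modes = []
    for start in values:
        point = start
        for _ in range(iters):
            # Average of all samples within `bandwidth` of the current point.
            window = [v for v in values if abs(v - point) <= bandwidth]
            shifted = sum(window) / len(window)
            if abs(shifted - point) < tol:
                break
            point = shifted
        # Merge points that converged to (almost) the same mode.
        if not any(abs(point - m) < bandwidth / 2 for m in modes):
            modes.append(point)
    return sorted(modes)

# Made-up ratios forming two groups, for illustration only.
ratios = [0.40, 0.41, 0.42, 0.43, 0.44, 0.80, 0.82, 0.84]
modes = mean_shift_1d(ratios, bandwidth=0.1)
```

Unlike K-means, mean shift does not need the number of clusters in advance; the bandwidth determines how many modes emerge.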

Finally, the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions without departing from their spirit and scope, and all such changes fall within the protection scope of the present invention.

Claims

1. An improved Faster R-CNN pedestrian target detection algorithm for tunnel environments, characterized by comprising the following steps:

step three: establishing a dilated convolution pyramid structure;

3) adding an attention mechanism module for further processing.

2. The pedestrian object detection algorithm of claim 1, wherein the pedestrian object data set in the first step is obtained by the following specific steps:

1) acquiring a video image containing a pedestrian target in a tunnel environment from a highway monitoring center, and converting the video image into a picture format;

3. The pedestrian object detection algorithm of claim 1, wherein the specific process of the second step is as follows:

4. The pedestrian object detection algorithm of claim 1, wherein the specific process of the third step is as follows:

1) first, performing convolution on the feature map, and then processing the convolved feature map with dilated convolution layers having four different dilation rates;

5. The pedestrian object detection algorithm of claim 1, wherein the specific process of the fourth step is as follows: