CN110188807B

CN110188807B - Tunnel pedestrian target detection method based on cascading super-resolution network and improved Faster R-CNN

Info

Publication number: CN110188807B
Application number: CN201910425679.1A
Authority: CN
Inventors: 赵敏; 孙棣华; 梅莹
Original assignee: Chongqing University
Current assignee: Chongqing University
Priority date: 2019-05-21
Filing date: 2019-05-21
Publication date: 2023-04-21
Anticipated expiration: 2039-05-21
Also published as: CN110188807A

Abstract

The invention discloses a tunnel pedestrian target detection method based on a cascading super-resolution network and an improved Faster R-CNN, comprising the following steps: step S1: training a super-resolution network to obtain an SRCNN super-resolution network model; step S2: acquiring tunnel pedestrian training samples and labeling the pedestrians; step S3: clustering the sizes and ratios of the annotation boxes and selecting suitable anchor box sizes for the RPN network; step S4: training a Faster R-CNN network to obtain a trained model; step S5: detecting the tunnel pedestrian target with the trained models to obtain a detection result. Compared with the original Faster R-CNN network, the method achieves higher detection precision and can be applied effectively to low-resolution pedestrian target detection in a tunnel environment.

Description

Tunnel pedestrian target detection method based on cascading super-resolution network and improved Faster R-CNN

Technical Field

The invention relates to the field of traffic data analysis and processing, in particular to a tunnel pedestrian target detection method based on a cascading super-resolution network and an improved Faster R-CNN.

Background

With the rapid development of artificial intelligence, pedestrian detection has become one of the main research directions in computer vision and plays an important role in intelligent video surveillance; researchers worldwide have studied the problem extensively. Traffic regulations allow only vehicles, not pedestrians, to enter highway tunnels, yet pedestrians still enter them in violation of the regulations. Inside a tunnel the ambient light is insufficient and the driver's line of sight is limited, and the sudden change in ambient light when a vehicle enters or leaves the tunnel can briefly blind the driver. In addition, vehicles on expressways travel at high speed, tunnels have few lanes, and traffic in the tunnel is dense, so an accident in an expressway tunnel often causes serious casualties. Pedestrians who enter an expressway tunnel illegally therefore pose a great hazard to traffic safety, and detecting pedestrian targets in tunnel surveillance video plays an important role in keeping tunnels safe.

At present, most urban highway tunnels are equipped with cameras, but traditional video surveillance systems rely mainly on human observation to discover abnormal events. The work is labor-intensive: it requires monitoring staff to stay highly focused, remain extremely vigilant, and respond quickly to abnormal events. Video monitoring work is also monotonous and tedious, which is a great challenge to the patience of monitoring personnel. Moreover, when there are many cameras, the safety and effectiveness of the monitoring system cannot be guaranteed even if the monitoring personnel are fully concentrated. Intelligent video surveillance has therefore become an inevitable trend in the monitoring field. The volume of data in video is often very large; intelligent video technology can use the efficient data processing capability of computers to analyze it, detect targets automatically, discover abnormal conditions, and raise alarms automatically so that the relevant staff can respond more effectively. Such a monitoring system can also work continuously around the clock, which greatly saves manpower and material resources and greatly improves the accuracy and safety of the monitoring system.

Among existing pedestrian detection technologies, traditional methods based on image processing mainly detect pedestrians by constructing pedestrian features by hand and then classifying them with a classifier. Common pedestrian features include HOG features, Haar-like features and LBP features. These features mainly describe the shape of the human body from its contour and texture information; their expressive power is insufficient, and the detection effect rarely meets requirements. For example, the pedestrian detection method based on video monitoring (publication number: CN 101887524) filed by Hunan innovative manufacturing limited company detects pedestrians using extended gradient histogram features and the Adaboost algorithm, and then further verifies the detected pedestrians using gradient histogram features and a support vector machine. The method has to extract gradient histogram features from the image; in a tunnel environment, where image resolution is low and pedestrian targets are small, the extracted features are poor, so the detection effect is poor. Deep learning has developed rapidly in recent years and has achieved great success in target detection: a multi-layer convolutional neural network learns target features automatically and combines low-level features into high-level features with stronger representational power and richer semantic information. Target detection methods based on convolutional neural networks can therefore achieve a better detection effect.
For example, "a small target pedestrian detection method aiming at a complex scene", filed by Guangzhou Guangdong finance and electronic technology limited company, trains a neural network on pedestrian samples: a shared feature extraction network extracts features to obtain a feature map, a classification feature extraction network extracts classification features from that feature map to obtain a classification feature map, and the classification features corresponding to each candidate region are then extracted from the classification feature map to decide whether the region contains a pedestrian target. That invention can effectively reduce the high false detection rate of small target detection in complex scenes. However, in the tunnel environment the image resolution is low and neural network feature extraction performs poorly, so the detection effect in an actual tunnel environment is poor.

Disclosure of Invention

In view of the above, the present invention aims to provide a tunnel pedestrian target detection method based on a cascading super-resolution network and an improved Faster R-CNN. Starting from the actual environment of highway tunnels, the invention studies the poor pedestrian feature extraction of Faster R-CNN in the tunnel environment and designs a pedestrian target detection network in which a super-resolution network is cascaded with Faster R-CNN. Because the sizes and ratios of the candidate boxes in the RPN of the original Faster R-CNN do not suit the tunnel pedestrian target detection task, the K-Means algorithm is used to cluster the ground-truth pedestrian annotation boxes so as to generate higher-quality candidate windows. Compared with the original Faster R-CNN network, the method achieves higher detection precision.

In a first aspect, the invention provides a tunnel pedestrian target detection method based on a cascading super-resolution network and an improved Faster R-CNN, comprising the following steps:

step S1: training a super-resolution network to obtain an SRCNN super-resolution network model;

step S2: acquiring a tunnel pedestrian training sample and marking pedestrians;

step S3: obtaining the sizes and aspect ratios of the tunnel pedestrians in the training samples from the labeling information of step S2; then clustering these sizes and aspect ratios with the K-Means clustering algorithm to obtain the anchor box sizes and ratios suited to tunnel pedestrian targets;

step S4: training a Faster R-CNN network to obtain a trained model;

step S5: detecting the tunnel pedestrian target with the trained SRCNN model and the trained Faster R-CNN model to obtain a detection result.

In particular, the step S1 comprises the following sub-steps:

step S11: acquiring original low-resolution images and enlarging them with an interpolation algorithm to obtain the training samples of the super-resolution network;

step S12: training the super-resolution network on the training samples to obtain the SRCNN super-resolution network model.

In particular, in the step S2, an image frame is extracted from the tunnel video to form a training sample, and then a labeling tool is used to label the pedestrians in the pictures.

In particular, the step S4 includes the following steps:

step S41: preparing a VOC format data set;

step S42: building a training network;

step S43: pre-training the model;

step S44: training on the training samples with the pre-training model obtained in step S43 to obtain the final trained Faster R-CNN model.

In particular, said step S5 comprises the following sub-steps:

step S51: inputting the picture to be detected into a trained SRCNN super-resolution network to obtain a picture with amplified resolution;

step S52: inputting the enlarged picture from step S51 into the trained Faster R-CNN network model for detection to obtain the final detection result.

In a second aspect, the present invention provides a tunnel pedestrian target detection device based on a cascading super-resolution network and an improved Faster R-CNN, comprising:

a super-resolution network training module: enlarging low-resolution images with an algorithm to obtain the training samples of the super-resolution network, and training the super-resolution network on the training samples to obtain the SRCNN super-resolution network model;

a pedestrian training sample acquisition module: extracting image frames from tunnel videos to form training samples, and labeling the pedestrians in the pictures with a labeling tool;

a Faster R-CNN network training module: first obtaining a pre-training model, then continuing to train the pre-training model on the training samples to obtain the final trained detection model;

a detection module: detecting the tunnel pedestrian target with the trained models obtained by the super-resolution network training module and the Faster R-CNN network training module to obtain a detection result.

The beneficial effects of the invention are as follows: to address the special tunnel environment, in which lighting changes frequently, surveillance video images are blurred and noisy, and pedestrian targets have low resolution, the invention improves on the Faster R-CNN detection network, which already performs well. First, because the features that Faster R-CNN extracts for low-resolution pedestrian targets in the tunnel environment are not expressive enough, the invention proposes a new SR-CNN pedestrian target detection network that cascades a super-resolution network with Faster R-CNN. Super-resolution reconstruction supplements the high-frequency information of the image and adds image detail, so that Faster R-CNN generates feature maps with richer semantic information, which improves pedestrian target detection precision in the tunnel environment. Second, when the RPN network in Faster R-CNN extracts candidate regions, the anchor box sizes in the anchor generation algorithm are designed by hand and do not use prior information about pedestrian size, so the extracted candidate windows are inaccurate. The invention therefore applies the K-Means clustering algorithm to the ground-truth pedestrian annotation boxes to obtain the anchor box sizes, which yields higher-quality candidate windows, improves the regression accuracy of the predicted boxes, and thus improves detection precision.

Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof.

Drawings

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings, in which:

FIG. 1 is a schematic flow chart of the tunnel pedestrian detection method based on a cascading super-resolution network and Faster R-CNN.

Detailed Description

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. It should be understood that the preferred embodiments are presented by way of illustration only and not by way of limitation.

The invention discloses a tunnel pedestrian target detection method based on a cascading super-resolution network and an improved Faster R-CNN, which comprises the following steps:

step S1: training the super-resolution network, mainly comprising the following specific steps:

step S11: acquiring training samples, each comprising a low-resolution picture and a corresponding high-resolution picture. The original pictures obtained are generally low-resolution; each is enlarged by bicubic interpolation so that its length and width are doubled, which yields the corresponding high-resolution training sample;

step S12: selecting the SRCNN network as the base super-resolution network. It consists mainly of three convolution layers, which perform, respectively, extraction and feature representation of image patches, non-linear mapping of features between LR and HR image patches, and final reconstruction of the HR image patches. The training samples are fed into the SRCNN network, and the super-resolution network model is obtained after training;
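The three-layer structure described in step S12 can be illustrated with a small size calculation. This is a minimal sketch, not the embodiment's configuration: the kernel sizes (9, 1, 5), the channel counts (64, 32), the single-channel input and the 33-pixel patch are assumptions taken from the commonly published SRCNN baseline, since the text above does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """One convolution layer, described only by its shape (no weights)."""
    name: str
    kernel: int
    in_channels: int
    out_channels: int

    def params(self) -> int:
        # weights (k * k * in * out) plus one bias per output channel
        return self.kernel * self.kernel * self.in_channels * self.out_channels + self.out_channels

    def out_size(self, size: int) -> int:
        # spatial size after an unpadded, stride-1 convolution
        return size - self.kernel + 1


# Assumed 9-1-5 layout; these numbers are NOT given in the text above.
SRCNN_LAYERS = [
    ConvLayer("patch extraction and representation", 9, 1, 64),
    ConvLayer("non-linear mapping", 1, 64, 32),
    ConvLayer("reconstruction", 5, 32, 1),
]


def summarize(layers, input_size):
    """Return (layer name, parameter count, output size) for each layer in order."""
    size = input_size
    rows = []
    for layer in layers:
        size = layer.out_size(size)
        rows.append((layer.name, layer.params(), size))
    return rows


for name, n_params, size in summarize(SRCNN_LAYERS, 33):
    print(f"{name}: {n_params} parameters, output {size}x{size}")
```

Under these assumptions the network stays very small, which is one reason it can be placed in front of a detector without dominating the cost of the cascade.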

step S2: obtaining a tunnel pedestrian training sample:

in this embodiment, a large number of video images of different scenes are collected from cameras installed in the tunnel, one image is saved every 15 frames, and images of poor quality are discarded; a total of 6000 images are obtained and divided into a training set and a test set at a ratio of 4:1;
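The sampling interval and the 4:1 split can be sketched as follows. The function names, the shuffling and the fixed seed are illustrative assumptions; the text does not say how the 6000 images are assigned to the two sets.

```python
import random


def sample_frame_indices(total_frames, step=15):
    """Indices of the frames that are kept when one image is saved every `step` frames."""
    return list(range(0, total_frames, step))


def split_train_test(items, train_parts=4, test_parts=1, seed=0):
    """Shuffle the items reproducibly and split them at a train:test ratio of 4:1 by default."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:n_train], shuffled[n_train:]


train_ids, test_ids = split_train_test(range(6000))
print(len(train_ids), len(test_ids))
```

With 6000 images, a 4:1 ratio gives 4800 training images and 1200 test images.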

the collected video images are labeled manually with the LabelImg tool; the only labeled category is pedestrian, and the labeling information consists of the category of each target in the image and the coordinates of its bounding box;
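LabelImg can save annotations as Pascal VOC style XML, from which the category and bounding box coordinates can be read back. The sketch below parses an abbreviated sample; the file name, the image size and the label string "pedestrian" are hypothetical, and real LabelImg files carry additional fields that are omitted here.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hypothetical annotation in Pascal VOC style.
SAMPLE_XML = """<annotation>
  <filename>tunnel_000015.jpg</filename>
  <size><width>704</width><height>576</height><depth>3</depth></size>
  <object>
    <name>pedestrian</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>130</xmax><ymax>280</ymax></bndbox>
  </object>
</annotation>"""


def read_boxes(xml_text):
    """Return the label, corner coordinates and (width, height) of every annotated object."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(float(bndbox.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append({
            "label": obj.findtext("name"),
            "box": (xmin, ymin, xmax, ymax),
            "size": (xmax - xmin, ymax - ymin),
        })
    return boxes


print(read_boxes(SAMPLE_XML))
```

The `size` entries are the width and height values that the later clustering step works on.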

step S3: selecting suitable anchor box sizes for the RPN network, comprising the following specific steps:

step S31: extracting the length and width of each pedestrian annotation box from the manual labeling information of step S2 and using them as the clustering samples;

step S32: clustering the sample data extracted in step S31 with the K-Means clustering algorithm, where the specific process is as follows:

(1) Randomly selecting 5 samples from the sample set and taking them as the initial cluster centers;

(2) Calculating the distance from every remaining sample in the sample set to each of the 5 centers and assigning each sample to its nearest cluster, where the distance is calculated as:

d = 1 - IOU(i, c)

where d is the distance from the sample to the cluster center, i denotes the i-th sample, c denotes the c-th cluster center, and IOU is the intersection over union of the areas of the sample box and the cluster-center box;

(3) For each cluster, calculating the average value of all points in the cluster and taking the average value as a new cluster center value;

(4) Calculating the distance between the new cluster center value and the original cluster center value;

(5) Judging whether the distance between the center values is smaller than a set threshold or whether the maximum number of iterations has been reached; if so, exiting; otherwise, repeating steps (2) to (5).

The clustering algorithm yields the anchor box sizes suited to tunnel pedestrian target detection.
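Steps (1) to (5) can be sketched in a few lines. The sketch makes several assumptions that the text does not state: each sample is a (width, height) pair, the IOU of two sizes is computed as if both boxes shared the same top-left corner, the movement of a center in step (4) is measured with the same 1 - IOU distance, and an empty cluster keeps its previous center. For simplicity the samples chosen as initial centers are also assigned to clusters.

```python
import random


def iou_wh(a, b):
    """IOU of two (width, height) boxes aligned at a common corner (an assumption)."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(sizes, k=5, max_iter=100, tol=1e-6, seed=0):
    """Cluster (width, height) samples with the distance d = 1 - IOU."""
    rng = random.Random(seed)
    centers = rng.sample(sizes, k)                      # step (1): random initial centers
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for s in sizes:                                 # step (2): assign to nearest center
            nearest = min(range(k), key=lambda c: 1.0 - iou_wh(s, centers[c]))
            clusters[nearest].append(s)
        new_centers = []
        for c, members in enumerate(clusters):          # step (3): mean of each cluster
            if members:
                new_centers.append((
                    sum(w for w, _ in members) / len(members),
                    sum(h for _, h in members) / len(members),
                ))
            else:
                new_centers.append(centers[c])          # empty cluster: keep old center
        # step (4): largest movement of any center, measured as 1 - IOU
        shift = max(1.0 - iou_wh(n, o) for n, o in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:                                 # step (5): converged
            break
    return sorted(centers, key=lambda wh: wh[0] * wh[1])


# Hypothetical pedestrian box sizes in pixels, for illustration only.
samples = [(12, 30), (14, 36), (20, 55), (22, 60), (30, 80),
           (33, 90), (45, 120), (50, 130), (70, 180), (75, 200)]
print(kmeans_anchors(samples, k=5))
```

Using 1 - IOU instead of Euclidean distance keeps large boxes from dominating the clustering, because the distance depends on relative overlap rather than absolute pixel differences.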

step S4: training a Faster R-CNN network, comprising the following specific steps:

step S41: preparing the VOC data set. In this embodiment, the image data and labeling information are organized into a training data set in PASCAL VOC format, which mainly comprises three folders: Annotations stores the xml files holding the labeling information of each picture; ImageSets stores txt files, each line of which contains the name of one picture, and which divide the pictures of the data set into sets such as the training set and the test set; JPEGImages contains all training, test and validation pictures;
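The folder layout of step S41 can be created with a short script. This is a sketch under assumptions: the `Main` subfolder and the file names `trainval.txt` and `test.txt` follow the usual PASCAL VOC development kit convention rather than anything stated above, and the picture names are hypothetical.

```python
import tempfile
from pathlib import Path


def make_voc_layout(root, train_names, test_names):
    """Create the three VOC folders and write one picture name per line into the split files."""
    root = Path(root)
    for sub in ("Annotations", "JPEGImages", "ImageSets/Main"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    sets_dir = root / "ImageSets" / "Main"
    (sets_dir / "trainval.txt").write_text("\n".join(train_names) + "\n")
    (sets_dir / "test.txt").write_text("\n".join(test_names) + "\n")
    return sets_dir


with tempfile.TemporaryDirectory() as tmp:
    sets_dir = make_voc_layout(tmp, ["tunnel_000000", "tunnel_000015"], ["tunnel_000030"])
    print(sorted(p.name for p in Path(tmp).iterdir()))
    print((sets_dir / "trainval.txt").read_text().split())
```

The names in the txt files carry no file extension; the same stem is used to locate the xml file in Annotations and the picture in JPEGImages.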

step S42: building the training model. In this embodiment, the detection algorithm is built in the CAFFE (Convolutional Architecture for Fast Feature Embedding) deep learning framework, based on the Faster R-CNN detection algorithm with VGG16 selected as the feature extraction network;

step S43: pre-training the model. In this embodiment, the network is pre-trained on the ImageNet large-scale classification data set using stochastic gradient descent, with an initial learning rate of 0.1 and a total of 100k iterations, to obtain the pre-training model;

step S44: training on the training samples with the pre-training model obtained in step S43 to obtain the trained model. Specifically, in this embodiment, the parameters of the original Faster R-CNN model are modified: the anchor box sizes and ratios in the RPN network are changed to the actual values obtained by the clustering, and the number of target classes is changed from 21 to 2 (the original Faster R-CNN network detects 20 target classes plus background, hence 21; tunnel pedestrian detection has only one class plus background, hence 2). The pre-training model obtained in step S43 is then trained on the training data set to obtain the final detection model;
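How clustered sizes become RPN anchors can be sketched as follows. This is an illustration rather than the embodiment's CAFFE configuration: the stride of 16 pixels corresponds to the usual VGG16 feature map in Faster R-CNN, the anchors are centered on each feature-map cell, and the sizes shown are hypothetical.

```python
NUM_CLASSES = 2  # pedestrian plus background, as described in step S44


def anchors_at(cx, cy, sizes):
    """Boxes (x1, y1, x2, y2) of the given (width, height) sizes centered at (cx, cy)."""
    return [(cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2) for w, h in sizes]


def grid_anchors(feat_w, feat_h, sizes, stride=16):
    """Place every anchor size at the center of every feature-map cell."""
    boxes = []
    for gy in range(feat_h):
        for gx in range(feat_w):
            cx = gx * stride + stride / 2
            cy = gy * stride + stride / 2
            boxes.extend(anchors_at(cx, cy, sizes))
    return boxes


clustered_sizes = [(10, 30), (20, 60)]  # hypothetical clustering output
print(len(grid_anchors(2, 3, clustered_sizes)))
```

Each feature-map cell receives one anchor per clustered size, so five clustered sizes would give five anchors per cell in place of the hand-designed default set.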

step S5: detecting the tunnel pedestrian target with the trained models, specifically comprising the following steps:

step S51: inputting the tunnel image into the trained super-resolution network to obtain a high-resolution image enlarged four times in area (length and width each doubled);

step S52: inputting the super-resolved picture from step S51 into the trained Faster R-CNN network for detection and obtaining preliminary detection results above a given confidence threshold (typically 0.5), each comprising a target category and the coordinates of the target bounding box; redundant target bounding boxes are then removed with the non-maximum suppression algorithm, whose specific flow is as follows:

(1) Obtaining all target detection windows and their scores S from the detection algorithm;

(2) Sorting the detection windows by score S from high to low;

(3) Selecting the window M with the highest score after sorting as the suppression window;

(4) Taking each window with a lower score than the suppression window as a suppressed window B_i and calculating the area overlap rate (overlap) between the suppression window M and the suppressed window B_i as follows:

overlap = area(M ∩ B_i) / area(M ∪ B_i)

(5) If the area overlap rate is higher than the set threshold T, the suppressed window is removed;

(6) Ending when only one detection window is left; otherwise, selecting the remaining window with the highest score as the new suppression window and returning to step (4);

after non-maximum suppression, the final detection result can be obtained.
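The non-maximum suppression flow above can be sketched as follows. Two points are assumptions: the overlap rate is computed as intersection over union, and the overlap threshold T is set to 0.3, a value the text does not specify. The confidence threshold of 0.5 follows step S52.

```python
def area(box):
    """Area of a box given as (x1, y1, x2, y2)."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, overlap_threshold=0.3, score_threshold=0.5):
    """Keep the best-scoring window, drop windows that overlap it too much, repeat."""
    candidates = sorted(
        (d for d in detections if d[1] >= score_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    while candidates:
        best = candidates.pop(0)              # current suppression window M
        kept.append(best)
        candidates = [d for d in candidates   # remove windows with overlap above T
                      if iou(best[0], d[0]) <= overlap_threshold]
    return kept


# Hypothetical detections: (box, score)
dets = [((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.8),
        ((50, 50, 60, 60), 0.7), ((0, 0, 10, 10), 0.4)]
print(nms(dets))
```

In this example the second window overlaps the first with an IOU of about 0.68 and is suppressed, the fourth falls below the confidence threshold, and the first and third windows are kept.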

In this method, a super-resolution network is trained first; training samples for tunnel pedestrian target detection are then obtained, and the sizes and ratios of the pedestrian annotation boxes in the labeled samples are clustered to obtain anchor box sizes suited to tunnel pedestrian target detection. Next, based on the Faster R-CNN detection algorithm, the network parameters are modified according to the clustering result and the pedestrian target detection task, and a Faster R-CNN target detection model is trained. Finally, the tunnel picture to be detected is input into the trained super-resolution network to obtain a picture with enlarged resolution, the super-resolved picture is input into the trained Faster R-CNN network, and the preliminary detection results are processed with the non-maximum suppression algorithm to obtain the final detection result. The method mainly addresses the difficulty of tunnel pedestrian target detection caused by the harsh tunnel environment and low image resolution, and has high practical value.
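The cascade at detection time can be sketched as a composition of two callables. Everything in the sketch is a stand-in: `upscale_nearest` only repeats pixels and takes the place of the trained SRCNN, the detector is a dummy function, and mapping boxes back to the original frame by dividing by the scale is an added convenience that the text does not describe.

```python
def upscale_nearest(image, scale=2):
    """Placeholder for the trained SRCNN: repeat every pixel scale x scale times."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(scale)]
        out.extend([list(wide) for _ in range(scale)])
    return out


def to_original_coords(box, scale=2):
    """Map a box found in the enlarged picture back to the original frame."""
    return tuple(v / scale for v in box)


def cascade_detect(image, super_resolve, detector, scale=2, score_threshold=0.5):
    """Super-resolve first, detect on the enlarged picture, keep confident results."""
    enlarged = super_resolve(image)
    detections = detector(enlarged)
    return [(to_original_coords(box, scale), score)
            for box, score in detections if score >= score_threshold]


def dummy_detector(picture):
    """Hypothetical detector that returns fixed (box, score) pairs."""
    return [((2, 2, 4, 4), 0.9), ((0, 0, 2, 2), 0.2)]


frame = [[1, 2], [3, 4]]
print(upscale_nearest(frame))
print(cascade_detect(frame, upscale_nearest, dummy_detector))
```

Doubling the length and width gives four times as many pixels, and the non-maximum suppression step would be applied to the detections that pass the confidence threshold.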

Combining the above improvements yields a new pedestrian target detection network. Verification shows that the proposed pedestrian target detection method achieves a good detection effect for pedestrian target detection in the tunnel environment.

Based on the design idea of the method, the invention also provides a tunnel pedestrian target detection device based on a cascading super-resolution network and an improved Faster R-CNN, which comprises the following modules:

(1) Super-resolution network training module: enlarging low-resolution images with an algorithm to obtain the training samples of the super-resolution network, and training the super-resolution network on the training samples to obtain the SRCNN super-resolution network model;

(2) Pedestrian training sample acquisition module: extracting image frames from tunnel videos to form training samples, and labeling the pedestrians in the pictures with a labeling tool;

(3) Faster R-CNN network training module: first obtaining a pre-training model, then continuing to train the pre-training model on the training samples to obtain the final trained detection model;

(4) Detection module: detecting the tunnel pedestrian target with the trained models obtained by the super-resolution network training module and the Faster R-CNN network training module to obtain a detection result.

It should be appreciated that embodiments of the invention may be implemented or realized by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The methods may be implemented in a computer program using standard programming techniques, including a non-transitory computer readable storage medium configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, in accordance with the methods and drawings described in the specific embodiments. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.

Furthermore, the operations of the processes described herein may be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes (or variations and/or combinations thereof) described herein may be performed under control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications), by hardware, or combinations thereof, collectively executing on one or more processors. The computer program includes a plurality of instructions executable by one or more processors.

Further, the method may be implemented in any type of computing platform operatively connected to a suitable computing platform, including, but not limited to, a personal computer, mini-computer, mainframe, workstation, network or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and so forth. Aspects of the invention may be implemented in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optical read and/or write storage medium, RAM, ROM, etc., such that it is readable by a programmable computer, which when read by a computer, is operable to configure and operate the computer to perform the processes described herein. Further, the machine readable code, or portions thereof, may be transmitted over a wired or wireless network. When such media includes instructions or programs that, in conjunction with a microprocessor or other data processor, implement the steps described above, the invention described herein includes these and other different types of non-transitory computer-readable storage media. The invention also includes the computer itself when programmed according to the tunnel pedestrian target detection method described herein.

The computer program can be applied to the input data to perform the functions described herein, thereby converting the input data to generate output data that is stored to the non-volatile memory. The output information may also be applied to one or more output devices such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including specific visual depictions of physical and tangible objects produced on a display.

Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the claims of the present invention.

Claims

1. The tunnel pedestrian target detection method based on the cascade super-resolution network and the improved Faster R-CNN is characterized by comprising the following steps of:

step S2: acquiring a tunnel pedestrian training sample and marking pedestrians;

step S4: training a Faster R-CNN network to obtain a trained model;

step S41, preparing a data set in PASCAL VOC format;

step S42, building a training network: constructing the detection algorithm in the CAFFE deep learning framework, and selecting VGG16 as the feature extraction network based on the Faster R-CNN detection algorithm;

step S43, pre-training the model: pre-training the network on the ImageNet large-scale classification data set using stochastic gradient descent, with the initial learning rate set to 0.1 and the total number of iterations set to 100k, to finally obtain a pre-training model;

step S44, training on the training samples with the pre-training model obtained in step S43 to obtain the final trained Faster R-CNN model: modifying the parameters of the original Faster R-CNN model, changing the anchor box sizes and ratios in the RPN network to the actual sizes and ratios obtained by the clustering in step S3, changing the number of target classes from 21 to 2, and training on the training data set with the pre-training model obtained in step S43 to finally obtain the trained Faster R-CNN model;

step S5: detecting the tunnel pedestrian target with the trained SRCNN model and the trained Faster R-CNN model to obtain a detection result.

2. The tunnel pedestrian target detection method based on the cascade super-resolution network and the improved Faster R-CNN according to claim 1, characterized in that the step S1 specifically comprises the following sub-steps:

step S11, acquiring original low-resolution images and enlarging them with an interpolation algorithm to obtain the training samples of the super-resolution network;

step S12, constructing an SRCNN super-resolution network and training it on the training samples to obtain the SRCNN super-resolution network model.

3. The tunnel pedestrian target detection method based on the cascade super-resolution network and the improved Faster R-CNN according to claim 1, characterized in that, in the step S2, image frames are extracted from the tunnel video to form training samples, and a labeling tool is then used to label the pedestrians in the pictures.

4. The tunnel pedestrian target detection method based on the cascade super-resolution network and the improved Faster R-CNN according to claim 1, characterized in that the step S5 comprises the following sub-steps:

step S52: inputting the amplified picture in the last step into a trained Faster R-CNN network model for detection, and obtaining a final detection result.

5. Tunnel pedestrian target detection device based on cascading super-resolution network and improved Faster R-CNN, which is characterized by comprising:

super-resolution network training module: amplifying the low-resolution image by adopting an algorithm to obtain a training sample of the super-resolution network, and training the super-resolution network according to the training sample to obtain a SRCNN super-resolution network model;

6. An electronic device, comprising: a processor, a memory, and a bus, wherein,

the processor and the memory complete communication with each other through the bus;

the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1-4.

7. A non-transitory computer readable storage medium storing computer instructions that cause the computer to perform the method of any of claims 1-4.