CN110929632A - Complex scene-oriented vehicle target detection method and device

Complex scene-oriented vehicle target detection method and device

Info

Publication number
CN110929632A
Authority
CN
China
Prior art keywords
detection, model, vehicle, picture, quality
Prior art date
2019-11-19
Legal status
Pending
Application number
CN201911133216.4A
Other languages
Chinese (zh)
Inventor
周华东
冯瑞
Current Assignee
Fudan University
Original Assignee
Fudan University
Priority date
2019-11-19
Filing date
2019-11-19
Publication date
2020-03-27
Application filed by Fudan University
Priority to CN201911133216.4A
Publication of CN110929632A
Legal status: Pending (current)

Classifications

    • G06V20/54: Surveillance or monitoring of traffic activities, e.g. cars on the road, trains or boats
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • G06V10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/464: Salient features, e.g. scale invariant feature transforms [SIFT], using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • G06V2201/07: Target detection
    • G06V2201/08: Detecting or categorising vehicles

Abstract

The invention provides a vehicle target detection method oriented to complex scenes, which can recognize vehicle monitoring video of complex scenes and thus accomplish vehicle target detection with good generalization capability and classification accuracy, and is characterized by comprising the following steps: step S1, acquiring picture frames from the vehicle monitoring video; step S2, performing frame quality analysis on the picture frames with the Laplacian variance algorithm and storing the sharp, high-quality picture frames into a picture queue; step S3, sequentially taking the high-quality picture frames out of the picture queue; step S4, inputting the high-quality picture frames into a pre-trained detection network model for detection and classification to obtain a vehicle detection result; and step S5, outputting the vehicle detection result, wherein the detection network model is a combined model of a Faster RCNN model and a Resnet-18 model, and the convolutional features (convolution values) are shared between the Faster RCNN model and the Resnet-18 model.

Description

Complex scene-oriented vehicle target detection method and device
Technical Field
The invention belongs to the field of computer vision and image processing, and particularly relates to a vehicle target detection method and device for complex scenes.
Background
Transportation is an important pillar of the national economy, and traffic monitoring systems are an important part of it. Through traffic monitoring, the relevant management departments can accurately obtain various traffic parameters in real time. Vehicle detection is the core and key of traffic monitoring, and it plays an important role in intelligent transportation systems.
Traditional and common vehicle detection methods include inductive loop (annular magnetic coil) detection, ultrasonic detection, microwave radar detection, infrared detection, pneumatic tube detection, photoelectric detection, video-based vehicle detection, and so on. Video detection is a branch of computer vision applications, a technique that combines video images with computer pattern recognition: the camera and the computer together simulate the function of the human eye. Video detection technology is increasingly becoming the most advantageous and most promising detection method in traffic monitoring systems.
Vehicle detection technology has developed to the point where deep-learning-based vehicle detection is widely applied in specialized fields. Traditional machine learning methods extract target features, for example with HOG or SIFT, and feed the extracted features into a classifier, such as a support vector machine, for classification and recognition. These features are essentially hand-designed, and for different recognition problems the quality of the extracted features directly affects system performance, so researchers must study the target field in depth in order to design features with better adaptability.
However, such machine learning methods only target specific recognition tasks, the data scale is not large, and the generalization ability is poor, so it is difficult to achieve accurate recognition in practical applications. As machine learning methods were continuously optimized, the most successful approach for going from low-level target features to representative image features was the multi-scale deformable part model, which describes an object through loosely constrained part relations and achieves good results on tasks such as face detection and pedestrian detection. The deformable part model, however, is relatively complex, slow to run, and heavily dependent on geometric characteristics of the samples, such as the target aspect ratio.
With the rapid development of deep learning theory and practice, convolutional neural networks can extract high-level image features well and avoid the drawbacks of hand-crafted features. Because a convolutional neural network fuses feature extraction, selection, and classification, accuracy can be greatly improved. However, in real traffic scenes, vehicle target detection is still affected by many factors, such as illumination, viewing angle, deformation, and occlusion. These factors have a strong influence on vehicle detection and easily make a vehicle recognition method inefficient and poorly generalized.
Disclosure of Invention
To solve the above problems, the invention provides a vehicle target detection method that can recognize vehicle monitoring video of complex scenes and has good generalization capability and classification accuracy. The invention adopts the following technical scheme:
the invention provides a complex scene-oriented vehicle target detection method, which is used for identifying a vehicle monitoring video with a complex scene so as to complete vehicle target detection and is characterized by comprising the following steps: step S1, acquiring picture frames from the vehicle monitoring video; step S2, adopting Laplace variance algorithm to analyze the frame quality of the picture frame and storing the high quality picture frame with clear frame quality into a picture queue; step S3, sequentially taking out high-quality picture frames from the picture queue; step S4, inputting high-quality picture frames into a pre-trained detection net model for detection and classification so as to obtain a vehicle detection result; and step S5, outputting the vehicle detection result, wherein the detection network model is a combination model of a Faster RCNN model and a Resnet-18 model, and the convolution value between the Faster RCNN model and the Resnet-18 model is shared.
The complex scene-oriented vehicle target detection method provided by the invention may also have the technical feature that step S2 comprises the following sub-steps: step S2-1, converting the picture frame into a grayscale image; step S2-2, convolving the grayscale image with a Laplacian mask to obtain a convolution result; step S2-3, calculating the variance of the convolution result to obtain a variance value; step S2-4, comparing the variance value with a preset quality threshold, proceeding to step S2-5 if the variance value is below the threshold and to step S2-6 if it is above the threshold; step S2-5, discarding the corresponding picture frame as a blurred image; and step S2-6, storing the corresponding picture frame into the picture queue as a high-quality picture frame.
The complex scene-oriented vehicle target detection method provided by the invention may also have the technical feature that the detection network model comprises at least a convolutional layer, a region proposal network (RPN), a region-of-interest (ROI) pooling layer, and a classification network Resnet18, wherein the convolutional layer consists of 13 conv layers, 13 relu layers, and 4 pooling layers, the RPN is used to generate candidate regions, the ROI pooling layer is used to obtain a fixed-size target feature map from the candidate regions and the feature map produced by the last layer of VGG16, and the classification network Resnet18 is used to classify the previously obtained target feature maps.
The complex scene-oriented vehicle target detection method provided by the invention may also have the technical feature that step S4 comprises the following sub-steps: step S4-1, extracting features from the picture frame through the convolutional layer; step S4-2, generating region targets on the feature map of the final convolutional layer using the RPN with k different rectangular boxes; step S4-3, normalizing the region targets and pooling the regions of interest through the ROI pooling layer to obtain a pooling result; step S4-5, performing target classification on the pooling result with the classification network Resnet18 to obtain a target classification result, and simultaneously performing position regression to obtain target position information; and step S4-6, taking the target classification result and the target position information as the vehicle detection result.
The complex scene-oriented vehicle target detection method provided by the invention may also have the technical feature that the detection network model is obtained through the following model training steps: step T1, constructing an annotated and labeled picture library as a training sample set; step T2, constructing an initial detection network model; and step T3, training the detection network model with the training sample set until a training completion condition is reached.
The invention also provides a complex scene-oriented vehicle target detection device, which is used for recognizing vehicle monitoring video to accomplish vehicle target detection and is characterized by comprising: a picture frame acquisition part, used for acquiring the vehicle monitoring video and the picture frames in the vehicle monitoring video; a picture queue generation part, used for performing frame quality analysis on the picture frames with the Laplacian variance algorithm and storing the sharp, high-quality picture frames into a picture queue; and a vehicle detection output part, used for detecting the picture queue with a pre-stored detection network model to obtain and output a vehicle detection result, wherein the detection network model is a combined model of a Faster RCNN model and a Resnet-18 model, the convolutional features (convolution values) are shared between the Faster RCNN model and the Resnet-18 model, and the vehicle detection output part completes the detection through the following steps: a picture frame acquisition step, in which the high-quality picture frames are sequentially taken out of the picture queue; a picture frame detection step, in which the high-quality picture frames are input into the pre-trained detection network model for detection and classification to obtain the vehicle detection result; and a detection result output step, in which the vehicle detection result is output.
Action and Effect of the invention
According to the complex scene-oriented vehicle target detection method and device, frame quality analysis is performed on each acquired picture frame with the Laplacian variance algorithm, and the sharp, high-quality picture frames are screened out first, which improves the recognition rate. The picture frames are then recognized with a combined model of the Faster RCNN model and the Resnet-18 model, so the method and device have good generalization capability and classification accuracy when recognizing the picture frames. Meanwhile, because the convolutional features of the Faster RCNN model and the Resnet-18 model in the combined model are shared, repeated convolution operations on the picture frames are greatly reduced, further improving the recognition efficiency of the algorithm. The method and device can recognize vehicle targets in complex scenes accurately and quickly, so they are not limited to vehicle detection on traffic roads and can also be used in business fields such as intelligent security and community vehicle management.
Drawings
FIG. 1 is a flow chart of the complex scene-oriented vehicle target detection method in an embodiment of the present invention;
FIG. 2 is a flow chart of the training process of the detection network model in an embodiment of the present invention; and
FIG. 3 is a schematic structural diagram of the detection network model in an embodiment of the present invention.
Detailed Description
To make the technical means, creative features, objectives, and effects of the invention easy to understand, the complex scene-oriented vehicle target detection method and device are described below with reference to the embodiment and the accompanying drawings.
< example >
The training set and the test set used in this embodiment come from the UA-DETRAC dataset. The UA-DETRAC dataset provides a large amount of vehicle bounding-box data, divided into four categories: car, bus, van, and others. The dataset contains 10 hours of video taken with a Canon EOS 550D camera at 24 different locations in Beijing and Tianjin, China. The video is recorded at 25 frames per second (fps) with a resolution of 960 × 540 pixels. There are more than 140,000 image frames in the UA-DETRAC dataset, in which 8250 vehicles are manually annotated, for a total of 1.21 million labeled object bounding boxes.
FIG. 1 is a flowchart of a complex scene-oriented vehicle target detection method in an embodiment of the present invention.
As shown in fig. 1, the complex scene-oriented vehicle target detection method includes the following steps:
and step S1, acquiring picture frames from the vehicle monitoring video.
In this embodiment, each picture frame is obtained from the vehicle monitoring video using OpenCV.
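A minimal OpenCV sketch of this frame-grabbing step is given below; the helper name read_frames is illustrative and not taken from the patent.

```python
import cv2

def read_frames(video_path):
    """Yield picture frames from a vehicle monitoring video one by one (step S1)."""
    cap = cv2.VideoCapture(video_path)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:            # end of stream or decode failure
                break
            yield frame
    finally:
        cap.release()
```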
Step S2: frame quality analysis is performed on the picture frames with the Laplacian variance algorithm, and the sharp, high-quality picture frames are stored into a picture queue.
In this embodiment, step S2 includes the following sub-steps:
step S2-1, converting the picture frame into a grayscale image;
step S2-2, convolving the grayscale image with a Laplacian mask to obtain a convolution result;
step S2-3, calculating the variance of the convolution result to obtain a variance value;
step S2-4, comparing the variance value with a preset quality threshold, proceeding to step S2-5 if the variance value is below the threshold and to step S2-6 if it is above the threshold;
step S2-5, discarding the corresponding picture frame as a blurred image;
and step S2-6, storing the corresponding picture frame into the picture queue as a high-quality picture frame.
In this embodiment, the variance value calculated in step S2-3 is the variance of the gray values produced by the convolution in step S2-2 about the average gray value of the grayscale image. Meanwhile, in step S2-4, the quality threshold set in this embodiment is 100: a variance value no greater than 100 indicates a blurred image, and a variance value greater than 100 indicates a sharp image (a high-quality picture frame).
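The sharpness test in steps S2-1 to S2-4 can be written in a few lines with OpenCV; a minimal sketch follows, in which the helper name is_sharp is illustrative and the threshold of 100 is the value used in this embodiment.

```python
import cv2

QUALITY_THRESHOLD = 100.0  # threshold used in this embodiment; tune per camera if needed

def is_sharp(frame, threshold=QUALITY_THRESHOLD):
    """Return True if the frame passes the Laplacian-variance sharpness test (steps S2-1 to S2-4)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)     # S2-1: convert to grayscale
    variance = cv2.Laplacian(gray, cv2.CV_64F).var()   # S2-2/S2-3: variance of the Laplacian response
    return variance > threshold                        # S2-4: compare against the quality threshold
```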
Step S3: the high-quality picture frames are sequentially taken out of the picture queue.
Step S4: the high-quality picture frames taken out in step S3 are input into the pre-trained detection network model for detection and classification to obtain the vehicle detection result.
In this embodiment, the detection network model is a combined model of a Faster RCNN model and a Resnet-18 model and comprises a convolutional layer, an RPN, an ROI pooling layer, and a classification network Resnet18; the specific training method and structure of the detection network model are described in detail later.
In this embodiment, step S4 includes the following steps:
step S4-1, extracting the characteristics of the picture frame through the convolution layer;
step S4-2, generating region targets on the feature map of the final convolutional layer using the RPN with k different rectangular boxes;
step S4-3, normalizing the region target, and pooling the region of interest through the ROI pooling layer to obtain a pooling result;
step S4-5, according to the pooling result, using a classification network Resnet18 to perform target classification to obtain a target classification result, and simultaneously performing position regression to obtain target position information;
and step S4-6, taking the target classification result and the target position information as the vehicle detection result.
In step S4-2 of this embodiment, k is 9. Specifically, for each location on the convolutional feature map, the network outputs the objectness scores and regression boundaries of the k = 3 × 3 = 9 region proposals at that location, covering 3 scales (areas) and 3 aspect ratios (1:1, 1:2, 2:1).
In step S5, the vehicle detection result is output.
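Tying steps S1 to S5 together, the overall flow might look like the following sketch, which reuses the read_frames and is_sharp helpers sketched above and assumes a detector callable wrapping the trained detection network model.

```python
from collections import deque

def run_pipeline(video_path, detector):
    """Sketch of steps S1-S5: grab frames, queue the sharp ones, detect, and output results."""
    picture_queue = deque()
    for frame in read_frames(video_path):       # S1: acquire picture frames
        if is_sharp(frame):                     # S2: Laplacian-variance quality check
            picture_queue.append(frame)         #     store high-quality frames in the queue
    while picture_queue:                        # S3: take frames out of the queue in order
        frame = picture_queue.popleft()
        result = detector(frame)                # S4: detection network (Faster RCNN + Resnet-18)
        print(result)                           # S5: output (replace with display or downstream system)
```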
FIG. 2 is a flowchart of the training process of the detection network model in an embodiment of the invention.
As shown in fig. 2, the detection network model is obtained through the following training process:
and step T1, constructing a picture library with labels and tags as a training sample set and a testing sample set.
In step T1 of this embodiment, the open-source annotation tool labelImg is used to build the annotated and labeled picture library, and the open-source dataset serves as the training sample set and the test sample set. The training sample set also needs to be preprocessed, as follows: the size of each image in the training sample set is first unified to 416 × 416 (i.e., 416 pixels by 416 pixels), and then the value of each pixel is divided by 255 to normalize the images.
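A minimal sketch of this preprocessing (resize to 416 × 416, then divide every pixel value by 255), using OpenCV and NumPy; the function name is illustrative.

```python
import cv2
import numpy as np

def preprocess(image):
    """Unify the image size to 416x416 and scale pixel values to [0, 1] (step T1 preprocessing)."""
    resized = cv2.resize(image, (416, 416))        # unify the spatial size
    return resized.astype(np.float32) / 255.0      # normalize each pixel value by 255
```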
Step T2: a combined model of the Faster RCNN model and the Resnet-18 model is constructed as the initial combined model.
In step T2 of this embodiment, the feature map that will finally be used for prediction is obtained first: after the pictures are input into the network, they pass through a series of conv + relu layers to produce feature maps (only a common ImageNet classification network is used here; this embodiment adopts ZF with 5 layers and VGG-16 with 16 layers), and an additional conv + relu layer is appended, outputting a 51 × 39 feature map with 256 channels. These feature maps are prepared for the subsequent selection of proposals, and at this point the coordinates can still be mapped back to the original image.
Second, the anchors are computed: several region proposals are predicted at each feature point on the feature map. The specific method is as follows: each feature point is mapped back to the center of its receptive field on the original image as a reference point, and then 9 anchors with different scales and aspect ratios are placed around that reference point, namely 3 scales (three areas) and 3 aspect ratios ({1:1, 1:2, 2:1}).
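As an illustration of the anchor layout described above, the sketch below enumerates the 3 × 3 = 9 reference anchor shapes for one feature-map location; the base size and scale values are assumptions for illustration, since the text only fixes the use of three areas and the three aspect ratios.

```python
import numpy as np

def make_anchors(base_size=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Return the (width, height) of the 9 reference anchors for one feature-map location."""
    anchors = []
    for scale in scales:
        area = (base_size * scale) ** 2      # one of the three anchor areas
        for ratio in ratios:                 # aspect ratio defined here as height / width
            w = np.sqrt(area / ratio)
            h = w * ratio
            anchors.append((w, h))
    return np.array(anchors)                 # shape (9, 2): 3 scales x 3 ratios
```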
Step T3: the detection network model is trained with the training sample set until the training completion condition is reached.
In step T3 of this embodiment, the combined model of the Faster RCNN model and the Resnet-18 model is built with the existing deep learning framework PyTorch, and the model construction and training process specifically includes the following steps:
the first step is as follows: initializing by using an ImageNet model, and independently training an RPN network;
the second step is that: still using ImageNet model to initialize, but using the proposal generated by the RPN network in the previous step as input to train a Fast-RCNN network, so far, the parameters of each layer of the two networks are not shared at all;
the third step: initializing a new RPN network by using the Fast-RCNN network parameters in the second step, but setting the learning rate of the convolution layers shared by the RPN and the Fast-RCNN to 0, namely, not updating, only updating the network layers specific to the RPN, and retraining, wherein at the moment, the two networks already share all the common convolution layers;
the fourth step: the network layers which are still fixedly shared are added, the network layer which is specific to Fast-RCNN is formed into a unified network, training is continued, and the network layer which is specific to fine tune Fast-RCNN is formed, at the moment, the network already realizes a network internal prediction characteristic diagram (proposal) and realizes a detection function;
the fifth step: convolutional layer extraction features of Fast-RCNN were fed into the Resnet18 network for training.
In this embodiment, the training completion condition is the conventional one: training is considered complete once the model converges. At that point, the test sample set can be input into the trained detection network model to evaluate its performance.
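The third step above sets the learning rate of the shared convolutional layers to zero so that only the RPN-specific layers are updated. In PyTorch this is commonly done by freezing those parameters; a minimal sketch follows, using a torchvision Faster R-CNN purely as a stand-in for the detector described here (the patent builds its own backbone and RPN).

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Stand-in detector; the patent's own model uses a VGG16-style backbone and a Resnet-18 classifier.
model = fasterrcnn_resnet50_fpn(weights=None)

# Freeze the shared feature-extraction layers (equivalent to a learning rate of 0 for them).
for param in model.backbone.parameters():
    param.requires_grad = False

# Optimize only the layers that still require gradients, e.g. the RPN-specific layers.
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable_params, lr=0.001, momentum=0.9)
```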
Fig. 3 is a schematic structural diagram of the detection network model in an embodiment of the present invention.
As shown in fig. 3, the detection network model includes a convolutional layer, an RPN, an ROI pooling layer, and a classification network Resnet18.
The convolutional layer consists of 13 conv layers, 13 relu layers and 4 pooling layers.
The RPN network is used to generate candidate regions (regions).
The ROI pooling layer is used to obtain a fixed size target feature map (generic feature map) according to the candidate region and the feature map obtained by the last layer of VGG 16.
The classification network Resnet18 is used to classify previously obtained target feature maps.
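Putting the four components together, the following PyTorch sketch shows one way such a structure could be wired up. It is only a sketch under assumptions: the backbone module, the source of the proposals, the pooled output size, and the channel count are illustrative choices rather than the patent's exact configuration, and the four classes follow the car/bus/van/others split of the dataset used in this embodiment.

```python
import torch.nn as nn
from torchvision.models import resnet18
from torchvision.ops import roi_pool

class DetectionNetwork(nn.Module):
    """Sketch of Fig. 3: shared conv features -> candidate regions -> ROI pooling -> Resnet-18."""

    def __init__(self, backbone, feature_channels=512, num_classes=4,
                 pooled_size=56, spatial_scale=1.0 / 16):
        super().__init__()
        self.backbone = backbone                 # e.g. 13 conv + 13 relu + 4 pooling layers (VGG16-style)
        self.spatial_scale = spatial_scale       # feature-map stride relative to the input image
        self.pooled_size = pooled_size
        self.classifier = resnet18(num_classes=num_classes)
        # Resnet-18 normally expects 3-channel images; adapt its first conv to the backbone's channels.
        self.classifier.conv1 = nn.Conv2d(feature_channels, 64, kernel_size=7,
                                          stride=2, padding=3, bias=False)

    def forward(self, images, proposals):
        """`proposals` is a list of per-image candidate boxes (x1, y1, x2, y2) produced by the RPN."""
        features = self.backbone(images)                         # convolution values shared with the RPN
        regions = roi_pool(features, proposals,
                           output_size=(self.pooled_size, self.pooled_size),
                           spatial_scale=self.spatial_scale)     # fixed-size target feature maps
        return self.classifier(regions)                          # per-region class scores
```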
Through the above model training and construction process, the trained detection network model is obtained, and step S4 can then be executed with this detection network model to complete detection and classification and obtain the vehicle detection result.
In addition, for practical convenience, corresponding computer programs may be designed according to steps S1 to S5 of the complex scene-oriented vehicle object detection method and packaged into the picture frame acquisition unit, the picture queue generation unit, and the vehicle detection output unit, respectively, so as to form a complex scene-oriented vehicle object detection device capable of completing the recognition task.
In this embodiment, the vehicle target detection device is a computer and has a GTX1080Ti graphics card for GPU acceleration.
Specifically, the function executed by the picture frame acquiring portion corresponds to step S1 in the vehicle object detection method, and is used for acquiring a vehicle monitoring video input by a computer user (or derived from other systems) and a picture frame in the vehicle monitoring video.
The function executed by the picture queue generation part corresponds to step S2 in the vehicle target detection method and is used for performing frame quality analysis on the picture frames with the Laplacian variance algorithm and storing the screened sharp, high-quality picture frames into the picture queue.
The vehicle detection output part also stores the trained detection network model, and the functions it executes correspond to steps S3 to S5 in the vehicle target detection method; it detects the picture queue with the pre-stored detection network model to obtain and output the vehicle detection result.
In this embodiment, the output result of the vehicle detection output unit may be displayed on a display of the computer so as to be viewed by a user or output to another system for use.
Actions and Effects of the Embodiment
According to the complex scene-oriented vehicle target detection method of this embodiment, frame quality analysis is performed on each acquired picture frame with the Laplacian variance algorithm, and the sharp, high-quality picture frames are screened out first, which improves the recognition rate. The picture frames are then recognized with a combined model of the Faster RCNN model and the Resnet-18 model, so the method has good generalization capability and classification accuracy when recognizing the picture frames. Meanwhile, because the convolutional features of the Faster RCNN model and the Resnet-18 model in the combined model are shared, repeated convolution operations on the picture frames are greatly reduced, further improving the recognition efficiency of the algorithm. The method and device can recognize vehicle targets in complex scenes accurately and quickly, so they are not limited to vehicle detection on traffic roads and can also be used in business fields such as intelligent security and community vehicle management.
In addition, in this embodiment the RPN is used instead of the Selective Search method to generate proposal windows, which greatly reduces the number of proposal boxes, noticeably improves their quality, and greatly reduces the computation needed to search for candidate regions. Meanwhile, the RPN and the target detection network share convolutions, which further improves recognition efficiency. The ROI pooling layer maps the original image regions onto the feature map and pools them to a fixed size, greatly reducing the amount of computation. Finally, because residual networks have better learning and optimization capabilities, Resnet18 is used so that the detection network model of the invention can better recognize and classify targets.
The above-described embodiments are merely illustrative of specific embodiments of the present invention, and the present invention is not limited to the description of the above-described embodiments.

Claims (6)

1. A vehicle target detection method facing complex scenes is used for recognizing vehicle monitoring videos with complex scenes so as to complete vehicle target detection, and is characterized by comprising the following steps:
step S1, acquiring picture frames from the vehicle monitoring video;
step S2, performing frame quality analysis on the picture frames with the Laplacian variance algorithm and storing the sharp, high-quality picture frames into a picture queue;
step S3, sequentially taking out the high-quality picture frames from the picture queue;
step S4, inputting the high-quality picture frames into a pre-trained detection network model for detection and classification to obtain a vehicle detection result;
step S5, outputting the vehicle detection result,
the detection network model is a combined model of a Faster RCNN model and a Resnet-18 model, and the convolutional features (convolution values) of the Faster RCNN model and the Resnet-18 model are shared.
2. The complex-scene-oriented vehicle object detection method according to claim 1, characterized in that:
wherein the step S2 includes the following sub-steps:
step S2-1, converting the picture frame into a grayscale image;
step S2-2, convolving the grayscale image with a Laplacian mask to obtain a convolution result;
step S2-3, calculating the variance of the convolution result to obtain the variance value;
step S2-4, comparing the variance value with a preset quality threshold, if the variance value is lower than the quality threshold, entering step S2-5, and if the variance value is higher than the quality threshold, entering step S2-6;
step S2-5, discarding the corresponding picture frame as a blurred image;
and step S2-6, storing the corresponding picture frame as a high-quality picture frame into the picture queue.
3. The complex-scene-oriented vehicle object detection method according to claim 1, characterized in that:
wherein the detection network model at least comprises a convolutional layer, an RPN network, an ROI pooling layer and a classification network Resnet18,
the convolutional layer consists of 13 conv layers, 13 relu layers, and 4 pooling layers,
the RPN network is used to generate a candidate region,
the ROI pooling layer is used for obtaining a target feature map with a fixed size according to the candidate region and a feature map obtained by the last layer of the convolutional layer,
the classification network Resnet18 is used to classify the target feature map.
4. The complex-scene-oriented vehicle object detection method according to claim 3, characterized in that:
wherein the step S4 includes the following sub-steps:
step S4-1, the picture frame is subjected to feature extraction through the convolution layer;
step S4-2, generating region targets on the feature map of the final convolutional layer by using the RPN network with k different rectangular boxes;
step S4-3, normalizing the region target, and pooling the region of interest through the ROI pooling layer to obtain a pooling result;
step S4-5, according to the pooling result, using the classification network Resnet18 to perform target classification to obtain a target classification result and perform position regression to obtain target position information;
and step S4-6, taking the target classification result and the target position information as the vehicle detection result.
5. The complex-scene-oriented vehicle object detection method according to claim 1, characterized in that:
the detection network model is obtained through the following model training steps:
step T1, constructing a picture library with labels and tags as a training sample set;
step T2, constructing an initial detection network model;
and step T3, training the detection network model with the training sample set until a training completion condition is reached.
6. A vehicle target detection device facing complex scenes is used for recognizing vehicle monitoring videos so as to complete vehicle target detection, and is characterized by comprising:
the image frame acquisition part is used for acquiring the vehicle monitoring video and an image frame in the vehicle monitoring video;
the picture queue generating part is used for performing frame quality analysis on the picture frames with the Laplacian variance algorithm and storing the sharp, high-quality picture frames into a picture queue; and
a vehicle detection output part for detecting the picture queue according to a pre-stored detection network model so as to obtain and output a vehicle detection result,
wherein the detection network model is a combined model of a Faster RCNN model and a Resnet-18 model, and the convolutional features (convolution values) of the Faster RCNN model and the Resnet-18 model are shared,
the vehicle detection output portion completes detection by:
a picture frame obtaining step, namely sequentially taking the high-quality picture frames out of the picture queue;
a picture frame detection step, wherein the high-quality picture frame is input into a detection network model which is trained in advance to be detected and classified so as to obtain a vehicle detection result;
and a detection result output step of outputting the vehicle detection result.
CN201911133216.4A, filed 2019-11-19 (priority 2019-11-19): Complex scene-oriented vehicle target detection method and device, published as CN110929632A, status Pending

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201911133216.4A (CN110929632A) | 2019-11-19 | 2019-11-19 | Complex scene-oriented vehicle target detection method and device

Publications (1)

Publication Number | Publication Date
CN110929632A | 2020-03-27

Family

ID=69853492

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201911133216.4A (CN110929632A, pending) | Complex scene-oriented vehicle target detection method and device | 2019-11-19 | 2019-11-19

Country Status (1)

Country | Link
CN | CN110929632A


Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101872546A (en) * 2010-05-06 2010-10-27 复旦大学 Video-based method for rapidly detecting transit vehicles
US20160148079A1 (en) * 2014-11-21 2016-05-26 Adobe Systems Incorporated Object detection using cascaded convolutional neural networks
CN108009509A (en) * 2017-12-12 2018-05-08 河南工业大学 Vehicle target detection method
CN108629279A (en) * 2018-03-27 2018-10-09 哈尔滨理工大学 A method of the vehicle target detection based on convolutional neural networks
CN108830188A (en) * 2018-05-30 2018-11-16 西安理工大学 Vehicle checking method based on deep learning
CN109284704A (en) * 2018-09-07 2019-01-29 中国电子科技集团公司第三十八研究所 Complex background SAR vehicle target detection method based on CNN
CN109299644A (en) * 2018-07-18 2019-02-01 广东工业大学 A kind of vehicle target detection method based on the full convolutional network in region
CN109684956A (en) * 2018-12-14 2019-04-26 深源恒际科技有限公司 A kind of vehicle damage detection method and system based on deep neural network
CN109815799A (en) * 2018-12-18 2019-05-28 南京理工大学 A kind of vehicle detecting algorithm of quickly taking photo by plane based on SSD
CN109829400A (en) * 2019-01-18 2019-05-31 青岛大学 A kind of fast vehicle detection method
CN110245577A (en) * 2019-05-23 2019-09-17 复钧智能科技(苏州)有限公司 Target vehicle recognition methods, device and Vehicular real time monitoring system


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767804A (en) * 2020-06-08 2020-10-13 上海交通大学 Recyclable garbage image classification method and system based on artificial intelligence
CN113642555A (en) * 2021-07-29 2021-11-12 深圳市芯成像科技有限公司 Image processing method, computer readable medium and system
CN113642555B (en) * 2021-07-29 2022-08-05 深圳市芯成像科技有限公司 Image processing method, computer readable medium and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2020-03-27