CN116883956A - Vehicle target detection method suitable for night highway monitoring scene - Google Patents


Info

Publication number: CN116883956A
Application number: CN202310872870.7A
Authority: CN (China)
Prior art keywords: target, target detection, detection, feature, image
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Other languages: Chinese (zh)
Inventors: 赵敏, 孙棣华, 夏龙强
Current and original assignee: Chongqing University
Application filed by Chongqing University
Priority to CN202310872870.7A
Classifications

    • G06V20/54: Surveillance or monitoring of activities of traffic, e.g. cars on the road, trains or boats
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N3/048: Activation functions
    • G06N3/09: Supervised learning
    • G06V10/764: Image or video recognition using classification, e.g. of video objects
    • G06V10/766: Image or video recognition using regression, e.g. by projecting features on hyperplanes
    • G06V10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V10/82: Image or video recognition using neural networks
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes
    • G06V20/46: Extracting features or characteristics from the video content
    • G06V2201/07: Target detection


Abstract

The application discloses a vehicle target detection method suitable for a night highway monitoring scene, which comprises the following steps: extracting a monitoring video of a highway and acquiring an image to be detected; improving the backbone network structure of the FCOS deep learning model and extracting feature maps; fusing the feature maps with a feature pyramid network (FPN) to obtain multi-scale feature maps; performing target classification and bounding box regression on the multi-scale feature maps to obtain detection targets; training a target detection model; and detecting vehicle targets in the monitoring video using the trained target detection model. The method is suitable for vehicle target detection in night highway monitoring scenes, can effectively improve the accuracy of night vehicle target feature extraction, better copes with large target scale variation, numerous interference factors and other conditions in such scenes, and improves detection accuracy while guaranteeing detection efficiency.

Description

Vehicle target detection method suitable for night highway monitoring scene
Technical Field
The application belongs to the technical field of image processing, and particularly relates to a vehicle target detection method suitable for a night highway monitoring scene.
Background
Vehicle detection plays an important role in traffic information acquisition, safety management, and the like. As an important component of the road traffic system, the operational safety and smoothness of the expressway are important for improving transportation efficiency and promoting regional economic development. Because traffic flow on the expressway is large and vehicle speeds are high, once a traffic accident occurs, serious losses of life and property are extremely likely, with unimaginable consequences. Particularly at night, due to factors such as insufficient ambient light and obstructed driver sight, the probability of traffic accidents on the expressway increases greatly and the accident consequences are more serious. Establishing a real-time, reliable expressway vehicle detection system helps capture abnormal behaviors of individual vehicles and abnormal traffic states in a timely manner, and is the foundation for promoting expressway driving safety and realizing integrated management-and-control decision-making and emergency response.
Compared with traditional vehicle detection modes, video-based detection has the advantages of simple installation and maintenance, no impact on road surface service life, low cost, wide coverage, rich perception information and the like, and can promote the development of traffic monitoring systems in an intelligent direction. However, most existing video vehicle detection methods target daytime scenes with sufficient light and a wide field of view, and their performance degrades sharply in harsh night scenes.
The existing night vehicle target detection methods can be roughly classified into four categories: 1) based on motion information: the basic idea is to extract moving vehicle targets by exploiting the difference in how fast pixel values change between the moving foreground region and the fixed background region across consecutive video frames, mainly including the optical flow method, inter-frame difference method, background difference method, and the like; 2) based on headlight matching: the method first performs threshold segmentation and morphological operations on the video image according to the grey-value difference between headlight regions and other regions to extract the headlight regions, and then effectively combines headlights belonging to the same target according to a series of pairing criteria such as brightness, size and symmetry, thereby realizing night vehicle detection; 3) based on machine learning: the method generally describes the vehicle target using features such as the edges, shapes and colors of the vehicle region in the image, and mainly comprises three steps of feature extraction, classifier training and target detection; 4) based on deep learning: the method uses structures such as convolutional neural networks (CNN) to automatically extract deep features of the input image and completes the two subtasks of classification and localization from the extracted features, and its implementation generally comprises a training stage and a testing stage.
At night, due to the extreme lack of ambient illumination and the presence of noise interference, the contrast and visibility of highway monitoring images are extremely poor, and the appearance characteristics and detail information of vehicles in them are extremely blurry, which brings great difficulty to vehicle detection. The vehicle detection task in night expressway scenes involves varied backgrounds and diverse target scales, as well as low target feature distinguishability and numerous interference factors. The detection accuracy of conventional computer vision algorithms on this task has not yet reached an ideal level and needs further improvement.
Therefore, a method for detecting a vehicle target suitable for a night highway monitoring scene is needed, so that the detection accuracy of the vehicle target in the night highway monitoring scene is improved, and the method has important practical significance for managing and controlling the vehicles on the highway.
Disclosure of Invention
In view of the above, the present application is directed to a vehicle target detection method suitable for a night highway monitoring scene. The application aims to solve the problems of poor night detection effect and low precision of the existing night vehicle target detection method.
In order to achieve the above object, the present application provides a vehicle target detection method suitable for a night highway monitoring scene, comprising the steps of:
s1, extracting a monitoring video of a highway to obtain an image to be detected;
s2, improving a backbone network structure of the FCOS deep learning model, and extracting features of an image to be detected to obtain a feature map;
s3, selecting a feature pyramid network FPN to fuse the feature maps to obtain enhanced multi-scale feature maps;
s4, carrying out target classification and bounding box regression on the multi-scale feature map to obtain a detection target;
s5, self-making a data set, and training a target detection model;
s6, detecting a vehicle target under the monitoring video by using the trained target detection model.
Further, in the step S2, ResNet-101 is selected as the backbone network of the FCOS deep learning model.
Further, SE attention modules are introduced into the residual units of the Conv3 to Conv5 stages of the ResNet-101 backbone network.
Further, in step S3, the feature maps are fused from top to bottom through the feature pyramid network FPN to obtain enhanced multi-scale feature maps.
Further, the step S4 includes the following substeps:
s4.1, sequentially inputting the multi-scale feature map into two parallel detection branches, classifying and carrying out regression prediction on each sample point on the multi-scale feature map, and obtaining a prediction frame with class confidence after decoding and reduction;
and S4.2, carrying out post-processing operation on all the prediction frames to obtain a detection target.
Further, the step S5 includes the following substeps:
s5.1, collecting a monitoring video of the expressway, and dividing a training set;
s5.2, training a deep learning model based on an FCOS algorithm by taking a current frame image of a video sequence obtained from a training set as input and taking a target classification result and a target regression result as output to obtain an initial target detection model;
s5.3, on the training data set, calculating a loss value of the target detection model according to the predicted output and the expected output of the target detection model, training a plurality of epochs on the initial target detection model by using an Adam optimizer until a stopping condition is met, and obtaining and storing a final target detection model and the weight of the target detection model.
Further, in the step S6, for the input monitoring video stream data, vehicle targets are detected frame by frame using the target detection model loaded with the trained weights.
The application also provides a vehicle target detection device suitable for the night highway monitoring scene, which comprises an image extraction module and a target detection module, wherein the target detection module comprises a feature extraction module, a feature fusion module and a classification-regression module;
the image extraction module is used for extracting the monitoring video to obtain an image to be detected of the current frame;
the target detection module is used for processing the image to be detected to obtain a detection target;
the feature extraction module is used for extracting features of the image to be detected to obtain a feature map;
the feature fusion module is used for fusing the feature maps to obtain enhanced multi-scale feature maps;
and the classification-regression module is used for carrying out target classification and bounding box regression on the multi-scale feature map to obtain a detection target.
The application has the beneficial effects that:
according to the vehicle target detection method suitable for the night highway monitoring scene, resNet-101 is selected as a basic backbone, and an SE attention mechanism is introduced into a residual unit of the basic backbone to enhance the characteristic expression capability and the anti-interference capability of a network, so that the extraction effect of a target detection model on the vehicle target characteristics is fully improved, and the prediction precision of the classification and regression processes is further improved; the positioning quality of the boundary box output by the regression branch is measured by introducing an intersection ratio (IoU) label, the positioning quality estimation and the class prediction vector are jointly expressed (Classification-IoU joint score vector is directly predicted by using the Classification branch), and meanwhile, the training of the Classification branch is optimized by introducing a new loss function, so that the detection precision and efficiency of the target detection model on the multi-scale vehicle target are effectively improved. The method can effectively improve the accuracy of extracting the target features of the vehicle at night, better cope with the conditions of large target scale change, multiple interference factors and the like in the scene, and improve the detection accuracy while ensuring the detection efficiency.
Additional advantages, objects, and features of the application will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the application. The objects and other advantages of the application may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.
Drawings
FIG. 1 is a flow chart of a method for detecting a vehicle target for a night highway monitoring scenario according to the present application;
FIG. 2 is a schematic diagram of residual units of ResNet-101 that introduce the SE attention mechanism;
FIG. 3 is a schematic diagram of a detection head based on a joint representation of classification-positioning quality estimation;
fig. 4 is a flowchart of a night vehicle detection apparatus according to the present application.
Detailed Description
In order to make the technical scheme, advantages and objects of the present application more clear, the technical scheme of the embodiment of the present application will be clearly and completely described below with reference to the accompanying drawings of the embodiment of the present application. It will be apparent that the described embodiments are some, but not all, embodiments of the application. All other embodiments, which can be obtained by a person skilled in the art without creative efforts, based on the described embodiments of the present application belong to the protection scope of the present application.
Vehicle detection in a night highway monitoring scene must account for the difficulty of extracting vehicle target features and for large scale variation; the present application realizes vehicle target detection in such scenes on the basis of the FCOS deep learning model. First, for the backbone network part of the FCOS model, ResNet-101 is selected as the basic backbone network, and an attention mechanism is introduced into its residual units to strengthen the multi-scale feature expression capability and noise suppression capability of the model. Second, the feature pyramid network FPN is adopted to fuse and enhance the feature maps extracted by the backbone network, so as to cope with the varying scales of vehicle targets. Finally, a detection head based on the joint representation of classification and positioning quality estimation classifies and regresses the enhanced multi-scale feature maps, finally obtaining the vehicle target detection results.
As shown in fig. 1, the present application provides a vehicle target detection method suitable for a night highway monitoring scene, comprising:
s1, extracting a monitoring video of a highway to obtain an image to be detected;
s2, improving a backbone network structure of the FCOS deep learning model, and extracting features of an image to be detected to obtain a feature map;
s3, selecting a feature pyramid network FPN to fuse the feature maps to obtain enhanced multi-scale feature maps;
s4, carrying out target classification and bounding box regression on the multi-scale feature map to obtain a detection target;
s5, self-making a data set, and training a target detection model;
s6, detecting a vehicle target under the monitoring video by using the trained target detection model.
S2, improving a backbone network structure of the FCOS deep learning model, and extracting features of an image to be detected to obtain a feature map, wherein the method comprises the following specific steps of:
extracting features of the image to be detected through a ResNet-101 network; the shortcut connection architecture of ResNet-101 combines the image detail information extracted by shallow convolution layers with the semantic information extracted by deep convolution layers, effectively solving the problem that a deep network fails to converge due to vanishing gradients and achieving a good feature representation effect; a corresponding feature map is obtained after the input image passes through the ResNet-101 network. The residual unit of ResNet-101 has the form:
H(x) = F(x) + x
wherein x is the input feature; H(x) is the output feature; F(x) denotes the residual mapping implemented by convolution operations.
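As a toy illustration of the shortcut connection (not the patent's network: F below is an arbitrary stand-in for the unit's stacked convolution layers), the residual form H(x) = F(x) + x can be sketched as:

```python
import numpy as np

def residual_unit(x, F):
    # H(x) = F(x) + x: the shortcut adds the input to the branch output,
    # so gradients can flow through the identity path even if F is deep.
    return F(x) + x

x = np.ones(4)
h = residual_unit(x, lambda v: 0.1 * v)  # F(x) = 0.1x, purely illustrative
```

Because the identity term x is added unchanged, the unit only has to learn the residual F(x) = H(x) - x, which is what eases optimization in deep networks.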
considering that the ResNet-101 backbone network has a defect of insensitivity to a vehicle target in a night highway monitoring scene including noise interference such as Light Effects (Light Effects), as shown in fig. 2, the embodiment is used for modeling a characteristic channel relation by introducing an SE attention module into a residual unit from Conv3 to Conv5 stages of ResNet-101, so that the network can adaptively adjust characteristic responses among channels, and thereby influence of image noise on a target detection task is weakened.
The workflow of the SE (Squeeze-and-Excitation) attention mechanism is: 1) compress the input feature map U using a global average pooling operation to generate a channel descriptor z carrying global spatial information; 2) adaptively recalibrate z to obtain the importance information s (channel weights) of each channel; 3) rescale the feature map U using s to obtain the output feature map X. The above procedure can be expressed as:
z_c = F_sq(u_c) = (1 / (H × W)) Σ_{i=1..H} Σ_{j=1..W} u_c(i, j)
s = F_ex(z, W) = σ(g(z, W)) = σ(W_2 δ(W_1 z))
X_c = F_scale(u_c, s_c) = s_c · u_c
wherein u_c denotes the c-th channel of the feature map U; H and W denote the height and width of U, respectively; W_1 and W_2 denote the two fully connected layers; σ and δ denote the Sigmoid and ReLU activation functions, respectively. The weight vector s is multiplied with U channel by channel to generate the final weighted feature map X.
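The three SE steps (squeeze, excitation, scale) can be sketched in NumPy as a minimal illustration; the function names, layer shapes and reduction ratio below are assumptions, not the patent's implementation:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def se_block(U, W1, W2):
    """Squeeze-and-Excitation over a (C, H, W) feature map U.

    W1 with shape (C/r, C) and W2 with shape (C, C/r) stand in for the
    two fully connected layers; r is the channel reduction ratio."""
    z = U.mean(axis=(1, 2))                  # squeeze: global average pooling -> (C,)
    s = sigmoid(W2 @ np.maximum(W1 @ z, 0))  # excitation: sigma(W2 · relu(W1 · z))
    return U * s[:, None, None]              # scale: channel-wise reweighting

C, H, W, r = 4, 8, 8, 2
rng = np.random.default_rng(0)
U = rng.standard_normal((C, H, W))
X = se_block(U, rng.standard_normal((C // r, C)), rng.standard_normal((C, C // r)))
```

Each output channel X_c is the input channel u_c multiplied by a single learned weight s_c in (0, 1), which is exactly the adaptive channel recalibration described above.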
S3, selecting the feature pyramid network FPN to fuse the feature maps to obtain enhanced multi-scale feature maps, which comprises the following specific steps:
The feature maps are fused top-down through the feature pyramid network FPN to obtain multi-scale enhanced features.
Specifically, aiming at the scale imbalance of vehicle targets in the monitoring scene, this embodiment selects the feature pyramid structure FPN to complete a preliminary enhancement of the feature space; the structure forms a new feature map by fusing the low-level features of two adjacent stages with the high-level features after a 2x up-sampling operation.
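One top-down FPN fusion step can be sketched as follows, under the assumptions (not stated in the patent) of nearest-neighbour 2x up-sampling and (C, H, W) maps to which the lateral 1x1 convolution has already been applied; all names are illustrative:

```python
import numpy as np

def upsample2x(x):
    # nearest-neighbour 2x up-sampling of a (C, H, W) map
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_merge(lateral, top_down):
    # one FPN step: fuse the lower-stage lateral map with the 2x up-sampled
    # higher-stage (coarser) map of the adjacent stage by element-wise addition
    return lateral + upsample2x(top_down)

low = np.ones((4, 8, 8))   # lower-stage feature map (after lateral 1x1 conv)
high = np.ones((4, 4, 4))  # adjacent higher-stage feature map
merged = fpn_merge(low, high)
```

Repeating this step from the coarsest level downward yields the enhanced multi-scale pyramid used for detection.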
S4, performing object classification and bounding box regression on the multi-scale feature map to obtain a detection object, wherein the method specifically comprises the following sub-steps:
s4.1, sequentially inputting the multi-scale feature map into two parallel detection branches, classifying and carrying out regression prediction on each sample point on the multi-scale feature map, and obtaining a prediction frame with class confidence after decoding and reduction, as shown in FIG. 3.
And S4.2, performing post-processing operations such as confidence threshold filtering, non-maximum suppression (NMS) and the like on all the prediction frames, and finally obtaining the detection target.
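The post-processing in S4.2 can be sketched as confidence-threshold filtering followed by greedy non-maximum suppression; the thresholds and function names below are illustrative assumptions, not values from the patent:

```python
def iou(a, b):
    # boxes as [x1, y1, x2, y2]
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def postprocess(boxes, scores, score_thr=0.3, iou_thr=0.5):
    # 1) drop low-confidence boxes, 2) greedy NMS over the survivors,
    # highest score first; returns the indices of the kept boxes
    order = sorted((i for i, s in enumerate(scores) if s >= score_thr),
                   key=lambda i: -scores[i])
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in kept):
            kept.append(i)
    return kept
```

In practice the same operation is usually delegated to a library routine, but the greedy keep-or-suppress loop above is the underlying logic.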
S5, self-making a data set, training a target detection model, and specifically, the method comprises the following steps:
in this embodiment, the target detection model is a deep learning model based on the FCOS algorithm. Before training the target detection model, a relevant data set needs to be made and divided into a training set and a test set according to a certain proportion; in this embodiment the ratio is 3:1.
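The 3:1 train/test split can be sketched as follows; the function name and the fixed shuffle seed are assumptions for illustration:

```python
import random

def split_dataset(samples, train_ratio=0.75, seed=0):
    # shuffle the indices, then split train:test = 3:1 as in this embodiment
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(len(samples) * train_ratio)
    return [samples[i] for i in idx[:cut]], [samples[i] for i in idx[cut:]]

train, test = split_dataset(list(range(100)))
```

A fixed seed keeps the split reproducible across training runs, which matters when comparing model variants on the same test set.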
Training an initial network by taking a current frame image as input and taking a target classification result and a target regression result as output to obtain an initial target detection model;
and on the training data set, calculating a loss value of the target detection model according to the predicted output and the expected output of the target detection model, training a plurality of epochs on the network by using an Adam optimizer until a stopping condition is met, and obtaining and storing a final target detection model and the weight of the target detection model.
In the training process, a target detection loss function is adopted to supervise the learning of the network, and the parameters of the model are iteratively optimized until the model converges. The target detection loss function is a weighted combination of classification loss and regression loss, realizing the joint training of classification and regression:
L = (1/N_pos) Σ_{x,y} L_cls(P_{x,y}, P*_{x,y}) + (λ_reg/N_pos) Σ_{x,y} f(c*_{x,y} > 0) L_reg(t_{x,y}, t*_{x,y})
wherein (x, y) denotes the coordinates of any sample point on the feature map; P_{x,y} is the class prediction vector of the network at sample point (x, y); t_{x,y} is the bounding box regressed by the network at sample point (x, y); "*" indicates that the corresponding variable is the expected (ground-truth) value; L_cls (classification loss) is the Generalized Focal Loss (GFL), and L_reg (regression loss) is the GIoU loss; N_pos denotes the number of positive samples; λ_reg is the balance weight of L_reg, with λ_reg = 1 in this embodiment; f is an indicator function equal to 1 when c*_{x,y} > 0 (i.e., the corresponding sample point is a positive sample) and 0 otherwise. The training loss is obtained by summing the loss values at all sample points on the feature map. The typical form of GFL is:
GFL(σ) = -|y - σ|^β ((1 - y) log(1 - σ) + y log(σ))
wherein y ∈ [0, 1] denotes the continuous class confidence label; σ is the class confidence actually predicted by the classification branch; β ≥ 0 is an adjustable focusing parameter used to control the rate at which the loss weight of simple samples decays, with β = 2 in this embodiment. GFL automatically reduces the loss weight of easily predicted simple samples during training, focusing the training of the network on difficult samples that are not easily predicted, thereby improving the training effect of the model.
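A minimal scalar sketch of the GFL formula above (the eps term is added only for numerical safety near σ = 0 or 1 and is not part of the formula):

```python
import math

def gfl(sigma, y, beta=2.0, eps=1e-12):
    # GFL(sigma) = -|y - sigma|^beta * ((1 - y)*log(1 - sigma) + y*log(sigma))
    mod = abs(y - sigma) ** beta                                   # focusing weight
    ce = (1 - y) * math.log(1 - sigma + eps) + y * math.log(sigma + eps)
    return -mod * ce
```

Note that the loss vanishes when the prediction σ matches the continuous label y exactly, and the |y - σ|^β factor shrinks the contribution of already well-predicted samples, which is the focusing behaviour described above.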
After training of the target detection model is completed, the designed target detection model loads the model weights obtained by training; for the input night highway monitoring video stream data, vehicle target detection is then carried out frame by frame.
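Frame-by-frame detection over a video stream can be sketched generically as follows; the frame iterable and the stub model are stand-ins (assumptions) for the decoded monitoring video and the trained FCOS detector:

```python
def detect_stream(frames, model, score_thr=0.3):
    """Frame-by-frame vehicle detection over a decoded video stream.

    frames: any iterable of images (e.g. frames read from the monitoring video);
    model(frame) -> list of (box, score) detections for that frame."""
    for idx, frame in enumerate(frames):
        dets = [(box, s) for box, s in model(frame) if s >= score_thr]
        yield idx, dets

# usage with a stub model standing in for the trained detector
stub = lambda frame: [([0, 0, 10, 10], 0.9), ([5, 5, 8, 8], 0.1)]
results = list(detect_stream(["frame0", "frame1"], stub))
```

Writing the loop as a generator keeps memory use constant on long monitoring streams, since frames are processed one at a time rather than buffered.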
According to the application, ResNet-101 is selected as the basic backbone, and an SE attention mechanism is introduced into its residual units to enhance the feature expression capability and anti-interference capability of the network, thereby fully improving the extraction of vehicle target features by the target detection model and further improving the prediction precision of the classification and regression processes. The positioning quality of the bounding box output by the regression branch is measured by introducing an intersection-over-union (IoU) label, the positioning quality estimation and the class prediction vector are jointly represented (the classification branch directly predicts a classification-IoU joint score vector), and the training of the classification branch is optimized by introducing a new loss function, effectively improving the detection precision and efficiency of the target detection model on multi-scale vehicle targets. The designed target detection model is then trained on the self-made night highway vehicle target data set until the model converges, and the corresponding model weights are saved; vehicle target detection in the night highway monitoring scene can then be realized based on the trained target detection model and weights. The method can effectively improve the accuracy of night vehicle target feature extraction, better cope with large target scale variation, numerous interference factors and other conditions in such scenes, and improve detection accuracy while ensuring detection efficiency.
As shown in fig. 4, the present application further provides a night vehicle target detection device, which includes an image extraction module and a target detection module, where the target detection module includes a feature extraction module, a feature fusion module, and a classification-regression module:
the image extraction module is used for extracting the monitoring video to obtain an image to be detected of the current frame;
the target detection module is used for processing the image to be detected to obtain a detection target;
the feature extraction module is used for extracting features of the image to be detected to obtain a feature map;
the feature fusion module is used for fusing the feature maps to obtain enhanced multi-scale feature maps;
and the classification-regression module is used for carrying out target classification and bounding box regression on the multi-scale feature map to obtain a detection target.
Note that the specific embodiments of the night vehicle target detection device are substantially the same as those of the vehicle detection method described above and will not be repeated here.
Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present application and not for limiting the same, and although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present application may be modified or substituted without departing from the spirit and scope of the technical solution, and the present application is intended to be covered in the scope of the present application.

Claims (8)

1. A vehicle target detection method suitable for a night highway monitoring scene, comprising the steps of:
s1, extracting a monitoring video of a highway to obtain an image to be detected;
s2, improving a backbone network structure of the FCOS deep learning model, and extracting features of an image to be detected to obtain a feature map;
s3, selecting a feature pyramid network FPN to fuse the feature images to obtain an enhanced multi-scale feature image;
s4, carrying out target classification and bounding box regression on the multi-scale feature map to obtain a detection target;
s5, self-making a data set, and training a target detection model;
s6, detecting a vehicle target under the monitoring video by using the trained target detection model.
2. The vehicle target detection method for a night highway monitoring scene according to claim 1, wherein: in the step S2, ResNet-101 is selected as the backbone network of the FCOS deep learning model.
3. The vehicle target detection method for a night highway monitoring scene according to claim 2, wherein: SE attention modules are introduced into the residual units of the Conv3 to Conv5 stages of the ResNet-101 backbone network.
4. The vehicle target detection method for a night highway monitoring scene according to claim 1, wherein: in step S3, the feature maps are fused from top to bottom through the feature pyramid network FPN to obtain an enhanced multi-scale feature map.
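The top-down fusion of claim 4 can be sketched in NumPy as follows. This is a simplified illustration of the standard FPN top-down pathway: each lateral map is summed with the 2x-upsampled coarser level. The 1x1 lateral convolutions and 3x3 smoothing convolutions of a real FPN are omitted, so the channel counts are assumed to already match.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x spatial upsampling of a (C, H, W) map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_top_down(c3, c4, c5):
    """Top-down fusion: each backbone level is summed with the upsampled
    coarser level, yielding the enhanced multi-scale maps P3-P5.
    (Lateral 1x1 convs and 3x3 smoothing convs are omitted for brevity.)"""
    p5 = c5
    p4 = c4 + upsample2x(p5)
    p3 = c3 + upsample2x(p4)
    return p3, p4, p5

# Toy backbone maps with matching channels (illustrative sizes):
c5 = np.ones((8, 4, 4))
c4 = np.ones((8, 8, 8))
c3 = np.ones((8, 16, 16))
p3, p4, p5 = fpn_top_down(c3, c4, c5)
```

The fused maps carry both the semantically strong coarse features and the spatially fine details, which is what lets a single detector handle the large vehicle-scale variation typical of highway surveillance views.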
5. The vehicle target detection method for a night highway monitoring scene according to claim 1, wherein said step S4 comprises the sub-steps of:
s4.1, sequentially inputting the multi-scale feature map into two parallel detection branches, classifying and carrying out regression prediction on each sample point on the multi-scale feature map, and obtaining a prediction frame with class confidence after decoding and reduction;
and S4.2, carrying out post-processing operation on all the prediction frames to obtain a detection target.
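Sub-steps S4.1 and S4.2 can be sketched as follows: FCOS-style decoding maps each sample point's predicted (l, t, r, b) distances back to an image-space box, and greedy non-maximum suppression is one common form of the post-processing; the IoU threshold and toy predictions below are illustrative assumptions.

```python
import numpy as np

def decode_fcos(points, ltrb):
    """Decode per-point (l, t, r, b) distances into (x1, y1, x2, y2) boxes."""
    x, y = points[:, 0], points[:, 1]
    l, t, r, b = ltrb.T
    return np.stack([x - l, y - t, x + r, y + b], axis=1)

def iou(box, boxes):
    # IoU of one box against an array of boxes.
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nms(boxes, scores, thr=0.5):
    """Greedy NMS: keep the highest-scoring box, drop overlaps above thr."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        order = order[1:][iou(boxes[i], boxes[order[1:]]) <= thr]
    return keep

# Two sample points predicting the same vehicle, one predicting another:
points = np.array([[50.0, 50.0], [52.0, 50.0], [200.0, 200.0]])
ltrb = np.array([[20, 20, 20, 20], [22, 20, 18, 20], [10, 10, 10, 10]], float)
boxes = decode_fcos(points, ltrb)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)        # the duplicate prediction is suppressed
```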
6. The vehicle target detection method for a night highway monitoring scene according to claim 1, wherein said step S5 comprises the sub-steps of:
s5.1, collecting a monitoring video of the expressway, and dividing a training set;
s5.2, training a deep learning model based on an FCOS algorithm by taking a current frame image of a video sequence obtained from a training set as input and taking a target classification result and a target regression result as output to obtain an initial target detection model;
s5.3, on the training data set, calculating a loss value of the target detection model according to the predicted output and the expected output of the target detection model, training a plurality of epochs on the initial target detection model by using an Adam optimizer until a stopping condition is met, and obtaining and storing a final target detection model and the weight of the target detection model.
7. The method according to claim 6, wherein in step S6, the target detection model loaded with the trained weights is used to detect vehicle targets frame by frame in the input surveillance video stream data.
8. A vehicle target detection device suitable for a night highway monitoring scene, characterized by comprising an image extraction module and a target detection module, wherein the target detection module comprises a feature extraction module, a feature fusion module, and a classification-regression module;
the image extraction module is used for extracting the monitoring video to obtain an image to be detected of the current frame;
the target detection module is used for processing the image to be detected to obtain a detection target;
the feature extraction module is used for extracting features of the image to be detected to obtain a feature map;
the feature fusion module is used for fusing the feature maps to obtain an enhanced multi-scale feature map;
and the classification-regression module is used for carrying out target classification and bounding box regression on the multi-scale feature map to obtain a detection target.
CN202310872870.7A 2023-07-17 2023-07-17 Vehicle target detection method suitable for night highway monitoring scene Pending CN116883956A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310872870.7A CN116883956A (en) 2023-07-17 2023-07-17 Vehicle target detection method suitable for night highway monitoring scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310872870.7A CN116883956A (en) 2023-07-17 2023-07-17 Vehicle target detection method suitable for night highway monitoring scene

Publications (1)

Publication Number Publication Date
CN116883956A true CN116883956A (en) 2023-10-13

Family

ID=88261706

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310872870.7A Pending CN116883956A (en) 2023-07-17 2023-07-17 Vehicle target detection method suitable for night highway monitoring scene

Country Status (1)

Country Link
CN (1) CN116883956A (en)

Similar Documents

Publication Publication Date Title
CN110956094B (en) RGB-D multi-mode fusion personnel detection method based on asymmetric double-flow network
CN110276765B (en) Image panorama segmentation method based on multitask learning deep neural network
CN111104903B (en) Depth perception traffic scene multi-target detection method and system
CN112101175A (en) Expressway vehicle detection and multi-attribute feature extraction method based on local images
CN108009518A (en) A kind of stratification traffic mark recognition methods based on quick two points of convolutional neural networks
CN111539265B (en) Method for detecting abnormal behavior in elevator car
CN112395951B (en) Complex scene-oriented domain-adaptive traffic target detection and identification method
CN110929593A (en) Real-time significance pedestrian detection method based on detail distinguishing and distinguishing
CN109063667B (en) Scene-based video identification mode optimization and pushing method
CN112434723B (en) Day/night image classification and object detection method based on attention network
CN113158738A (en) Port environment target detection method, system, terminal and readable storage medium based on attention mechanism
Xiao et al. Real-time object detection algorithm of autonomous vehicles based on improved yolov5s
CN111079675A (en) Driving behavior analysis method based on target detection and target tracking
CN114758178B (en) Hub real-time classification and air valve hole positioning method based on deep learning
CN113128476A (en) Low-power consumption real-time helmet detection method based on computer vision target detection
CN115019340A (en) Night pedestrian detection algorithm based on deep learning
Zhao et al. Research of fire smoke detection algorithm based on video
CN115588188A (en) Locomotive, vehicle-mounted terminal and driver behavior identification method
CN112200007A (en) License plate detection and identification method under community monitoring scene
CN110334703B (en) Ship detection and identification method in day and night image
CN116824630A (en) Light infrared image pedestrian target detection method
CN116883956A (en) Vehicle target detection method suitable for night highway monitoring scene
CN116416152A (en) Image data-based processing method and device
CN114926456A (en) Rail foreign matter detection method based on semi-automatic labeling and improved deep learning
Sankaranarayanan et al. Virtual mono-layered continuous containers for vehicle detection applications in intelligent transportation systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination