CN116343136A - Road surface casting detection method based on expressway monitoring video - Google Patents

Road surface casting detection method based on expressway monitoring video

Info

Publication number
CN116343136A
CN116343136A (application CN202310177342.XA)
Authority
CN
China
Prior art keywords
model
casting
data set
image
background
Prior art date
Legal status
Pending
Application number
CN202310177342.XA
Other languages
Chinese (zh)
Inventor
孙健
Current Assignee
Jiangsu Ninghang Expressway Co ltd
Original Assignee
Jiangsu Ninghang Expressway Co ltd
Priority date
Filing date
Publication date
Application filed by Jiangsu Ninghang Expressway Co ltd filed Critical Jiangsu Ninghang Expressway Co ltd
Priority to CN202310177342.XA priority Critical patent/CN116343136A/en
Publication of CN116343136A publication Critical patent/CN116343136A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/54 Surveillance or monitoring of activities of traffic, e.g. cars on the road, trains or boats
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements using pattern recognition or machine learning
    • G06V 10/764 Arrangements using classification, e.g. of video objects
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/82 Arrangements using neural networks
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems


Abstract

The invention provides a road surface casting-object detection method based on expressway surveillance video, belonging to the technical field of intelligent transportation, and comprising the following steps: acquiring a background image data set of expressway road sections and a casting-object image data set; fusing the casting-object images with the road background images to construct a road casting-object data set; performing VIBE background modeling to obtain the background and foreground; improving and optimizing the YOLOv5 network model; and classifying and detecting casting objects with the improved YOLOv5 model. Compared with acquiring an expressway casting-object data set by field survey, the data-set construction method of the invention greatly saves time and cost while improving safety. In addition, the improved YOLOv5 network can track targets efficiently in real time under the roadside viewing angle while improving detection accuracy.

Description

Road surface casting detection method based on expressway monitoring video
Technical Field
The invention belongs to the technical field of intelligent traffic, and particularly relates to a road surface casting object detection method based on a highway monitoring video.
Background
With the rapid development of China's economy, the country's expressway mileage keeps growing. Traffic volume and freight volume continue to rise, and because driving speeds on expressways are high, the accident rate on expressways keeps rising as well.
Congestion caused by traffic accidents results in billions of dollars of lost productivity, property damage and personal injury worldwide every year. Road castings are one of the major causes of disruption to normal transportation. Cargo spilled from trucks and refuse discarded at random by drivers are the main sources of expressway castings. These objects are small and hard for drivers to notice in time, so vehicles cannot avoid them, which leads to traffic accidents.
With the development of informatization, automatic detection of road castings has become a prerequisite for intelligent expressways to reduce the probability of the traffic accidents and congestion they cause.
At present, methods for detecting castings on expressways fall into two categories: traditional manual inspection and automatic detection. Manual inspection is inefficient and has low coverage, while video surveillance coverage of expressways keeps growing, so automatic detection of castings from surveillance video has become a novel and effective approach.
Disclosure of Invention
The invention provides a road surface casting detection method based on a highway monitoring video, and aims to solve the problem of highway casting detection.
The embodiment of the invention provides a road surface casting detection method based on a highway monitoring video, which comprises the following steps:
s1: a base data set is constructed. And acquiring a highway monitoring video, acquiring a road section background image data set, and downloading a corresponding casting image data set from the ImageNet.
S2: and (5) image fusion. And overlapping the center of the throwing object with the randomly selected pavement area in the background image, and then pasting the overlapping pavement area into the scene image to generate a composite image, so as to construct the highway scene throwing object data set.
S3: VIBE background modeling. The background and foreground are acquired.
S4: and constructing a neural network. And (5) building a neural network model based on the YOLOv5 network improvement.
S5: and (5) training an optimization model. Inputting the casting data set into a neural network model for training, and optimizing the model according to the training result to obtain the training weight and the classification result of the casting detection model.
S6: and detecting the casting matters. And detecting the casting object by using the trained deep learning network.
Preferably, since placing casting objects on an actual road to capture a large number of images would be dangerous and expensive, expressway casting-object images formed by fusing real background images with casting-object images are safe and efficient to obtain.
Preferably, because the expressway surveillance camera has a fixed viewing angle and the picture changes relatively little, a background modeling method is adopted; VIBE is an algorithm that models the background and detects the foreground.
Preferably, the neural network is built on the YOLOv5 model, with two improvements made to the YOLOv5 network. One is to reduce the size of the convolution kernel to 1×1. Earlier convolution layers can thus obtain smaller-scale kernels without increasing the number of parameters, and later convolution layers can build higher-level features on this basis, such as structural features at the level of edges, shapes and object types. The other improvement is to add connections between different convolution layers, such as the skip connections used in ResNet.
The beneficial effects of the invention are as follows:
1. according to the invention, road casting objects are detected and identified based on the expressway monitoring video, so that the labor cost is reduced, and the detection efficiency is improved.
2. According to the invention, background pictures obtained from expressway video surveillance are fused with an open-source casting-object image data set, so casting objects need not be placed manually on an actual road surface, which greatly saves labor cost and is safer and more efficient.
3. The invention makes two improvements to the YOLOv5 network: reducing the convolution kernel size to 1×1 and adding ResNet-style skip connections, which enhance the ability to observe object details and increase computational efficiency. Conventional data sets are captured from a frontal view, whereas expressway video surveillance looks at the road from a relatively low side angle with more complex brightness and shadows, making image details difficult for convolution layers to analyze; the convolution layers of the improved YOLOv5 network have a stronger ability to examine image details and can therefore analyze them well.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a flow chart of a method for detecting road casting based on expressway monitoring video;
FIG. 2 shows the three update strategies of the VIBE model of the present invention;
FIG. 3 is a flow chart of the improved YOLOv5 network detection of the present invention;
FIG. 4 is a flowchart of an implementation of the method for detecting road castings based on the expressway monitoring video.
Detailed Description
In order to make the objects, technical solutions and advantages of the technical solutions of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings of specific embodiments of the present invention. It should be noted that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be made by a person skilled in the art without creative efforts, based on the described embodiments of the present invention fall within the protection scope of the present invention.
The invention designs a road surface casting detection method based on a highway monitoring video, and a flow chart of the method is shown in figure 1 and comprises the following steps:
step 1: a base data set is constructed. And obtaining a road section background image based on the expressway monitoring video, and constructing an expressway scene data set. Downloading corresponding throwing object images from the ImageNet data set, supplementing samples aiming at the throwing object characteristics of the expressway, and constructing a throwing object data set covering ten categories, wherein the specific categories are as follows: boxes, cartons, papers, bottles, bags, roadblocks, stones, sand, plastic bags and wraps.
Step 2: Image fusion. The center of the casting object is aligned with a randomly selected road-surface point in the background image, and the object is pasted into the scene image to generate a composite image, thereby constructing the expressway scene casting-object data set. The casting-object image is resized so that the entire object falls within road-surface pixels. Finally, the composite images are inspected manually, those that do not match a natural scene are discarded, and the category, size and position of the casting object in each image are annotated.
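The fusion step can be sketched as follows. This is an illustrative NumPy-only sketch under the assumption that the object crop comes with a binary foreground mask and, centered at the chosen point, fits entirely inside the road region; the function and parameter names are hypothetical, not from the patent:

```python
import numpy as np

def paste_object(background, obj, mask, center):
    """Paste a casting-object crop onto a road background image.

    background: H x W x 3 road scene; obj: h x w x 3 object crop;
    mask: h x w boolean foreground mask of the object;
    center: (row, col) of the randomly chosen road-surface point.
    Assumes the crop, centered at `center`, lies fully inside the image.
    """
    out = background.copy()
    h, w = obj.shape[:2]
    top, left = center[0] - h // 2, center[1] - w // 2
    region = out[top:top + h, left:left + w]   # view into the copy
    region[mask] = obj[mask]                   # copy only object pixels
    return out
```

In the pipeline described above, the paste location would be drawn at random from road-surface pixels, and the resulting composites would still be inspected manually before annotation.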
Step 3: VIBE background modeling. First, a background model is initialized from input video frames free of moving vehicles. For each pixel (x, y), a background model M(x, y) containing N samples is built; the pixel together with its 8-neighborhood pixels forms a sampling space N_B(x, y), from which pixels are randomly selected to initialize the model. The initialization of M(x, y) is given by:
M(x, y) = {v_1(x, y), v_2(x, y), ..., v_N(x, y)}
and the initialization of M_B(x, y) by:
M_B(x, y) = {v_i(x, y) | (x, y) ∈ N_B(x, y)}
where v_i(x, y) is the i-th sample value in the sample set and N is the number of samples, set here to 20. Each pixel value in the current frame is then compared against the established background model to decide whether the pixel belongs to a foreground target. The VIBE model additionally has three different model update strategies, as shown in FIG. 2. VIBE has low complexity and a short initialization time, and it updates automatically to generate a new background promptly when the scene changes.
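As a hedged, single-pixel sketch of this test-and-update scheme: the constants below (N = 20, a matching radius R, a minimum match count and a random subsampling factor) follow commonly cited ViBe defaults and are illustrative, not values fixed by this patent:

```python
import random

N = 20            # samples per pixel model (set to 20 above)
R = 20            # matching radius in gray-level units (assumed default)
MIN_MATCHES = 2   # matches required for "background" (assumed default)
SUBSAMPLING = 16  # 1-in-16 chance of updating the model (assumed default)

def is_background(pixel, samples):
    """Background if at least MIN_MATCHES model samples lie within radius R."""
    matches = sum(abs(pixel - s) <= R for s in samples)
    return matches >= MIN_MATCHES

def update_model(pixel, samples, rng=random):
    """Conservative random update: occasionally overwrite a random sample
    of a background-classified pixel's model with its current value."""
    if rng.randrange(SUBSAMPLING) == 0:
        samples[rng.randrange(N)] = pixel
```

Pixels failing the background test are reported as foreground targets; the random in-place replacement is what lets the model absorb gradual background changes.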
Step 4: Construction of the deep learning network. Two improvements are made on the basis of the YOLOv5 network. One is to reduce the size of the convolution kernel to 1×1, as shown in FIG. 3. Earlier convolution layers can thus obtain smaller-scale kernels without increasing the number of parameters, and later convolution layers can build higher-level features on this basis, such as structural features at the level of edges, shapes and object types. The other improvement is to add connections between different convolution layers, such as the skip connections used in ResNet, so that a layer does not over-emphasize the output of the immediately preceding layer but instead draws on all previous outputs. Compared with the conventional YOLOv5 model, the information passed through the improved model is simpler, because convolution layers are relieved of the burden of holding data from upper layers. The network reasons globally over the complete image and all objects in it. The improved YOLOv5 model divides the input image into an S×S grid; a grid cell is responsible for identifying an object if any part of the object falls within it. Each grid cell predicts B bounding boxes together with confidence scores for those boxes. These confidence scores reflect how confident the model is that a box contains an object and how accurate it believes the box to be. Each predicted box has 5 parameters: x, y, w, h and the confidence score. The (x, y) coordinates give the position of the box center relative to the image boundary, and w and h give the predicted width and height relative to the size of the entire image. The confidence score is defined as the product of Pr(SPILLED LOADS) and the IOU, where the IOU (Intersection over Union) is the ratio of the intersection of the predicted box with a ground-truth box to their union.
The IOU is calculated as:
IOU = area(B_pred ∩ B_gt) / area(B_pred ∪ B_gt)
Pr(SPILLED LOADS) denotes the probability that the predicted box contains a casting object: Pr(SPILLED LOADS) = 1 if the box contains a casting object, otherwise Pr(SPILLED LOADS) = 0. The confidence score is thus set to IOU × Pr(SPILLED LOADS).
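The IOU definition above can be written directly in code. This sketch assumes axis-aligned boxes given as corner coordinates (x1, y1, x2, y2), a representation chosen here for illustration:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # overlap width/height, clamped at zero for disjoint boxes
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```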
At test time, the class-conditional probability is multiplied by the individual box confidence:
Pr(Class_i | SPILLED LOADS) × Pr(SPILLED LOADS) × IOU = Pr(Class_i) × IOU.
This yields a class-specific confidence score for each predicted box, representing both the probability that a casting object of that specific class appears in the box and how well the predicted box fits the casting object.
Each grid cell thus outputs B × (4 + 1) + C = B × 5 + C predictions, and the predictions for the whole image are encoded as an S × S × (B × 5 + C) tensor, where C is the number of casting categories.
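For concreteness, the grid-cell output layout described above can be sketched as a tensor. S and B below are illustrative values only; C = 10 matches the ten casting categories of Step 1:

```python
import numpy as np

S, B, C = 7, 2, 10   # grid size, boxes per cell, casting classes (illustrative)
# Each cell holds B boxes x (x, y, w, h, confidence) plus C class probabilities.
output = np.zeros((S, S, B * 5 + C), dtype=np.float32)
```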
A non-maximum suppression (NMS) algorithm is used to finalize the detections for each category separately. First, a confidence threshold is set, and predicted boxes whose confidence score falls below it are discarded. Then, within each category, the box with the highest confidence is selected, and the IOU between this box and each remaining box is computed; any remaining box whose IOU with the selected box exceeds the suppression threshold has its confidence set to 0. Finally, for each category, the boxes whose confidence remains non-zero are output as the identification results.
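A minimal greedy sketch of this per-category NMS procedure; the two thresholds are illustrative defaults rather than values specified by the patent, and boxes are corner-coordinate tuples:

```python
def nms(boxes, scores, iou_threshold=0.5, score_threshold=0.25):
    """Greedy NMS for one category: returns indices of the boxes kept."""
    def iou(a, b):
        iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = iw * ih
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    # discard low-confidence boxes, then visit the rest best-first
    order = sorted((i for i, s in enumerate(scores) if s >= score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # suppress boxes overlapping the kept box too strongly
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

In the full pipeline this would run once per casting category on that category's class-specific confidence scores.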
The convolution layers of the improved YOLOv5 model are trained on the synthetic data set to obtain the convolution layer parameters. The classification model uses the first 20 convolution layers of the YOLOv5 model. The developed model is then written to a cfg file, in which the convolution layer parameters are stored. The model is then optimized in combination with the non-maximum suppression (NMS) method. After multiple rounds of optimization, the accuracy is tested on the validation data set, and optimization continues until the accuracy no longer increases.
Step 5: Detection of casting objects. Using the YOLOv5 configuration file, the Python calling interface and the detection weight file produced by training, the pictures to be detected are input for target detection, yielding information such as the category, size, position and confidence of the casting object in each picture. The specific detection steps are shown in FIG. 4.
It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above; the embodiments and descriptions merely illustrate the principles of the invention, and various changes and modifications may be made without departing from its spirit and scope. The scope of the invention is defined by the appended claims and their equivalents.

Claims (7)

1. The road surface casting detection method based on the expressway monitoring video is characterized by comprising the following steps of:
S1: constructing a basic data set: acquiring an expressway monitoring video, acquiring a road-section background image data set, and downloading a corresponding casting-object image data set from ImageNet;
S2: fusing images: aligning the center of the casting object with a randomly selected road-surface point in the background image and pasting it into the scene image to generate a composite image, thereby constructing an expressway scene casting-object data set;
S3: VIBE background modeling: acquiring the background and the foreground;
S4: constructing a network: building a neural network model based on improvements to the YOLOv5 network;
S5: training and optimizing the model: inputting the casting-object data set into the neural network model for training, and optimizing the model according to the training results to obtain the training weights and classification results of the casting-object detection model;
S6: detecting casting objects: detecting casting objects using the trained deep learning network.
2. The method for detecting road surface casting based on expressway monitoring video according to claim 1, wherein the method comprises the steps of: the throwing object image of the road surface is formed by manually combining the expressway monitoring image and the throwing object image.
3. The method for detecting road surface casting based on expressway monitoring video according to claim 1, wherein the method comprises the steps of: the content of the data label comprises the category, the size and the position of the throwing object.
4. The method for detecting road surface casting based on expressway monitoring video according to claim 1, wherein the method comprises the steps of: the VIBE background modeling firstly initializes a background model of an input video frame without vehicle running, and then compares a pixel value in a current image with the established background model to distinguish whether the pixel point is a foreground target pixel point. The VIBE model has low complexity, short model initializing time and capability of automatically updating and generating a new background in time when the background changes.
5. The method for detecting road surface casting based on expressway monitoring video according to claim 1, wherein: the deep learning network is constructed with two improvements made on the basis of the YOLOv5 network; one is to reduce the size of the convolution kernel to 1×1, and the other is to add connections, such as skip connections, between different convolution layers.
6. The method for detecting road surface casting based on expressway monitoring video according to claim 1, wherein: the network performs global computation on the complete image and all objects in the image; the confidence score is expressed as the product of IOU and Pr(SPILLED LOADS); the IOU (Intersection over Union) is the ratio of the intersection of the predicted box with any ground-truth box to their union, calculated as:
IOU = area(B_pred ∩ B_gt) / area(B_pred ∪ B_gt)
Pr(SPILLED LOADS) represents the probability that the predicted box contains a casting object; Pr(SPILLED LOADS) = 1 if the predicted box contains a casting object, otherwise Pr(SPILLED LOADS) = 0.
7. The method for detecting road surface casting based on expressway monitoring video according to claim 1, wherein: the convolution layers of the improved YOLOv5 model are trained on the synthetic data set to obtain the convolution layer parameters; the classification model uses the first 20 convolution layers of the YOLOv5 model; the developed model is then written to a cfg file in which the convolution layer parameters are stored; the model is then optimized in combination with the non-maximum suppression (NMS) method; after multiple rounds of optimization, the accuracy is tested on the validation data set, and optimization continues until the accuracy no longer increases.
CN202310177342.XA 2023-02-24 2023-02-24 Road surface casting detection method based on expressway monitoring video Pending CN116343136A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310177342.XA CN116343136A (en) 2023-02-24 2023-02-24 Road surface casting detection method based on expressway monitoring video


Publications (1)

Publication Number Publication Date
CN116343136A true CN116343136A (en) 2023-06-27

Family

ID=86888589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310177342.XA Pending CN116343136A (en) 2023-02-24 2023-02-24 Road surface casting detection method based on expressway monitoring video

Country Status (1)

Country Link
CN (1) CN116343136A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116994216A (en) * 2023-09-27 2023-11-03 深圳市九洲卓能电气有限公司 Highway casting object detection method and system based on deep learning



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination