CN114299438A - Tunnel parking event detection method integrating traditional parking detection and neural network - Google Patents


Info

Publication number: CN114299438A
Authority: CN (China)
Prior art keywords: vehicle, sample, identification model, parking, vehicle identification
Legal status: Pending (status assumed by Google, not a legal conclusion)
Application number: CN202111665332.8A
Other languages: Chinese (zh)
Inventor
宋永端
陈欢
庞思袁
凌凯
赵梦雯
卫佳
王攀
程霜雄
魏大创
廖昕怡
Current Assignee / Original Assignee:
Chongqing University
DIBI Chongqing Intelligent Technology Research Institute Co Ltd
Star Institute of Intelligent Systems
Application filed by Chongqing University, DIBI Chongqing Intelligent Technology Research Institute Co Ltd, and Star Institute of Intelligent Systems
Priority: CN202111665332.8A
Publication: CN114299438A
Legal status: Pending

Abstract

The invention relates to a tunnel parking event detection method integrating traditional parking detection and a neural network, which comprises: collecting driving videos from cameras in different scenes of a highway tunnel, extracting pictures from the videos and annotating them to obtain a VOC data set; clustering the pictures in the VOC data set to obtain the most suitable vehicle-target bounding-box size for each vehicle type, which is taken as the Anchor size in an SSD neural network; constructing and training a vehicle identification model based on the SSD neural network to obtain an optimal vehicle identification model; and inputting a segment of video to be detected into a traditional parking detection algorithm, taking the video frame corresponding to each picture with a fixed foreground target as the picture to be predicted, and inputting the picture to be predicted into the optimal vehicle identification model to obtain the judgment result. Compared with traditional parking event detection algorithms, the method achieves higher accuracy.

Description

Tunnel parking event detection method integrating traditional parking detection and neural network
Technical Field
The invention relates to the technical field of tunnel parking inspection, and in particular to a real-time tunnel parking event detection method fusing traditional parking detection with an SSD neural network.
Background
A tunnel is a bottleneck section of road traffic: differences in brightness and environment inside and outside the tunnel affect road traffic safety, and illegal parking events inside a tunnel in particular can cause casualties and traffic jams. At present there is no parking detection system fully suited to the tunnel scene; traditional parking detection algorithms have a high false detection rate and cannot fully meet the real-time and accuracy requirements of tunnel parking event detection. Deep learning algorithms can extract deep features of a target, effectively solve the vehicle identification problem in complex scenes, and perform well in both the real-time speed and the accuracy of target detection. Detecting tunnel parking events with a deep learning method can therefore effectively overcome the high false detection rate of traditional parking detection algorithms.
In the field of parking event detection, the open problem is how to minimize the false detection rate while still detecting parking events in time. Current video-based parking event detection methods divide into deep-learning-based methods and traditional parking-detection-based methods.
Traditional parking detection methods perceive foreground changes in an image region through background modeling and judge whether parking behavior exists through related constraint conditions. An illegal parking detection method based on background modeling (CN107491753A), filed by the University of Electronic Science and Technology of China, performs vehicle detection directly in the background image obtained by modeling. That invention does not detect moving objects, thereby eliminating the interference of moving objects in actual video frames. Since objects appearing in the background image are most likely stationary or slowly moving vehicle targets, environmental disturbances such as lighting and shadows can be mistaken for parked vehicles, producing false detections of parking events. In a tunnel scene, the darkness and changeable illumination make feature extraction even more difficult for traditional parking detection algorithms, greatly reducing detection accuracy; traditional parking detection algorithms therefore cannot accurately detect parking events in the tunnel scene.
Deep learning can simulate the complex hierarchical cognition of the human brain, extract deep-level features of a target, and effectively solve vehicle identification in complex scenes. One method (CN107609491A) sets a parking detection area, detects vehicles in the current frame and records detection-box information, then compares the intersection-over-union (IoU) of the current and historical vehicle detection boxes; parking behavior is declared if the IoU exceeds a threshold and the vehicle's dwell time exceeds a set threshold. If one or more vehicles drive slowly through the detection area, their IoU can also exceed the threshold, so this method produces false detections of parking events. Another method extracts the vehicle-target foreground with a background difference method and preprocessing, obtains a suspected stationary target area by tracking the estimated target speed over a short time, and applies a deep learning method to images of that area to detect whether vehicles exist there; a parking event is declared if a vehicle target is detected in the stationary target area. In a tunnel scene, however, tunnel illumination, frequent flashing of the lights of stationary vehicles, and mutual occlusion among multiple vehicles can cause ID jumps, missed detections and false detections in vehicle tracking, and in turn false detections of vehicle parking events.
Disclosure of Invention
Aiming at the problems in the prior art, the technical problem to be solved by the invention is as follows: the traditional parking detection algorithm has a high false detection rate and cannot be applied to the tunnel scene.
In order to solve the technical problems, the invention adopts the following technical scheme: the tunnel parking event detection method fusing the traditional parking detection and the neural network comprises the following steps:
s1: collecting driving videos from cameras in different scenes of the expressway tunnel, intercepting and storing pictures according to a fixed frame rate, and collecting a plurality of pictures as a data set;
marking the vehicle target in each picture in the data set by adopting an image marking tool, wherein the marking content comprises the vehicle type and the coordinate value of a boundary frame surrounding the vehicle target, and all marked pictures are used as a first sample to form a VOC data set;
s2: clustering the boundary box surrounding the vehicle target marked in each picture in the VOC data set obtained in the S1 to obtain the most suitable size of the boundary box of the vehicle target for each vehicle type, and taking the most suitable size of the boundary box of the vehicle target for each vehicle type as the size of the Anchor in the SSD neural network;
s3: the method comprises the following steps of constructing and training a vehicle identification model based on the SSD neural network, wherein the structure of the vehicle identification model is as follows:
selecting VGG16, which uses small-size convolution kernels in place of large-size ones, as the backbone network;
and making the following modifications to VGG16: removing the final fully-connected classification layer of VGG16, converting the remaining two fully-connected layers fc6 and fc7 into convolutional layers Conv6 and Conv7, adding 4 further convolutional layers named Conv8_2, Conv9_2, Conv10_2 and Conv11_2, and finally selecting the 6 convolutional layers Conv4-3, Conv7, Conv8-2, Conv9-2, Conv10-2 and Conv11-2 to form a feature-pyramid multi-scale detection structure;
taking the Anchor size in the SSD neural network obtained in S2 as the Anchor size of the vehicle identification model, and training the vehicle identification model with the first samples to obtain the optimal vehicle identification model;
s4: inputting a section of video to be detected into a traditional parking detection algorithm, taking an obtained corresponding video frame picture with a fixed foreground target picture as a picture to be predicted, and transmitting the picture to be predicted as input to an optimal vehicle identification model;
s5: and the optimal vehicle identification model obtains whether the vehicle target in the video to be detected is a tunnel parking event through two judgments.
As an improvement, the steps in S2 of clustering the images in the VOC data set to select an appropriate Anchor size for the SSD neural network are as follows:
S21: extracting the length and width of the bounding box surrounding the vehicle target in each first sample;
S22: taking the length and width of the bounding box surrounding the vehicle target in each first sample, together with the vehicle-target size information corresponding to the vehicle category in that first sample, as a second sample to obtain a second-sample set, and clustering all second samples in the second-sample set with the K-Means clustering algorithm, as follows:
S221: randomly selecting B second samples from the second-sample set as initial clustering centers, each initial clustering center serving as the cluster center of one cluster;
S222: calculating the distances from all remaining second samples in the second-sample set to the B initial clustering centers, and assigning each second sample to the closest cluster, the distance being calculated as:
d_{i,c} = 1 - IOU(i, c)    (1)
where d_{i,c} denotes the distance from the i-th second sample to the c-th cluster center, i denotes the i-th second sample, c denotes the c-th cluster center, and IOU denotes the intersection-over-union of the areas of the i-th second sample and the c-th cluster center;
S223: calculating the mean value \bar{d}_c of the distances from the second samples in the c-th cluster to its cluster center, and taking the second sample whose d_{i,c} is closest to \bar{d}_c as the new cluster center of the c-th cluster;
S224: calculating the distance d between the new cluster center of the c-th cluster and the initial cluster center of the c-th cluster;
S225: if d is smaller than the set threshold, or the current number of iterations has reached the maximum, exiting; otherwise taking the new cluster center of the c-th cluster as the initial cluster center and returning to step S222.
As an improvement, the process in S3 of training the vehicle identification model with the first samples to obtain the optimal vehicle identification model is as follows:
S31: pre-training the vehicle identification model on the ImageNet large-scale classification data set to obtain a suboptimal vehicle identification model, and on this basis initializing each layer of the VGG16 network, the newly added layers being initialized with the Xavier method;
S32: modifying the total number of categories in the SSD neural network to 2;
S33: inputting all first samples in the first-sample set into the suboptimal vehicle identification model, calculating the loss of the current iteration, and updating the parameters of the suboptimal vehicle identification model with stochastic gradient descent according to the loss;
S34: judging whether the maximum number of iterations has been reached; if so, taking the current suboptimal vehicle identification model as the optimal vehicle identification model, and if not, returning to step S33.
As an improvement, the process in S5 by which the optimal vehicle identification model judges whether the vehicle target in the video to be detected constitutes a tunnel parking event is as follows:
S51: obtaining a preliminary detection result under a given confidence threshold: identifying the vehicle target in the picture with the optimal vehicle identification model and obtaining the coordinate information (x1, y1, x2, y2) of the vehicle target bounding box, where (x1, y1) are the coordinates of the upper-left corner and (x2, y2) the coordinates of the lower-right corner of the vehicle target bounding box;
S52: calculating the area Area_Det of the vehicle target bounding box:
Area_Det = (x2 - x1)(y2 - y1)    (2)
S53: counting the number Q of parking-foreground pixels inside the vehicle target bounding box, representing the area occupied by the fixed foreground target in the detection box;
S54: counting the proportion P of the parking-target foreground in the vehicle target bounding box, expressed as:
P = Q / Area_Det    (3)
and traversing the fixed-foreground target picture with the coordinates of the vehicle target bounding box: if the ratio P exceeds a set threshold T, the vehicle target is judged to be a tunnel parking event; otherwise no tunnel parking event is judged.
As an improvement, the steps by which S33 calculates the loss of the current iteration are as follows:
the loss function loss of the current iteration is the weighted sum of the position loss function L_loc and the confidence loss function L_conf:
L(x, c, l, g) = (1/N) [ L_conf(x, c) + α L_loc(x, l, g) ]    (4)
where α adjusts the ratio between the position loss function L_loc and the confidence loss function L_conf, and N is the total number of default boxes (Anchors) matched to label boxes; if N = 0, the loss function loss is defined as 0. The position loss is the smooth L1 loss between the prediction box output by the suboptimal vehicle recognition model for a first sample and the labeled bounding box surrounding the vehicle target; a default box d has center (cx, cy), width w and height h, and the position loss function is as follows:
L_loc(x, l, g) = Σ_{i∈Pos}^{N} Σ_{m∈{cx,cy,w,h}} x_{ij}^{k} smooth_L1(l_i^m - ĝ_j^m)    (5)
where x_{ij}^{k} indicates whether the i-th prediction box matches the j-th label box with respect to category k (1 if matched, 0 if not), l_i^m denotes the i-th prediction box, ĝ_j^m denotes the j-th (encoded) label box, and Pos denotes the set of positive samples;
the confidence loss function is the softmax cross-entropy loss over the class confidences, with the weight ratio set to 1:
L_conf(x, c) = - Σ_{i∈Pos}^{N} x_{ij}^{p} log(ĉ_i^p) - Σ_{i∈Neg} log(ĉ_i^0)    (6)
ĉ_i^p = exp(c_i^p) / Σ_p exp(c_i^p)    (7)
where i denotes the prediction-box index, j the label-box index, and p the category index, with p = 0 denoting the background; x_{ij}^{p} = 1 indicates that the i-th prediction box is matched with the j-th label box whose category is p, and ĉ_i^p denotes the predicted probability of category p for the i-th prediction box.
Compared with the prior art, the invention has at least the following advantages:
compared with a 'deep learning vehicle parking detection method based on monitoring video' (CN109919053A) applied by the university of Tai Ching worker, the method well solves the influence of interference such as ambient illumination, vehicle lamp flicker, vehicle ID jump and the like on the judgment of the parking event; compared with the traditional parking event detection algorithm, the method has higher accuracy.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
Aiming at the dark tunnel environment, severe illumination interference, and the real-time and accuracy requirements of parking-target detection in a tunnel scene, the invention selects the fast SSD detection network as the base network under the TensorFlow deep learning framework and obtains a vehicle identification model through training. To eliminate the influence of interference such as tunnel illumination, vehicle lights and water stains on parking event detection, the method of the invention takes the parking-target foreground detected by the traditional parking detection algorithm as the primary detection result; the video frame corresponding to each picture with a parking-target foreground is then input to the SSD-based tunnel vehicle identification model for vehicle target identification, yielding the coordinates of the vehicle detection boxes; the parking-target foreground ratio within each detection box is then counted, and a parking event is declared if this ratio stays above the set threshold for a set time. This double detection method, fusing the SSD neural network with the traditional parking detection algorithm, improves the accuracy of parking event detection and reduces the false detection rate.
The method first acquires videos with parking behavior in a tunnel, captures and stores pictures at a fixed frame interval, annotates them as labels, and uses them as the data set for training the vehicle identification model, then training the SSD-based tunnel vehicle identification model; it next establishes a background model with a Gaussian mixture model, preprocesses the pictures, extracts the parking-target foreground with a background difference method, and takes the video frame corresponding to each picture with a parking-target foreground as the input of the SSD-based vehicle identification model; finally it detects the picture with the SSD-based vehicle identification model, obtains the coordinates of the vehicle target detection boxes, counts the parking-target foreground ratio in each detection box, and declares a tunnel parking event if the ratio stays above the set threshold for a set time.
In the following description, to simplify terminology, the labeled bounding box surrounding the vehicle target is also referred to as the label box, and the prediction output of the optimal vehicle recognition model is also referred to as the prediction box.
The tunnel parking event detection method fusing the traditional parking detection and the neural network comprises the following steps:
s1: collecting driving videos from cameras in different scenes of the expressway tunnel, intercepting and storing pictures according to a fixed frame rate, and collecting a plurality of pictures as a data set; and marking the vehicle target in each picture in the data set by adopting an image marking tool, wherein the marking content comprises the vehicle type and the coordinate value of a boundary frame surrounding the vehicle target, and all marked pictures are used as a first sample to form the VOC data set.
Multiple segments of driving video containing parking events are collected from fixed cameras in different highway-tunnel scenes; a picture is captured from the video stream every 25 frames by an automatic screenshot program; pictures with lens switching, blurred images or no vehicle target are removed, and about 12000 pictures are collected as the data set.
The vehicle targets in the captured pictures are manually annotated with the image annotation tool LabelImg; the annotation comprises only the vehicle category (marked as car, a positive sample) and the coordinate values of the bounding box surrounding the target, which are saved as xml files, yielding the VOC data set.
The data set is randomly divided into a training set, a validation set and a test set in the ratio 4:1:1, i.e. 8000 training pictures and 2000 pictures each for validation and testing.
The training images are preprocessed, including image flipping, scale transformation, randomly erasing a vehicle to generate a masked image, mean subtraction, and so on.
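The preprocessing operations above can be sketched in NumPy (flip, random erase, mean subtraction); the per-channel mean values and the erase-patch size below are illustrative assumptions, not values from the patent:

```python
import numpy as np

def augment(img, mean=(104.0, 117.0, 123.0), erase_frac=0.2, rng=None):
    """Illustrative preprocessing: random horizontal flip, random erase, mean subtraction."""
    rng = np.random.default_rng(0) if rng is None else rng
    img = img.astype(np.float32)
    if rng.random() < 0.5:                       # random horizontal flip
        img = img[:, ::-1, :]
    h, w = img.shape[:2]
    eh, ew = int(h * erase_frac), int(w * erase_frac)
    y0 = int(rng.integers(0, h - eh + 1))        # random erase: zero out a patch,
    x0 = int(rng.integers(0, w - ew + 1))        # simulating a masked vehicle region
    img[y0:y0 + eh, x0:x0 + ew, :] = 0.0
    img -= np.asarray(mean, dtype=np.float32)    # subtract per-channel mean
    return img

out = augment(np.full((300, 300, 3), 128, dtype=np.uint8))
print(out.shape)
```

The erase step always fires in this sketch so the effect is visible; in practice it would be applied with some probability alongside the other augmentations.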
S2: clustering the bounding boxes surrounding the vehicle targets marked in each picture of the VOC data set obtained in step S1 to obtain the most suitable vehicle-target bounding-box size for each vehicle type, which is taken as the Anchor size in the SSD neural network.
S21: extracting the length and width of the bounding box surrounding the vehicle target in each first sample;
S22: taking the length and width of the bounding box surrounding the vehicle target in each first sample, together with the vehicle-target size information corresponding to the vehicle category in that first sample, as a second sample to obtain a second-sample set, and clustering all second samples in the second-sample set with the K-Means clustering algorithm, as follows:
S221: randomly selecting B second samples from the second-sample set as initial clustering centers, each initial clustering center serving as the cluster center of one cluster;
S222: calculating the distances from all remaining second samples in the second-sample set to the B initial clustering centers, and assigning each second sample to the closest cluster, the distance being calculated as:
d_{i,c} = 1 - IOU(i, c)    (1)
where d_{i,c} denotes the distance from the i-th second sample to the c-th cluster center, i denotes the i-th second sample, c denotes the c-th cluster center, and IOU denotes the intersection-over-union of the areas of the i-th second sample and the c-th cluster center;
S223: calculating the mean value \bar{d}_c of the distances from the second samples in the c-th cluster to its cluster center, and taking the second sample whose d_{i,c} is closest to \bar{d}_c as the new cluster center of the c-th cluster;
S224: calculating the distance d between the new cluster center of the c-th cluster and the initial cluster center of the c-th cluster;
S225: if d is smaller than the set threshold, or the current number of iterations has reached the maximum, exiting; otherwise taking the new cluster center of the c-th cluster as the initial cluster center and returning to step S222.
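Steps S221–S225 can be sketched as follows. This is a minimal NumPy illustration with made-up box sizes; it assumes, as in standard anchor clustering, that the IOU between two width-height pairs is computed with the boxes laid concentrically, and it reproduces the patent's medoid-style update (the member whose distance is closest to the cluster's mean distance becomes the new center) rather than the usual mean update:

```python
import numpy as np

def iou_wh(box, centers):
    """IOU between one (w, h) box and an array of (w, h) centers, boxes concentric."""
    inter = np.minimum(box[0], centers[:, 0]) * np.minimum(box[1], centers[:, 1])
    union = box[0] * box[1] + centers[:, 0] * centers[:, 1] - inter
    return inter / union

def iou_wh_pairs(a, b):
    """Elementwise IOU between corresponding rows of two (w, h) arrays."""
    inter = np.minimum(a[:, 0], b[:, 0]) * np.minimum(a[:, 1], b[:, 1])
    return inter / (a[:, 0] * a[:, 1] + b[:, 0] * b[:, 1] - inter)

def cluster_anchors(boxes, B=3, max_iter=50, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    centers = boxes[rng.choice(len(boxes), B, replace=False)]    # S221
    for _ in range(max_iter):
        d = np.stack([1.0 - iou_wh(b, centers) for b in boxes])  # S222: d = 1 - IOU
        assign = d.argmin(axis=1)
        new_centers = centers.copy()
        for c in range(B):                                       # S223: member whose
            members = np.where(assign == c)[0]                   # distance is closest
            if len(members) == 0:                                # to the cluster mean
                continue                                         # becomes the center
            dc = d[members, c]
            new_centers[c] = boxes[members[np.argmin(np.abs(dc - dc.mean()))]]
        shift = np.max(1.0 - iou_wh_pairs(centers, new_centers)) # S224
        centers = new_centers
        if shift < tol:                                          # S225
            break
    return centers

# illustrative (w, h) box sizes forming three loose size groups
boxes = np.array([[30, 20], [32, 22], [60, 45], [62, 40], [120, 90], [118, 88]], float)
anchors = cluster_anchors(boxes, B=3)
print(anchors)
```

Because the update is medoid-style, every returned anchor is an actual labeled box size rather than an averaged one.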
S3: constructing and training a vehicle identification model based on the SSD neural network, the structure of which is as follows:
VGG16, which uses small-size convolution kernels in place of large-size ones, is selected as the backbone network; on the premise of keeping the receptive field unchanged, this limits the number of model parameters;
and the following modifications are made to VGG16: the final fully-connected classification layer of VGG16 is removed, the remaining two fully-connected layers fc6 and fc7 are converted into convolutional layers Conv6 and Conv7, 4 further convolutional layers named Conv8_2, Conv9_2, Conv10_2 and Conv11_2 are added, and finally the 6 convolutional layers Conv4-3, Conv7, Conv8-2, Conv9-2, Conv10-2 and Conv11-2 are selected to form the feature-pyramid multi-scale detection structure.
When the input image is 300 × 300, the resolution of each selected feature layer is as shown in Table 1:
TABLE 1 size of resolution of selected feature layer for backbone network
Feature layer: Conv4-3 | Conv7 | Conv8-2 | Conv9-2 | Conv10-2 | Conv11-2
Resolution:    38×38   | 19×19 | 10×10   | 5×5     | 3×3      | 1×1
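The resolutions in Table 1 can be reproduced by tracking the downsampling of a 300×300 input through the standard SSD300 layout; ceil-mode pooling before Conv4-3 and Conv7, stride-2 convolutions for Conv8-2/Conv9-2, and unpadded 3×3 convolutions for Conv10-2/Conv11-2 are the usual SSD configuration, assumed here:

```python
import math

def ssd300_feature_sizes(input_size=300):
    """Reproduce Table 1: spatial sizes of the six SSD300 detection layers."""
    sizes = []
    s = input_size
    for _ in range(3):          # three 2x2 max-pools before Conv4-3
        s = math.ceil(s / 2)    # 300 -> 150 -> 75 -> 38 (ceil mode)
    sizes.append(s)             # Conv4-3: 38
    s = math.ceil(s / 2)        # pool4 -> 19 (pool5 is 3x3/stride 1 in SSD)
    sizes.append(s)             # Conv7: 19
    s = math.ceil(s / 2)        # Conv8-2: stride-2 conv -> 10
    sizes.append(s)
    s = math.ceil(s / 2)        # Conv9-2: stride-2 conv -> 5
    sizes.append(s)
    s = s - 2                   # Conv10-2: 3x3 conv, no padding -> 3
    sizes.append(s)
    s = s - 2                   # Conv11-2: 3x3 conv, no padding -> 1
    sizes.append(s)
    return sizes

print(ssd300_feature_sizes())   # [38, 19, 10, 5, 3, 1]
```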
S31: pre-training the vehicle identification model on the ImageNet large-scale classification data set to obtain a suboptimal vehicle identification model, and on this basis initializing each layer of the VGG16 network, the newly added layers being initialized with the Xavier method; training on the training samples from a pre-trained model makes the tunnel vehicle recognition model train faster and more accurately.
S32: modifying the total number of categories in the SSD neural network to 2: background and vehicle.
S33: inputting all first samples in the first-sample set into the suboptimal vehicle identification model, calculating the loss of the current iteration, and updating the parameters of the suboptimal vehicle identification model with stochastic gradient descent according to the loss; training uses stochastic gradient descent with an initial learning rate of 0.004, polynomial learning-rate decay, and a batch_size of 4.
Specifically, the loss of the current iteration is calculated as follows:
the loss function loss of the current iteration is the weighted sum of the position loss function L_loc and the confidence loss function L_conf:
L(x, c, l, g) = (1/N) [ L_conf(x, c) + α L_loc(x, l, g) ]    (4)
where α adjusts the ratio between the position loss function L_loc and the confidence loss function L_conf (the default α is 1), and N is the total number of default boxes (Anchors) matched to label boxes; if N = 0, the loss function loss is defined as 0. The position loss is the smooth L1 loss between the prediction box output by the suboptimal vehicle identification model for a first sample and the labeled bounding box surrounding the vehicle target; a default box d has center (cx, cy), width w and height h, and the position loss function is as follows:
L_loc(x, l, g) = Σ_{i∈Pos}^{N} Σ_{m∈{cx,cy,w,h}} x_{ij}^{k} smooth_L1(l_i^m - ĝ_j^m)    (5)
where x_{ij}^{k} indicates whether the i-th prediction box matches the j-th label box with respect to category k (1 if matched, 0 if not), l_i^m denotes the i-th prediction box, ĝ_j^m denotes the j-th (encoded) label box, Pos denotes the set of positive samples, and smooth_L1(x) = 0.5x^2 for |x| < 1 and |x| - 0.5 otherwise.
The confidence loss function is the softmax cross-entropy loss over the class confidences, with the weight ratio set to 1:
L_conf(x, c) = - Σ_{i∈Pos}^{N} x_{ij}^{p} log(ĉ_i^p) - Σ_{i∈Neg} log(ĉ_i^0)    (6)
ĉ_i^p = exp(c_i^p) / Σ_p exp(c_i^p)    (7)
where i denotes the prediction-box index, j the label-box index, and p the vehicle-category index, with p = 0 denoting the background; x_{ij}^{p} = 1 indicates that the i-th prediction box is matched with the j-th label box whose category is p, and ĉ_i^p denotes the predicted probability of category p for the i-th prediction box. The first term of the formula is the loss of the positive samples (Pos), i.e. the loss of being classified into some category (excluding the background), and the second term is the loss of the negative samples (Neg), i.e. the loss of being classified as background.
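Formulas (4)–(7) can be checked numerically. The sketch below computes the smooth L1 position loss and the softmax cross-entropy confidence loss for a single matched (positive) box and one background box; the box offsets, logits and matching are illustrative, not values from the patent:

```python
import numpy as np

def smooth_l1(x):
    """smooth_L1(x) = 0.5 x^2 if |x| < 1, else |x| - 0.5."""
    a = np.abs(x)
    return np.where(a < 1, 0.5 * x**2, a - 0.5)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def ssd_loss(loc_pred, loc_gt, conf_pred, labels, alpha=1.0):
    """loc_pred/loc_gt: (N, 4) encoded offsets for the matched (positive) boxes;
    conf_pred: (M, C) class logits; labels: (M,) class ids, 0 = background."""
    N = len(loc_pred)
    if N == 0:
        return 0.0                                   # loss defined as 0 when N = 0
    l_loc = smooth_l1(loc_pred - loc_gt).sum()       # formula (5)
    probs = softmax(conf_pred)                       # formula (7)
    l_conf = -np.log(probs[np.arange(len(labels)), labels]).sum()  # formula (6)
    return (l_conf + alpha * l_loc) / N              # formula (4)

loc_pred = np.array([[0.1, -0.2, 0.05, 0.0]])
loc_gt = np.zeros((1, 4))
conf_pred = np.array([[0.2, 2.0],     # positive box, class 1 (vehicle)
                      [3.0, -1.0]])   # negative box, class 0 (background)
loss = ssd_loss(loc_pred, loc_gt, conf_pred, np.array([1, 0]))
print(round(float(loss), 4))
```

The hard-negative mining that SSD normally applies when selecting the Neg set is omitted here for brevity.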
S34: and judging whether the maximum iteration times is reached, if so, judging that the current suboptimal vehicle identification model is the optimal vehicle identification model, and if not, returning to the step S33.
And taking the Anchor size in the SSD neural network obtained in the S2 as the size of the vehicle identification model, and training the vehicle identification model by adopting the first sample to obtain the optimal vehicle identification model.
S4: and inputting a section of video to be detected into a traditional parking detection algorithm, taking the obtained corresponding video frame picture with the fixed foreground target picture as a picture to be predicted, and transmitting the picture to be predicted as input to the optimal vehicle identification model.
The method mainly comprises the following two steps:
1) obtaining a background model of the video to be detected based on a Gaussian mixture model, and then obtaining the size, position and shape information of the vehicle target by the background difference method;
2) performing frame extraction and a fixed-interval AND operation on the video to be detected: 1 frame is extracted from every 12 frames of video; the target foreground obtained by AND-ing the foreground of the current frame with that of the frame extracted 6 frames earlier is taken as the parking-target foreground; a closing operation then removes small noise and a threshold operation removes target shadows; and the video frame corresponding to each picture with a fixed foreground target is taken as the picture to be predicted.
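The AND step in 2) is the core of the fixed-foreground extraction: a target must appear in the foreground mask of both the current frame and the earlier extracted frame to count as stationary. A dependency-free NumPy sketch follows; the patent's background model is a Gaussian mixture (e.g. OpenCV's MOG2 in practice), for which a simple difference threshold stands in here, and the frames are synthetic:

```python
import numpy as np

def foreground_mask(frame, background, thresh=25):
    """Background difference: pixels that differ from the background model."""
    return np.abs(frame.astype(np.int16) - background.astype(np.int16)) > thresh

def parking_foreground(cur, prev, background, thresh=25):
    """AND the current frame's mask with the mask of the frame extracted
    6 frames earlier: only targets present in both (stationary) survive."""
    return foreground_mask(cur, background, thresh) & foreground_mask(prev, background, thresh)

bg = np.zeros((8, 8), np.uint8)
moving = bg.copy(); moving[0:2, 0:2] = 200        # object only in the earlier frame
parked = bg.copy(); parked[4:6, 4:6] = 200        # object present in both frames
cur = bg.copy(); cur[4:6, 4:6] = 200
mask = parking_foreground(cur, np.maximum(moving, parked), bg)
print(mask.sum())   # only the stationary 2x2 patch remains
```

The closing and shadow-threshold operations of the patent would be applied to `mask` afterwards; they are omitted from this sketch.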
S5: the optimal vehicle identification model obtains whether the vehicle target in the video to be detected is a tunnel parking event through two judgments, and the specific steps are as follows:
s51: obtaining a preliminary detection result under a given confidence threshold (usually 0.5): the vehicle target in the picture is identified by the trained vehicle identification model, yielding the coordinate information (x1, y1, x2, y2) of the vehicle target boundary box, where (x1, y1) are the coordinates of the upper-left corner and (x2, y2) the coordinates of the lower-right corner of the vehicle target boundary box; redundant detection boxes are then removed with the non-maximum suppression algorithm to obtain a more accurate detection result;
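The non-maximum suppression step can be sketched as the standard greedy procedure over (x1, y1, x2, y2) boxes; this is a generic illustration, not the patent's exact implementation:

```python
def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box in each
    group of mutually overlapping boxes; return indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        remaining = []
        for j in order:
            # intersection rectangle of boxes i and j
            xx1 = max(boxes[i][0], boxes[j][0])
            yy1 = max(boxes[i][1], boxes[j][1])
            xx2 = min(boxes[i][2], boxes[j][2])
            yy2 = min(boxes[i][3], boxes[j][3])
            inter = max(0, xx2 - xx1) * max(0, yy2 - yy1)
            area_i = (boxes[i][2] - boxes[i][0]) * (boxes[i][3] - boxes[i][1])
            area_j = (boxes[j][2] - boxes[j][0]) * (boxes[j][3] - boxes[j][1])
            iou = inter / (area_i + area_j - inter)
            if iou <= iou_thresh:      # keep only boxes that do not overlap i much
                remaining.append(j)
        order = remaining
    return keep
```

A heavily overlapping duplicate detection of the same vehicle is suppressed, while a detection of a different vehicle elsewhere in the frame is kept.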
s52: calculating the area Area_Det of the vehicle target boundary box, where Area_Det is expressed as:
Area_Det=(x2-x1)(y2-y1) (2);
s53: counting the number Q of parking foreground pixel points in the vehicle target boundary frame to represent the area occupied by the fixed foreground target in the detection frame;
s54: counting the proportion P of the parking target foreground in the vehicle target boundary frame, wherein the foreground proportion P is expressed by a formula as follows:
P=Q/Area_Det;
and the process of secondarily judging that the vehicle has the parking behavior is that the fixed foreground target picture is traversed through the coordinates of the vehicle target boundary frame, if the ratio P exceeds a set threshold value T (the threshold value T is set to be 0.7), the vehicle target is judged to have the tunnel parking event, and otherwise, the tunnel parking event is judged not to exist.
Wherein the formula for determining the existence of parking behavior is represented as:
Result = parking event, if P &gt; T; no parking event, if P ≤ T;
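The area, foreground-ratio and threshold formulas above can be sketched together in a few lines; the mask slicing and all names here are our own illustration:

```python
import numpy as np

def parking_event(box, fg_mask, T=0.7):
    """Second judgment: ratio P = Q / Area_Det of stationary-foreground
    pixels inside the detected vehicle bounding box, compared with T."""
    x1, y1, x2, y2 = box
    area_det = (x2 - x1) * (y2 - y1)        # Area_Det of the detection box
    q = int(fg_mask[y1:y2, x1:x2].sum())    # Q: foreground pixels in the box
    return q / area_det > T                 # parking event iff P > T
```

With T = 0.7, a box whose interior is 80% stationary foreground is flagged as a parking event, while one that is only half covered is not.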
the picture number batch _ size of each training is set according to the condition that a computer is configured with a display card, the larger the picture number of each training is, the more accurate the training is, and meanwhile, the training shock is reduced, the invention is carried out under an NVIDIA GTX 1060 display card, and in order to ensure the feasibility of the training, the batch _ size is set to be 4; setting the initial learning rate is very important, problems can be caused when the initial learning rate is set too large or too small, the loss is large or the loss is not reduced and becomes a shaking condition when the initial learning rate is set too large, the reduction direction cannot be quickly found when the initial learning rate is set too small, and the final initial learning rate is set to be 0.004 after multiple attempts; training a total of 100 epochs. In the first three epochs, in order to ensure the stability of model training, the invention adopts a WarmUp preheating means, as shown in formula (10), lrminIs empirically set to 10-6,lrbaseFor an initial learning rate of 0.004, Iter and Iter respectively represent the number of iterations required for an epoch and the time whenThe number of previous iterations. Therefore, as the number of iterations increases, the learning rate of the warm-up phase increases from 10-6Linear growth is started until the initial learning rate is reached.
lr = lr_min + (iter/Iter) · (lr_base − lr_min)
And after the first three warm-up epochs, a conventional learning-rate decay strategy takes over. The invention reduces the learning rate with the cosine annealing algorithm, as shown in the formula below, where T_cur and T_sum respectively denote the current iteration count and the total iteration count. Cosine annealing is smoother than step learning-rate decay, so the gradient descent algorithm can find a better solution.
lr = lr_min + (1/2) · (lr_base − lr_min) · (1 + cos(π · T_cur/T_sum))
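The combined schedule can be sketched in plain Python. We assume, as the text suggests, that the warm-up ramps linearly across all three warm-up epochs; the function name and arguments are illustrative, not from the patent:

```python
import math

def learning_rate(it, iters_per_epoch, total_iters,
                  lr_base=0.004, lr_min=1e-6, warmup_epochs=3):
    """WarmUp for the first epochs, then cosine annealing down to lr_min."""
    warmup_iters = warmup_epochs * iters_per_epoch
    if it < warmup_iters:
        # linear growth from lr_min up to the initial learning rate lr_base
        return lr_min + (lr_base - lr_min) * it / warmup_iters
    # cosine annealing over the remaining iterations
    t = (it - warmup_iters) / (total_iters - warmup_iters)
    return lr_min + 0.5 * (lr_base - lr_min) * (1 + math.cos(math.pi * t))
```

The rate starts at 10^-6, reaches 0.004 exactly when the warm-up ends, and decays smoothly back toward 10^-6 by the final iteration.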
The invention selects SGD with momentum as the optimizer of the algorithm, which effectively accelerates convergence, and uses a weight-decay strategy of five parts per million to prevent overfitting.
Comparing the method of the invention with the parking detection results of the traditional vehicle parking algorithm shows whether the detection effect of the invention is improved; the comparison is given in Table 1:
TABLE 1 statistical table of tunnel parking event detection results
Finally, the above embodiments only illustrate the technical solution of the present invention and do not limit it. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solution of the present invention without departing from its spirit and scope, all of which should be covered by the claims of the present invention.

Claims (5)

1. The tunnel parking event detection method fusing the traditional parking detection and the neural network is characterized by comprising the following steps of:
s1: collecting driving videos from cameras in different scenes of the expressway tunnel, intercepting and storing pictures according to a fixed frame rate, and collecting a plurality of pictures as a data set;
marking the vehicle target in each picture in the data set by adopting an image marking tool, wherein the marking content comprises the vehicle type and the coordinate value of a boundary frame surrounding the vehicle target, and all marked pictures are used as a first sample to form a VOC data set;
s2: clustering the boundary box surrounding the vehicle target marked in each picture in the VOC data set obtained in the S1 to obtain the most suitable size of the boundary box of the vehicle target for each vehicle type, and taking the most suitable size of the boundary box of the vehicle target for each vehicle type as the size of the Anchor in the SSD neural network;
s3: the method comprises the following steps of constructing and training a vehicle identification model based on the SSD neural network, wherein the structure of the vehicle identification model is as follows:
selecting VGG16, which uses small-size convolution kernels instead of large-size convolution kernels, as the backbone network;
and making the following modifications to VGG16: removing the final fully-connected layer of VGG16 used for classification, converting the remaining two fully-connected layers fc6 and fc7 into the convolutional layers Conv6 and Conv7, then appending 4 additional convolutional layers named Conv8_2, Conv9_2, Conv10_2 and Conv11_2, and finally selecting the 6 convolutional layers Conv4_3, Conv7, Conv8_2, Conv9_2, Conv10_2 and Conv11_2 to form a feature-pyramid multi-scale detection structure;
taking the Anchor size in the SSD neural network obtained in the S2 as the size of the vehicle identification model, and training the vehicle identification model by adopting the first sample to obtain an optimal vehicle identification model;
s4: inputting a section of video to be detected into a traditional parking detection algorithm, taking an obtained corresponding video frame picture with a fixed foreground target picture as a picture to be predicted, and transmitting the picture to be predicted as input to an optimal vehicle identification model;
s5: and the optimal vehicle identification model obtains whether the vehicle target in the video to be detected is a tunnel parking event through two judgments.
2. The method for detecting tunnel parking events fusing conventional parking detection and a neural network as claimed in claim 1, wherein: the step of clustering the images in the VOC data set to select an appropriate Anchor size in the SSD neural network in S2 is as follows:
s21: extracting length and width dimension information of a bounding box surrounding the vehicle target in each first sample
S22: taking the length and width dimension information surrounding the vehicle target boundary frame in each sample I and the vehicle target size information corresponding to the vehicle category in the sample I as sample II to obtain a sample II set, and clustering all sample II in the sample II set by adopting a K-Means clustering algorithm, wherein the method comprises the following steps:
s221: randomly selecting B samples II from the sample II set as initial clustering centers, wherein each initial clustering center is used as a cluster center of one cluster;
s222: calculating the distance from all the other second samples in the second sample set to the B initial clustering centers, and allocating the second sample to the closest cluster, wherein the distance calculation formula is as follows:
di,c=1-IOU(i,c) (1);
where d_i,c denotes the distance from the ith second sample to the cth cluster center, i denotes the ith second sample, c denotes the cth cluster center, and IOU denotes the intersection-over-union of the areas of the ith second sample and the cth cluster center;
s223: calculating the mean value d̄_c of the distances from the second samples in the cth cluster to the cluster center, and taking the second sample whose distance d_i,c is closest to d̄_c as the new cluster center of the cth cluster;
s224: calculating the distance d between the new cluster center of the c-th cluster and the initial cluster center of the c-th cluster;
s225: judging whether d is smaller than a set threshold or the maximum number of iterations has been reached: if d is smaller than the set threshold, exit; otherwise update the new cluster center of the cth cluster to be the initial cluster center and return to step S222;
likewise, if the current iteration count reaches the maximum number of iterations, exit; otherwise update the initial cluster center to the new cluster center of the cth cluster and return to step S222.
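The procedure of steps S221 to S225 can be sketched as follows. This is an illustrative sketch with our own names; it uses the common mean-update variant of IoU K-Means in place of the medoid-style update described in step s223, and compares (w, h) pairs as corner-aligned boxes:

```python
import random

def iou_wh(a, b):
    """IoU of two (w, h) boxes assumed to share a common corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_anchors(whs, k, iters=100, seed=0):
    """Cluster (w, h) pairs under the distance d = 1 - IoU of formula (1)."""
    rng = random.Random(seed)
    centers = rng.sample(whs, k)          # B randomly chosen initial centers
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for wh in whs:
            # assign each sample to the center with the smallest 1 - IoU
            c = max(range(k), key=lambda j: iou_wh(wh, centers[j]))
            clusters[c].append(wh)
        new = [(sum(w for w, _ in cl) / len(cl), sum(h for _, h in cl) / len(cl))
               if cl else centers[j] for j, cl in enumerate(clusters)]
        if new == centers:                # converged: centers stopped moving
            break
        centers = new
    return centers
```

On a mix of small and large boxes the clusters separate by scale, which is exactly what makes the resulting centers suitable Anchor sizes.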
3. The method for detecting tunnel parking events fusing conventional parking detection and neural networks according to claim 1 or 2, characterized in that: the process of training the vehicle identification model by using the first sample in the step S3 to obtain the optimal vehicle identification model is as follows:
s31: pre-training the vehicle identification model on the ImageNet large-scale classification data set to obtain a suboptimal vehicle identification model, and on this basis initializing each layer of the VGG16 network, where the newly added layers are initialized with the Xavier method;
s32: modifying the total number of categories in the SSD neural network into 2;
s33: inputting all the first samples in the first sample set into a suboptimal vehicle identification model, calculating the loss of the current iteration, and updating the parameters of the suboptimal vehicle identification model by using a random gradient descent method according to the loss;
s34: and judging whether the maximum iteration times is reached, if so, judging that the current suboptimal vehicle identification model is the optimal vehicle identification model, and if not, returning to the step S33.
4. The method for detecting tunnel parking events fusing conventional parking detection and neural networks according to claim 3, wherein: the process of obtaining whether the vehicle target in the video to be detected is the tunnel parking event or not by the optimal vehicle identification model in the S5 is as follows:
s51: obtaining a preliminary detection result under a given confidence threshold: the vehicle target in the picture is identified by the optimal vehicle identification model, yielding the coordinate information (x1, y1, x2, y2) of the vehicle target boundary box, where (x1, y1) are the coordinates of the upper-left corner and (x2, y2) the coordinates of the lower-right corner of the vehicle target boundary box;
s52: calculating the area Area_Det of the vehicle target boundary box, where Area_Det is expressed as:
Area_Det=(x2-x1)(y2-y1) (2);
s53: counting the number Q of parking foreground pixel points in the vehicle target boundary frame to represent the area occupied by the fixed foreground target in the detection frame;
s54: counting the proportion P of the parking target foreground in the vehicle target boundary frame, wherein the foreground proportion P is expressed by a formula as follows:
P=Q/Area_Det;
and traversing the fixed foreground target picture through the coordinates of the vehicle target boundary frame, if the ratio P exceeds a set threshold value T, judging that the vehicle target has a tunnel parking event, and otherwise, judging that the tunnel parking event does not exist.
5. The method for detecting tunnel parking events fusing conventional parking detection and neural networks according to claim 3, wherein: the step of S33 calculating the loss of the current iteration is as follows:
the loss function loss of the current iteration is represented by a weighted sum of the position loss function loc and the confidence loss function conf, which is formulated as:
loss(x,c,l,g) = (1/N) · (conf(x,c) + α · loc(x,l,g))
wherein α is used to adjust the proportion between the position loss function loc and the confidence loss function conf, and N is the total number of default boxes (Anchors) matched to labeled boxes; if N is 0, the loss function loss is defined as 0. The position loss is the smooth-L1 loss between the prediction boxes of the first samples output by the suboptimal vehicle identification model and the labeled bounding boxes surrounding the vehicle targets; a default box d has center (cx, cy), width w and height h, and the position loss function is as follows:
loc(x,l,g) = Σ_{i∈Pos} Σ_{m∈{cx,cy,w,h}} x_ij^k · smooth_L1(l_i^m − ĝ_j^m)
wherein x_ij^k indicates whether the ith prediction box and the jth labeled box match with respect to category k (a match is 1, a mismatch is 0), l_i^m denotes the ith prediction box, ĝ_j^m denotes the jth labeled box, and Pos denotes the positive samples;
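The smooth_L1 function referenced by the position loss has the standard piecewise form, sketched here for illustration:

```python
def smooth_l1(x):
    """smooth_L1(x) = 0.5*x^2 for |x| < 1, else |x| - 0.5: quadratic near
    zero (stable gradients), linear for large errors (robust to outliers)."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5
```

It is applied independently to each of the cx, cy, w, h offsets between a prediction box and its matched labeled box.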
the confidence loss function is the softmax cross-entropy loss over the class confidences, with the weight ratio set to 1, as follows:
conf(x,c) = −Σ_{i∈Pos} x_ij^p · log(ĉ_i^p) − Σ_{i∈Neg} log(ĉ_i^0)
ĉ_i^p = exp(c_i^p) / Σ_p exp(c_i^p)
wherein i represents the prediction-box index, j represents the labeled-box index, and p is the category index, with p = 0 representing the background; x_ij^p taken as 1 indicates that the ith prediction box matches the jth labeled box whose category is p, and ĉ_i^p denotes the probability value of the predicted category p for the ith prediction box.
CN202111665332.8A 2021-12-31 2021-12-31 Tunnel parking event detection method integrating traditional parking detection and neural network Pending CN114299438A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111665332.8A CN114299438A (en) 2021-12-31 2021-12-31 Tunnel parking event detection method integrating traditional parking detection and neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111665332.8A CN114299438A (en) 2021-12-31 2021-12-31 Tunnel parking event detection method integrating traditional parking detection and neural network

Publications (1)

Publication Number Publication Date
CN114299438A true CN114299438A (en) 2022-04-08

Family

ID=80973514

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111665332.8A Pending CN114299438A (en) 2021-12-31 2021-12-31 Tunnel parking event detection method integrating traditional parking detection and neural network

Country Status (1)

Country Link
CN (1) CN114299438A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115762172A (en) * 2022-11-02 2023-03-07 济南博观智能科技有限公司 Method, device, equipment and medium for identifying vehicles entering and exiting parking places



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination