CN111401148B - Road multi-target detection method based on improved multi-stage YOLOv3 - Google Patents

Road multi-target detection method based on improved multi-stage YOLOv3

Info

Publication number
CN111401148B
CN111401148B (application CN202010124052.5A)
Authority
CN
China
Prior art keywords
convolution
detection
improved
feature
road
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010124052.5A
Other languages
Chinese (zh)
Other versions
CN111401148A (en
Inventor
王海
王宽
蔡英凤
李祎承
刘擎超
刘明亮
张田田
李洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University
Original Assignee
Jiangsu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University filed Critical Jiangsu University
Priority to CN202010124052.5A priority Critical patent/CN111401148B/en
Publication of CN111401148A publication Critical patent/CN111401148A/en
Application granted granted Critical
Publication of CN111401148B publication Critical patent/CN111401148B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/2193Validation; Performance evaluation; Active pattern learning techniques based on specific statistical tests
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

The invention discloses a road multi-target detection method based on an improved multi-stage YOLOv3, comprising the following steps. Step1, data set production: a road multi-target data set is built from the public driving data set BDD100K. Step2, the aspect ratios of road target candidate boxes are computed with a K-means clustering algorithm. Step3, an improved YOLOv3 neural network model is designed. Step4, training hyperparameters and network parameters are set, the training set is fed into the network, the improved YOLOv3 network is trained, and the trained weight file is saved. Step5, predicted bounding-box information and class probabilities are output. Step6, redundant detection boxes are filtered with soft non-maximum suppression, the detection picture is visualized, and the final target detection boxes and recognition results are generated. Compared with the original YOLOv3 neural network model, the mAP on the BDD100K validation set reaches 58.09%, an improvement of nearly 9 percentage points, so the detection accuracy is higher. Real-time performance is also good: the average detection time is 0.03 s per image (about 33 FPS), only 1.65% slower than the original YOLOv3, which meets the real-time requirement.

Description

Road multi-target detection method based on improved multi-stage YOLOv3
Technical Field
The invention belongs to the technical field of automotive environment perception and target detection, and particularly relates to a road multi-target detection method based on an improved multi-stage YOLOv3.
Background
Road target detection is an important direction in the field of image recognition. Computer vision algorithms based on deep learning are a relatively recent arrival in the field, and with the continuous growth of data volume and the rapid advance of hardware in recent years they have achieved great success in various computer vision tasks such as object classification, object detection and semantic segmentation. For object detection in particular, many algorithms with outstanding accuracy and good real-time performance are now available. These algorithms are classified into single-stage and two-stage detectors according to whether a region proposal network (RPN) is used to first regress detection boxes for positive samples. Single-stage detectors include YOLOv3, SSD and RetinaNet; two-stage detectors include R-CNN, R-FCN, Faster R-CNN and Cascade R-CNN. Single-stage detectors offer good real-time performance, while two-stage detectors offer high accuracy. Within object detection, road target detection is a very important direction, and research on road target detection algorithms matters greatly for traffic safety. In an autonomous driving scenario, detection and identification of road targets plays a very important role: accurate detection is decisive for subsequent recognition, assisted positioning and navigation. The invention applies a method based on an improved YOLOv3 to road multi-target detection.
Disclosure of Invention
The invention aims to solve the poor accuracy of existing road target detection and provides a road multi-target detection method based on an improved YOLOv3, which can improve safety during driving. First, a data set is built from the public driving data set BDD100K; then an improved YOLOv3 neural network model is designed; the model is trained on the BDD100K data set; the saved model parameters are loaded into the improved YOLOv3 neural network model; and finally road targets in pictures are detected.
Compared with the original YOLOv3 network framework, the improved YOLOv3 neural network model of the invention adds two feature detection maps: the five feature detection maps have resolutions of 13×13, 26×26, 52×52, 104×104 and 208×208, i.e. the improved model outputs the 104×104 and 208×208 maps that the original YOLOv3 lacks. Five candidate boxes are assigned on the feature map of each scale, following the principle that high-resolution feature maps detect small objects and low-resolution feature maps detect large objects. The training-set and validation-set images are used to train the network, yielding the final YOLOv3-based network weight model. When road targets in a picture are detected in real time, each target in the picture has several predicted bounding boxes, and soft non-maximum suppression is used to eliminate the redundant ones. Both the localization accuracy and the detection accuracy of the network are thereby improved.
The beneficial effects of the invention include:
1. Compared with the original YOLOv3 neural network model, the mAP on the BDD100K validation set reaches 58.09%, an improvement of nearly 9 percentage points, so the detection accuracy is high.
2. Real-time performance is good: timing the improved YOLOv3 neural network model on each picture gives an average detection time of 0.03 s per image (about 33 FPS), which meets the real-time requirement.
Drawings
FIG. 1 is a modified YOLOv3 neural network model
FIG. 2 is a diagram showing the detection effect
FIG. 3 is a diagram showing the detection effect
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, a road multi-target detection method based on improved YOLOv3 includes the following steps:
Step1 data set production
A road multi-target data set is built from the public driving data set BDD100K. The data set contains 100,000 pictures, and the GT box labels fall into 10 classes: bus, light (traffic light), sign (traffic sign), person (pedestrian), bike, truck, motor, car, train and rider, for a total of about 1.84 million labeled boxes. The resolution of the pictures is 1280×720. The BDD100K data set contains pictures of different weather, scenes and times of day, both sharp and blurred; it is large in scale, diverse, and drawn from real driving scenes. The invention divides it into training, test and validation sets in the ratio 7:2:1, i.e. 70,000 training pictures, 20,000 test pictures and 10,000 validation pictures, and arranges the BDD100K data into the VOC data set format. The VOC format comprises three folders: JPEGImages, Annotations and ImageSets. JPEGImages stores the training-set and test-set pictures; Annotations stores the xml annotation files; ImageSets stores txt files in which each line is the name of one picture. The improved YOLOv3 network model reads file names from the txt files, finds the corresponding pictures and annotation information in the JPEGImages and Annotations folders, extracts the road target annotations from the found picture labels, and obtains the box parameters of the annotations.
The pictures are then randomly divided into batches. Before they are fed into the improved YOLOv3 network model, data-enhancement operations such as random rotation, cropping, translation, flipping and noise perturbation are applied to expand the scene diversity of the pictures, and the picture size is uniformly adjusted to 416×416.
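The preprocessing described above (augmentation plus a uniform resize to 416×416) can be sketched as follows. This is an illustrative NumPy stand-in, not the patent's implementation; the function names are hypothetical, and only two of the named enhancement operations (flip and resize) are shown.

```python
import numpy as np

def resize_nearest(img, size=416):
    """Nearest-neighbour resize of an HxWxC image to size x size
    (a minimal stand-in for the uniform 416x416 resize)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def hflip(img, boxes):
    """Horizontal flip of an image together with its [x1, y1, x2, y2]
    pixel boxes, one of the enhancement operations named in the text."""
    w = img.shape[1]
    flipped = img[:, ::-1]
    boxes = boxes.copy()
    boxes[:, [0, 2]] = w - boxes[:, [2, 0]]
    return flipped, boxes
```

In a training pipeline these would be applied to each batch, with the box coordinates transformed alongside the pixels so the labels stay valid.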
Step2 road target candidate-box aspect-ratio calculation based on the K-means clustering algorithm
The bounding-box labels of the BDD100K data set are clustered with the K-means++ algorithm, giving 15 anchor box sizes: (4,8), (6,16), (10,10), (8,31), (13,20), (22,16), (22,30), (13,51), (36,42), (25,89), (54,66), (83,95), (57,155), (116,156), (155,249).
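A minimal sketch of this clustering follows, using the 1 − IoU distance commonly used for YOLO anchor selection. The patent names K-means++; for brevity this sketch uses plain random seeding, and all function names are illustrative.

```python
import numpy as np

def iou_wh(box, clusters):
    """IoU between one (w, h) box and k cluster (w, h) boxes,
    both treated as anchored at the origin."""
    inter = np.minimum(box[0], clusters[:, 0]) * np.minimum(box[1], clusters[:, 1])
    union = box[0] * box[1] + clusters[:, 0] * clusters[:, 1] - inter
    return inter / union

def kmeans_anchors(wh, k, iters=100, seed=0):
    """Cluster ground-truth (w, h) pairs with distance d = 1 - IoU;
    returns k anchor sizes sorted by area."""
    rng = np.random.default_rng(seed)
    clusters = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        labels = np.array([np.argmax(iou_wh(b, clusters)) for b in wh])
        new = np.array([wh[labels == j].mean(axis=0) if np.any(labels == j)
                        else clusters[j] for j in range(k)])
        if np.allclose(new, clusters):
            break
        clusters = new
    return clusters[np.argsort(clusters[:, 0] * clusters[:, 1])]
```

Run on the BDD100K box labels with k = 15, a procedure like this would yield the anchor list above.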
Step3 improved YOLOv3 neural network model
The original YOLOv3 is a deep residual convolutional neural network with a fully convolutional architecture. The network alternates 3×3 and 1×1 convolutions to extract target features from the picture, reduce resolution and adjust the number of channels, and uses upsampling layers to fuse 2× enlarged deep features with earlier layers of the network. Layers 75 to 106 of the YOLOv3 network form the feature-interaction output stage, split across three resolutions; within each resolution's feature map, local feature interaction is realized by convolution (3×3 and 1×1 kernels). The final output of the network is produced by applying a 1×1 convolution kernel on the feature map, and detection is performed on feature maps of three different sizes taken from three different layers of the network. The original YOLOv3 therefore predicts with three resolution detection maps.
The improved YOLOv3 neural network model of the invention is shown in figure 1, and the detailed process is as follows:
First, the normalized image is halved in scale by two 3×3 convolutions, then passes in sequence through one residual module, a 3×3 convolution, two residual modules, a 3×3 convolution, eight residual modules, a 3×3 convolution and seven residual modules to obtain a 13×13 feature detection map; the 416×416 input picture is thus reduced to a 13×13×45 output detection map. An upsampling layer with step 2 then raises the feature map to 26×26×256.
Second, the 26×26 feature detection map is obtained by passing through a 3×3 convolution and eight residual modules; the subsequent 52×52, 104×104 and 208×208 feature maps are each obtained through a 3×3 convolution and eight residual modules. A residual module consists of a 1×1 convolution, a 3×3 convolution and a residual connection in sequence. Next, anchor boxes of three different scales are initially generated on the 13×13 feature map, which then passes through a 3×3 convolution, a CONV module, a 3×3 convolution and a 1×1 convolution to yield the tensor data at the 13×13 scale. The 13×13 feature map is then upsampled after a 3×3 convolution, a CONV module and a 1×1 convolution; the upsampled feature map is fused with the 26×26 feature map produced by the backbone of the YOLO network; anchor boxes of three different scales are initially generated on the fused feature map, which then passes through a CONV module, a 3×3 convolution and a 1×1 convolution to yield the tensor data at 26×26. The tensor data at 52×52, 104×104 and 208×208 are obtained in the same way: the upsampled feature map is fused with the preceding-layer feature map from the backbone by vector splicing (channel concatenation), anchor boxes of three different scales are initially generated on the fused map, and a CONV module, a 3×3 convolution and a 1×1 convolution then yield the tensor data. The CONV module denotes the operation sequence of a 1×1 convolution, a 3×3 convolution and a 1×1 convolution. The resolutions of the five modified feature detection maps are 13×13, 26×26, 52×52, 104×104 and 208×208, respectively.
The improved network thus outputs the 104×104 and 208×208 feature detection maps that the original YOLOv3 lacks. Five candidate boxes are assigned on the feature map of each scale; the overall detection pipeline of the improved YOLOv3 neural network is shown in fig. 1.
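The upsample-and-splice fusion used at each scale transition can be illustrated with array shapes. This is a NumPy sketch under assumed channel counts (256 channels on each branch), not the patent's code.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a CxHxW feature map,
    corresponding to the step-2 upsampling layer in the text."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fuse(deep, shallow):
    """'Vector splicing': channel-wise concatenation of the upsampled
    deep feature map with the shallower backbone feature map."""
    return np.concatenate([upsample2x(deep), shallow], axis=0)

# The five detection scales of the improved model for a 416x416 input:
scales = [416 // s for s in (32, 16, 8, 4, 2)]  # [13, 26, 52, 104, 208]
```

Each fusion doubles the spatial resolution of the deep features before concatenation, so the chain 13→26→52→104→208 covers all five detection scales.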
Step4, setting training super parameters and network parameters, inputting a training set into a network, training an improved YOLOv3 network, and storing a trained weight file;
The hyperparameters during training are set as follows: batch size 4, learning rate = 0.001, maximum iteration count 50000, learning-rate schedule steps = 40000, 45000, 50000. The learning rate is multiplied by 0.1 at 40000 iterations and by 0.1 again at 45000 iterations.
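The step schedule just described (0.1× at 40000 iterations and again at 45000) can be written as a small helper; a sketch with an illustrative function name:

```python
def step_lr(iteration, base_lr=1e-3, milestones=(40000, 45000), gamma=0.1):
    """Piecewise-constant learning-rate schedule: multiply the rate by
    gamma at each milestone passed, matching the training setup above."""
    lr = base_lr
    for m in milestones:
        if iteration >= m:
            lr *= gamma
    return lr
```

So the rate is 1e-3 up to iteration 40000, 1e-4 until 45000, and 1e-5 afterwards.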
main parameters of the experimental platform: a processor: inter (R) core (TM) i5-8600K CPU@3.60GHZ; memory: 64GB; display card: NVIDIA GeForce GTX1080TI.
The improved YOLOv3 model computes the loss with the regression loss function of the predicted bounding boxes: for each predicted box, the class score, confidence score, center coordinates and width-height loss relative to the ground-truth box are evaluated by the loss function, and the gradient obtained by backpropagation updates the weights. To drive the loss steadily down, the model weights are updated on every batch fed into the improved neural network model until the loss value converges. The model parameters are saved every ten thousand iterations and validated on the validation set, and the learning rate is adjusted according to the loss curve and the detection effect on the validation set. The model converges at 90000 iterations, at which point training is stopped; the parameters saved at iteration 90000 constitute the final detection model based on the improved YOLOv3 neural network.
Step5 outputs predicted bounding box information and class probabilities.
The model parameters saved in the previous step are loaded into the improved YOLOv3 model and the test picture is fed in. The x, y, confidence and class probabilities predicted by the network are activated by a logistic function and filtered by a threshold to obtain the coordinates, confidence and class probability of all prediction boxes; the predicted bounding-box information and class probabilities are then output.
b_x = σ(t_x) + C_x
b_y = σ(t_y) + C_y
b_w = P_w · e^(t_w)
b_h = P_h · e^(t_h)
Wherein: c (C) X ,C Y For the offset of the current grid relative to the top left grid of the current feature map, the sigma () function is a logistic function used to apply t x 、t y Normalized to between 0 and 1, P w ,P h Is intersected with the marked boundary frame and is wider and higher than the largest anchor frame, t w 、t h 、t x 、t y Is the vertex coordinates of the prediction box.
Step6 filtering detection boxes with soft non-maximum suppression
At this point each road target in the picture still has several predicted bounding boxes. Traditional non-maximum suppression sorts the detection boxes by score, keeps the box with the highest score, and deletes every other box whose overlap with it exceeds a set ratio, which easily causes targets to be missed. Soft non-maximum suppression instead lowers the confidence of overlapping boxes, a confidence threshold is specified, and only boxes scoring above the threshold are kept. Finally the detected picture is visualized to generate the final target detection boxes and recognition results, as shown in figs. 2 and 3.
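A minimal sketch of Gaussian soft-NMS in this spirit (decaying the scores of overlapping boxes instead of deleting them) follows; the Gaussian decay form, parameter values and names are illustrative assumptions, not the patent's exact procedure.

```python
import numpy as np

def iou(box, boxes):
    """IoU of one [x1, y1, x2, y2] box against an Nx4 array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian soft-NMS: instead of deleting boxes that overlap the
    current best box, decay their scores by exp(-IoU^2 / sigma);
    returns kept box indices in selection order."""
    scores = np.asarray(scores, dtype=float).copy()
    boxes = np.asarray(boxes, dtype=float)
    remaining = list(range(len(scores)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        if scores[best] < score_thresh:
            break
        keep.append(best)
        remaining.remove(best)
        for i in remaining:
            scores[i] *= np.exp(-iou(boxes[best], boxes[i:i + 1])[0] ** 2 / sigma)
    return keep
```

Unlike hard NMS, a heavily overlapped box is only down-weighted, so a second true object hiding behind a stronger detection can still survive the confidence threshold.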
Step7 detection accuracy comparison
The invention evaluates the target detection performance of the improved YOLOv3 network with mAP (mean Average Precision), the mean over classes of the per-class average precision, i.e. the area under each class's precision-recall curve; it is an important index for evaluating target detection networks. mAP is computed on the 10,000 pictures of the BDD100K validation set. The annotations of the sparsely represented classes train, rider, motor and bike are removed, so the mAP is computed over six classes: bus, car, person, traffic light, traffic sign and truck.
The calculation formula of the AP is: AP = ∫ P dR (precision integrated over recall from 0 to 1),
wherein P is the detection precision (precision), R is the Recall ratio Recall, and the calculation formula is as follows:
(1) P = TP / (TP + FP)
(2) R = TP / (TP + FN)
where TP, FP and FN denote the numbers of true positives, false positives and false negatives, respectively.
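The precision and recall definitions above lead to AP as a step-wise integral of precision over recall. The sketch below assumes the evaluation's IoU matching has already flagged each detection as true or false positive; the function name is illustrative.

```python
import numpy as np

def average_precision(scores, is_tp, n_gt):
    """AP = integral of P dR, computed as a step sum over detections
    ranked by confidence score. `is_tp` flags each detection as a true
    positive; `n_gt` is the number of ground-truth boxes of the class."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_tp, dtype=float)[order]
    tp_cum = np.cumsum(tp)
    fp_cum = np.cumsum(1.0 - tp)
    recall = tp_cum / n_gt
    precision = tp_cum / (tp_cum + fp_cum)
    deltas = np.diff(np.concatenate([[0.0], recall]))  # R_i - R_{i-1}
    return float(np.sum(deltas * precision))
```

mAP is then the mean of this AP over the six evaluated classes.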
table 1 shows the results of the improved YOLOv3 network versus the original performance:
Figure GDA0004101207060000053
Figure GDA0004101207060000061
TABLE 1
As shown in Table 1, the improved YOLOv3 has higher detection accuracy: compared with the original YOLOv3, mAP increases by about 9 percentage points to 58.09%. Second, by timing the program on each detected picture, the average detection time is 0.03 s per image, i.e. about 33 FPS, which shows that the road multi-target detection method based on the improved YOLOv3 neural network also meets the real-time requirement.
The detailed description above lists only specific practical embodiments of the invention and is not intended to limit its scope; all equivalent implementations or modifications that do not depart from the technical spirit of the invention shall be included within its scope.

Claims (7)

1. The road multi-target detection method based on the improved multi-stage YOLOv3 is characterized by comprising the following steps:
step1, manufacturing a data set: creating a road multi-objective dataset based on the disclosed driving dataset BDD 100K;
step2, calculating aspect ratio of the road target candidate frame based on a K-means clustering algorithm;
step3, designing an improved YOLOv3 neural network model; the specific method comprises the following steps:
firstly, reducing the scale of the normalized image by half through two 3×3 convolutions, then passing in sequence through one residual module, a 3×3 convolution, two residual modules, a 3×3 convolution, eight residual modules, a 3×3 convolution and seven residual modules to obtain a 13×13 feature detection map, the 416×416 input picture being adjusted to a 13×13×45 output detection map, and then connecting an upsampling layer with step 2 to raise the feature map to 26×26×256;
secondly, sequentially passing the 26×26 feature detection map through a 3×3 convolution and eight residual modules; then obtaining the 52×52, 104×104 and 208×208 feature maps after a 3×3 convolution and eight residual modules each; wherein the residual module sequentially carries out a 1×1 convolution, a 3×3 convolution and a residual operation;
next, initially generating anchor boxes of three different scales on the 13×13 feature map, and obtaining the tensor data at the 13×13 scale through a 3×3 convolution, a CONV module, a 3×3 convolution and a 1×1 convolution in sequence; then upsampling the 13×13 feature map after a 3×3 convolution, a CONV module and a 1×1 convolution in sequence, performing feature fusion between the upsampled feature map and the 26×26 feature map obtained by the backbone of the YOLO network, initially generating anchor boxes of three different scales on the fused feature map, and then obtaining the tensor data at 26×26 through a CONV module, a 3×3 convolution and a 1×1 convolution in sequence; then obtaining the tensor data at 52×52, 104×104 and 208×208 in the same way: performing feature fusion between the upsampled feature map and the preceding-layer feature map obtained by the basic neural network part of the YOLO network by a vector splicing method, initially generating anchor boxes of three different scales on the fused feature map, and then obtaining the tensor data through a CONV module, a 3×3 convolution and a 1×1 convolution in sequence; the CONV module sequentially performs the operations of a 1×1 convolution, a 3×3 convolution, a 1×1 convolution, a 3×3 convolution and a 1×1 convolution; the resolutions of the five modified feature detection maps are 13×13, 26×26, 52×52, 104×104 and 208×208, respectively;
finally, 5 candidate frames are distributed on the feature detection graph of each scale;
step4, setting training super parameters and network parameters, inputting a training set into a network, training an improved YOLOv3 network, and storing a trained weight file;
step5, outputting predicted boundary box information and class probability;
and step6, filtering the detection boxes with soft non-maximum suppression, visualizing the detection picture, and generating the final target detection boxes and recognition results.
2. The improved multi-level YOLOv3-based road multi-target detection method according to claim 1, wherein in step1 the data set BDD100K is arranged into the VOC data set format, the VOC data set includes three folders, respectively a JPEGImages folder, an Annotations folder and an ImageSets folder, wherein JPEGImages stores the training-set and test-set pictures, the Annotations folder stores the xml annotation files, and the ImageSets folder stores txt text in which each line corresponds to a picture name; the improved YOLOv3 network model reads the file names from the txt text, then searches the JPEGImages and Annotations folders for the corresponding pictures and annotation information, extracts the road target annotations from the found picture labels, and obtains the box parameters of the annotations.
3. The improved multi-level YOLOv3-based road multi-target detection method according to claim 2, wherein the pictures in the VOC data set are randomly divided into batches, and the pictures undergo random rotation, cropping, translation, flipping and noise-perturbation data enhancement before being sent into the improved YOLOv3 network model, thereby expanding the scene diversity of the pictures, and the picture size is uniformly adjusted to 416×416.
4. The improved multi-level YOLOv3-based road multi-target detection method according to claim 1, wherein the GT box labels in the data set BDD100K are divided into 10 classes: bus, light, sign, person, bike, truck, motor, car, train and rider, with a total of about 1.84 million labeled boxes; the resolution of the data set pictures is 1280×720, and the training, test and validation sets are divided in the ratio 7:2:1, namely 70000 training pictures, 20000 test pictures and 10000 validation pictures.
5. The improved multi-level YOLOv3-based road multi-target detection method according to claim 1, wherein the implementation of step2 is as follows: the bounding-box labels of the BDD100K data set are clustered with the K-means++ algorithm, giving 15 anchor box sizes: (4,8), (6,16), (10,10), (8,31), (13,20), (22,16), (22,30), (13,51), (36,42), (25,89), (54,66), (83,95), (57,155), (116,156), (155,249).
6. The improved multi-level YOLOv3-based road multi-target detection method according to claim 1, wherein in step4 the hyperparameters during training are set as follows: batch size 4, learning rate = 0.001, burn_in = 1000, maximum iteration count 50000, learning-rate schedule steps = 40000, 45000, 50000; the learning rate is multiplied by 0.1 at 40000 iterations and by 0.1 again at 45000 iterations;
in the training process, the loss is computed with the regression loss function of the predicted bounding boxes; the class score, confidence score, center-coordinate and width-height losses of each predicted box relative to the ground-truth box are calculated by the loss function; the gradient obtained by backpropagation updates the weights; the model weights are updated on every batch fed into the improved neural network model until the loss value converges; the model parameters are saved every ten thousand iterations and validated on the validation set, and the learning rate is adjusted according to the loss curve and the detection effect on the validation set.
7. The improved multi-level YOLOv3-based road multi-target detection method according to claim 1, wherein the implementation of step6 is as follows: soft non-maximum suppression is used to reduce the confidence of overlapping boxes, a confidence threshold is specified, detection boxes with scores greater than the threshold are kept, and this step is cycled over the remaining predicted bounding boxes, finally obtaining the predicted bounding box corresponding to each road target.
CN202010124052.5A 2020-02-27 2020-02-27 Road multi-target detection method based on improved multi-stage YOLOv3 Active CN111401148B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010124052.5A CN111401148B (en) 2020-02-27 2020-02-27 Road multi-target detection method based on improved multi-stage YOLOv3

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010124052.5A CN111401148B (en) 2020-02-27 2020-02-27 Road multi-target detection method based on improved multi-stage YOLOv3

Publications (2)

Publication Number Publication Date
CN111401148A CN111401148A (en) 2020-07-10
CN111401148B (en) 2023-06-20

Family

ID=71428505

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010124052.5A Active CN111401148B (en) 2020-02-27 2020-02-27 Road multi-target detection method based on improved multi-stage YOLOv3

Country Status (1)

Country Link
CN (1) CN111401148B (en)

Families Citing this family (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110826572B (en) * 2018-08-09 2023-04-21 京东方科技集团股份有限公司 Non-maximum value inhibition method, device and equipment for multi-target detection
CN111986156A (en) * 2020-07-20 2020-11-24 华南理工大学 Axe-shaped sharp tool detection method, system, device and storage medium
CN113971755B (en) * 2020-07-22 2024-05-03 中国科学院沈阳自动化研究所 All-weather sea surface target detection method based on improved YOLOV model
CN112084890B (en) * 2020-08-21 2024-03-22 杭州电子科技大学 Method for identifying traffic signal sign in multiple scales based on GMM and CQFL
CN112070729B (en) * 2020-08-26 2023-07-07 西安交通大学 Anchor-free remote sensing image target detection method and system based on scene enhancement
CN111986436B (en) * 2020-09-02 2022-12-13 成都视道信息技术有限公司 Comprehensive flame detection method based on ultraviolet and deep neural networks
CN112183255A (en) * 2020-09-15 2021-01-05 西北工业大学 Underwater target visual identification and attitude estimation method based on deep learning
CN112633052A (en) * 2020-09-15 2021-04-09 北京华电天仁电力控制技术有限公司 Belt tearing detection method
CN112085728B (en) * 2020-09-17 2022-06-21 哈尔滨工程大学 Submarine pipeline and leakage point detection method
CN112132130B (en) * 2020-09-22 2022-10-04 福州大学 Real-time license plate detection method and system for whole scene
CN112200225B (en) * 2020-09-23 2022-07-26 西南交通大学 Steel rail damage B display image identification method based on deep convolution neural network
CN112132033B (en) * 2020-09-23 2023-10-10 平安国际智慧城市科技股份有限公司 Vehicle type recognition method and device, electronic equipment and storage medium
CN112233175B (en) * 2020-09-24 2023-10-24 西安交通大学 Chip positioning method and integrated positioning platform based on YOLOv3-tiny algorithm
CN112287977B (en) * 2020-10-06 2024-02-09 武汉大学 Target detection method based on bounding box key point distance
CN112329768A (en) * 2020-10-23 2021-02-05 上善智城(苏州)信息科技有限公司 Improved YOLO-based method for identifying fuel-discharging stop sign of gas station
CN112380921A (en) * 2020-10-23 2021-02-19 西安科锐盛创新科技有限公司 Road detection method based on Internet of vehicles
CN112434583B (en) * 2020-11-14 2023-04-07 武汉中海庭数据技术有限公司 Lane transverse deceleration marking line detection method and system, electronic equipment and storage medium
CN112365324A (en) * 2020-12-02 2021-02-12 杭州微洱网络科技有限公司 Commodity picture detection method suitable for E-commerce platform
CN112560918B (en) * 2020-12-07 2024-02-06 杭州电子科技大学 Dish identification method based on improved YOLO v3
CN112507929B (en) * 2020-12-16 2022-05-13 武汉理工大学 Vehicle body spot welding slag accurate detection method based on improved YOLOv3 network
CN112434672B (en) * 2020-12-18 2023-06-27 天津大学 Marine human body target detection method based on improved YOLOv3
CN112884705B (en) * 2021-01-06 2024-05-14 西北工业大学 Two-dimensional material sample position visualization method
CN112906485B (en) * 2021-01-25 2023-01-31 杭州易享优智能科技有限公司 Visual impairment person auxiliary obstacle perception method based on improved YOLO model
CN112906523B (en) * 2021-02-04 2022-12-27 上海航天控制技术研究所 Hardware-accelerated deep learning target machine type identification method
CN112819804B (en) * 2021-02-23 2024-07-12 西北工业大学 Insulator defect detection method based on improved YOLOv convolutional neural network
CN112949633B (en) * 2021-03-05 2022-10-21 中国科学院光电技术研究所 Improved YOLOv 3-based infrared target detection method
CN113139615A (en) * 2021-05-08 2021-07-20 北京联合大学 Unmanned environment target detection method based on embedded equipment
CN113409250A (en) * 2021-05-26 2021-09-17 杭州电子科技大学 Solder joint detection method based on convolutional neural network
CN113255524B (en) * 2021-05-27 2022-08-16 山东省交通规划设计院集团有限公司 Pavement information identification method and system based on YOLO v4
CN113313128B (en) * 2021-06-02 2022-10-28 东南大学 SAR image target detection method based on improved YOLOv3 network
CN113378739A (en) * 2021-06-19 2021-09-10 湖南省气象台 Foundation cloud target detection method based on deep learning
CN113486764B (en) * 2021-06-30 2022-05-03 中南大学 Pothole detection method based on improved YOLOv3
CN113592784A (en) * 2021-07-08 2021-11-02 浙江科技学院 Method and device for detecting pavement diseases based on lightweight convolutional neural network
CN113538389B (en) * 2021-07-23 2023-05-09 仲恺农业工程学院 Pigeon egg quality identification method
CN113537106B (en) * 2021-07-23 2023-06-02 仲恺农业工程学院 Fish ingestion behavior identification method based on YOLOv5
CN113569968B (en) * 2021-07-30 2024-05-17 清华大学苏州汽车研究院(吴江) Model training method, target detection method, device, equipment and storage medium
CN113822148B (en) * 2021-08-05 2024-04-12 同济大学 Intelligent identification method for trace tiny carryover based on convolutional neural network
CN113743233B (en) * 2021-08-10 2023-08-01 暨南大学 Vehicle model identification method based on YOLOv5 and MobileNet V2
CN114120057A (en) * 2021-11-09 2022-03-01 华侨大学 Confusion matrix generation method based on Paddledetection
CN113903009B (en) * 2021-12-10 2022-07-05 华东交通大学 Railway foreign matter detection method and system based on improved YOLOv3 network
CN114998220B (en) * 2022-05-12 2023-06-13 湖南中医药大学 Tongue image detection and positioning method based on improved Tiny-YOLO v4 natural environment
CN115439765B (en) * 2022-09-17 2024-02-02 艾迪恩(山东)科技有限公司 Marine plastic garbage rotation detection method based on machine learning unmanned aerial vehicle visual angle
CN115311458B (en) * 2022-10-10 2023-02-14 南京信息工程大学 Real-time expressway pedestrian intrusion event detection method based on multi-task learning
CN116343175A (en) * 2023-05-24 2023-06-27 岚图汽车科技有限公司 Pedestrian guideboard detection method, device, equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325418A (en) * 2018-08-23 2019-02-12 华南理工大学 Based on pedestrian recognition method under the road traffic environment for improving YOLOv3
CN109815886B (en) * 2019-01-21 2020-12-18 南京邮电大学 Pedestrian and vehicle detection method and system based on improved YOLOv3
CN110378210B (en) * 2019-06-11 2023-04-18 江苏大学 Vehicle and license plate detection and long-and-short-focus fusion distance measurement method based on lightweight YOLOv3
CN110796168B (en) * 2019-09-26 2023-06-13 江苏大学 Vehicle detection method based on improved YOLOv3

Also Published As

Publication number Publication date
CN111401148A (en) 2020-07-10

Similar Documents

Publication Publication Date Title
CN111401148B (en) Road multi-target detection method based on improved multi-stage YOLOv3
CN109447034B (en) Traffic sign detection method in automatic driving based on YOLOv3 network
CN110175613B (en) Streetscape image semantic segmentation method based on multi-scale features and codec model
CN110348384B (en) Small target vehicle attribute identification method based on feature fusion
CN111401410B (en) Traffic sign detection method based on improved cascade neural network
CN111783844B (en) Deep learning-based target detection model training method, device and storage medium
CN111310773A (en) Efficient license plate positioning method of convolutional neural network
CN113688652A (en) Method and device for processing abnormal driving behaviors
CN111553201A (en) Traffic light detection method based on YOLOv3 optimization algorithm
CN112016605A (en) Target detection method based on corner alignment and boundary matching of bounding box
CN111178451A (en) License plate detection method based on YOLOv3 network
CN113076804B (en) Target detection method, device and system based on YOLOv4 improved algorithm
JP7373624B2 (en) Method and apparatus for fine-grained image classification based on scores of image blocks
CN113239753A (en) Improved traffic sign detection and identification method based on YOLOv4
CN112766170B (en) Self-adaptive segmentation detection method and device based on cluster unmanned aerial vehicle image
CN114612883A (en) Forward vehicle distance detection method based on cascade SSD and monocular depth estimation
CN116824543A (en) Automatic driving target detection method based on OD-YOLO
Cai et al. Vehicle Detection Based on Deep Dual‐Vehicle Deformable Part Models
CN117152625A (en) Remote sensing small target identification method, system, equipment and medium based on CoordConv and Yolov5
CN115690549A (en) Target detection method for realizing multi-dimensional feature fusion based on parallel interaction architecture model
CN117975218A (en) Small target detection method based on mixed attention and feature centralized multi-scale fusion
Liang et al. Car detection and classification using cascade model
CN117975418A (en) Traffic sign detection method based on improved RT-DETR
Alam et al. Faster RCNN based robust vehicle detection algorithm for identifying and classifying vehicles
Song et al. Sign-YOLO: a novel lightweight detection model for Chinese traffic sign

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant