CN111738056B - Heavy truck blind area target detection method based on improved YOLO v3 - Google Patents

Heavy truck blind area target detection method based on improved YOLO v3

Info

Publication number
CN111738056B
CN111738056B (application CN202010344037.1A)
Authority
CN
China
Prior art keywords
detection
yolo
sample data
feature
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010344037.1A
Other languages
Chinese (zh)
Other versions
CN111738056A (en
Inventor
朱仲杰
屠仁伟
白永强
王玉儿
杨跃平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Wanli University
Original Assignee
Zhejiang Wanli University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Wanli University filed Critical Zhejiang Wanli University
Priority to CN202010344037.1A priority Critical patent/CN111738056B/en
Publication of CN111738056A publication Critical patent/CN111738056A/en
Application granted granted Critical
Publication of CN111738056B publication Critical patent/CN111738056B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a heavy truck blind area target detection method based on improved YOLO v3, comprising the following steps: collecting mixed pictures of vehicles, persons in a falling state and persons in a normal state under real road conditions, and establishing a sample data set; after preprocessing, carrying out category calibration and position information extraction on the detection targets in the sample data set, and dividing it into a training set and a test set; performing cluster analysis on the training set and selecting anchor values; improving and optimizing the network structure; setting training parameters and training the optimized network with the training set to obtain a target detection model; and inputting the monitored picture into the target detection model to obtain real-time blind area detection results. The optimized and improved YOLO v3 network has the advantages that the detection performance for medium and small objects is enhanced and the missed detections and false detections of the prior art network are reduced, so that the heavy truck driver can grasp in time the vehicles, persons in a falling state and persons in a normal state in the blind area environment around the vehicle, and traffic accidents are avoided.

Description

Heavy truck blind area target detection method based on improved YOLO v3
Technical Field
The invention relates to a target detection method, in particular to a heavy truck blind area target detection method based on improved YOLO v3.
Background
Heavy trucks play an important role in the development of the logistics industry, but because of the length of the truck and the height of the cab, the driver has a large blind area, so that the driver's field of view is limited and accurate judgments cannot be made in time. One current approach to the heavy truck blind area problem is to install a camera, but the driver is then required to identify and judge blind area targets manually; another combines a camera with a traditional algorithm to automatically identify a single type of target, but this is only suitable for simple detection backgrounds with a small number of detection targets; still another combines 360° panoramic detection with radar, but it still requires manual inspection of obstructions and sometimes raises false alarms.
In recent years, target detection algorithms have made great breakthroughs. YOLO v3 uses a CNN network to perform detection, which greatly speeds up target detection and improves accuracy, and the existing YOLO v3 balances its detection performance across large, medium and small targets. However, when a large number of medium and small targets are actually detected, some missed detections and false detections still occur for medium and small targets, and when targets are framed, some detection boxes are positioned inaccurately and cannot frame the target completely.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a heavy truck blind area target detection method based on improved YOLO v3, which detects persons in a falling state and persons in a normal state in real time within the heavy truck blind area range, detects the targets accurately, and positions the detection boxes accurately.
The technical scheme adopted for solving the technical problems is as follows: a heavy truck blind area target detection method based on improved YOLO v3 comprises the following steps:
(1) collecting mixed pictures, mainly of medium and small sizes, of vehicles, persons in a falling state and persons in a normal state under real road conditions, establishing a sample data set, preprocessing the sample data set, performing category calibration and position information extraction on the detection targets in the sample data set, and dividing the sample data set into a training set and a test set;
(2) performing cluster analysis on the training set, and selecting an anchor value;
(3) improving the network structure of the original detection model to obtain an optimized YOLO v3 network;
(4) setting training parameters, and training the optimized YOLO v3 network by using the training set to obtain a target detection model;
(5) inputting the video monitored in real time within the blind area range of the heavy truck into the target detection model for detection;
(6) outputting detection results of vehicles, persons in a falling state and persons in a normal state within the blind area range of the heavy truck.
The specific method for performing category calibration and position information extraction on the detection targets in the sample data set in the step (1) is as follows:
a, selecting different light factors, different shooting angles, different road environments and different resolutions for the sample data set;
b, adjusting the image size of the training set in the sample data set to uniform pixels;
c, performing category calibration on the detection targets in the sample data set, using 0, 1 and 2 to represent a vehicle, a person in a falling state and a person in a normal state respectively;
d, extracting position information of the sample data set, and representing the detection target as a four-dimensional vector { x, y, w, h }; wherein: x represents the coordinate of the detection target in the x-axis direction, y represents the coordinate of the detection target in the y-axis direction, w represents the width of the detection target, and h represents the height of the detection target;
and e, generating a labeling file.
Sample data sets with different illumination, different vehicle shooting angles, different resolutions and different road environments and road conditions are selected, which satisfies the requirement of sample diversity, allows purposeful optimization, is of great significance for improving the detection robustness of the algorithm, and makes it possible to achieve a better detection effect with fewer training samples. The images of the training set are resized to identical dimensions to facilitate the convolution operations in the subsequent model training.
In the step (1), the sample data set is divided into an 80% training set and a 20% test set.
In the step (2), the training set is subjected to cluster analysis using the K-means algorithm, and different anchor values are obtained by setting different numbers of cluster centres k; IoU (Intersection over Union) is used as the clustering index, and by analysis of the Avg IoU (average IoU) the anchor values are set to {(12,26), (18,71), (31,43), (66,73), (35,151), (98,121), (61,260), (110,310), (238,212)}.
In the step (3), Darknet-53 is selected as the basic network for image feature extraction. YOLO v3 upsamples the deep information of a convolution layer and then splices it with shallower information through a concat function to realize feature fusion; 3 groups of feature information with different depths are fused to output 13×13, 26×26 and 52×52 feature maps, giving an FPN (feature pyramid network) structure. On this basis, splicing in shallow-layer information increases the amount of feature information: the 11th layer of Darknet-53 is spliced onto the 52×52 feature map to obtain an improved 52×52 feature map, whose feature information consists of three parts: feature information from the downsampled 11th layer of Darknet-53, feature information from the 36th layer of Darknet-53, and feature information upsampled from the 26×26 feature map; the 36th layer of Darknet-53 is spliced onto the 26×26 feature map to obtain an improved 26×26 feature map, whose feature information consists of three parts: feature information from the downsampled 36th layer of Darknet-53, feature information from the 61st layer of Darknet-53, and feature information upsampled from the 13×13 feature map. Splicing in shallow information increases the amount of feature information, enhances the fusion of global and local features, and improves the detection capability for medium and small targets.
The training parameters in the step (4) are set as follows: Batch is 512, Subdivision is 256, and Max batches is 12000.
Compared with the prior art, the invention has the following advantages: sample data sets with different illumination, different vehicle shooting angles, different resolutions and different road environments and road conditions are selected, which satisfies the requirement of sample diversity, allows purposeful optimization, is of great significance for improving the detection robustness of the algorithm, and makes it possible to achieve a better detection effect with fewer training samples. Whereas in the prior art the 13×13, 26×26 and 52×52 feature maps are used to detect large, medium and small targets respectively with balanced performance, the improved network structure of the invention applies feature enhancement to the 26×26 and 52×52 feature maps, improving the detection performance for medium and small targets, and accurate anchor values for the medium- and small-target data set are obtained through K-means clustering. Compared with the prior art, the invention compensates to a certain extent for the missed detections of the prior art, so that detection targets are detected more accurately and the detection boxes are also positioned accurately.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a line chart of Avg IoU corresponding to different k values in step (2) of the present invention;
FIG. 3 is a schematic diagram of the optimized YOLO v3 network structure in step (3) of the present invention;
FIG. 4 is a schematic diagram comparing the Loss and mAP of unimproved YOLO v3 with those of the YOLO v3 with the improved network structure, the left side being unimproved YOLO v3 and the right side being the improved YOLO v3;
FIG. 5 is a schematic diagram comparing the Loss and mAP of unimproved YOLO v3 with those of the YOLO v3 with the improved anchors, the left side being unimproved YOLO v3 and the right side being the improved YOLO v3;
FIG. 6 is a schematic diagram comparing the Loss and mAP of unimproved YOLO v3 with those of the overall improved YOLO v3, the left side being unimproved YOLO v3 and the right side being the overall improved YOLO v3;
FIG. 7a is a schematic picture of the result of detecting a first scene using the unmodified prior art;
FIG. 7b is a schematic image of the result of detecting a first scene using the method of the present invention;
FIG. 8a is a schematic picture of the result of detecting a second scene using the unmodified prior art;
FIG. 8b is a schematic image of the result of detecting a second scene using the method of the present invention;
FIG. 9a is a schematic picture of the result of detecting a third scene using the unmodified prior art;
FIG. 9b is a schematic image of the result of detecting a third scene using the method of the present invention;
FIG. 10a is a schematic picture of the result of detecting a fourth scene using the unmodified prior art;
FIG. 10b is a schematic image of the result of detecting a fourth scene using the method of the present invention;
FIG. 11a is a schematic picture of the result of detecting a fifth scene using the unmodified prior art;
FIG. 11b is a schematic image of the result of detecting a fifth scene using the method of the present invention;
FIG. 12a is a schematic picture of the result of detecting a sixth scene using the unmodified prior art;
FIG. 12b is a schematic image of the result of detecting a sixth scene using the method of the present invention;
FIG. 13a is a schematic picture of the result of detecting a seventh scene using the unmodified prior art;
FIG. 13b is a schematic image of the result of detecting a seventh scene using the method of the present invention;
FIG. 14a is a schematic picture of the result of detection of an eighth scene using the unmodified prior art;
fig. 14b is a schematic picture of the result of detection of an eighth scene using the method of the present invention.
Detailed Description
The invention is described in further detail below with reference to the embodiments of the drawings.
A heavy truck blind area target detection method based on improved YOLO v3 comprises the following steps:
(1) collecting mixed pictures, mainly of medium and small sizes, of vehicles, persons in a falling state and persons in a normal state under real road conditions, establishing a sample data set, preprocessing the sample data set, performing category calibration and position information extraction on the detection targets in the sample data set, and dividing the sample data set into a training set and a test set:
the specific method for carrying out category calibration and position information extraction on the detection targets in the sample data set in the step comprises the following steps:
a, selecting different light factors, different shooting angles, different road environments and different resolutions for a sample data set;
b, resizing the images of the training set in the sample data set to a uniform 416×416 pixels by means of a program;
c, performing category calibration on the detection targets in the sample data set using the YOLO-Mark software of YOLO, with 0, 1 and 2 representing a vehicle, a person in a falling state and a person in a normal state respectively;
d, extracting position information from the sample data set, and representing the detection target as a four-dimensional vector { x, y, w, h }; wherein: x represents the coordinate of the detection target in the x-axis direction, y represents the coordinate of the detection target in the y-axis direction, w represents the width of the detection target, and h represents the height of the detection target;
and e, generating a labeling file.
The sample data set is divided into an 80% training set and a 20% test set.
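By way of illustration, the following minimal Python sketch covers steps b to e above together with the 80%/20% split; the directory layout, file names and helper names are assumptions made for the example. It writes Darknet/YOLO-style label files (class index followed by the box centre, width and height normalized to the image size), which is the format produced by YOLO-Mark; the class indices 0, 1 and 2 follow the calibration described above.

```python
import random
from pathlib import Path
from PIL import Image

# Assumed class indices, following the calibration above.
CLASSES = {"vehicle": 0, "fallen_person": 1, "normal_person": 2}

def resize_and_label(img_path, targets, out_dir, size=416):
    """Resize one picture to 416x416 and write a YOLO-style label file.

    `targets` is a list of (class_name, x, y, w, h) boxes in pixels of the
    original picture, where (x, y) is the box centre; each label line is
    "class cx cy w h" with all values normalized to [0, 1].
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    img = Image.open(img_path)
    orig_w, orig_h = img.size
    img.resize((size, size)).save(out_dir / Path(img_path).name)
    lines = [f"{CLASSES[c]} {x / orig_w:.6f} {y / orig_h:.6f} "
             f"{w / orig_w:.6f} {h / orig_h:.6f}"
             for c, x, y, w, h in targets]
    (out_dir / (Path(img_path).stem + ".txt")).write_text("\n".join(lines))

def split_dataset(image_paths, train_ratio=0.8, seed=0):
    """Randomly split the sample data set into an 80% training set and a 20% test set."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    cut = int(len(paths) * train_ratio)
    return paths[:cut], paths[cut:]
```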
(2) Cluster analysis is carried out on the training set, and an anchor value is selected:
Cluster analysis is performed on the training set using the K-means algorithm, and different anchor values are obtained by setting different numbers of cluster centres k; IoU is used as the clustering index, and the larger the Avg IoU, the more accurate the anchors, so the anchor values are obtained through analysis of the Avg IoU. The IoU formula is:
IoU = area(DetectionResult ∩ GroundTruth) / area(DetectionResult ∪ GroundTruth)
where DetectionResult represents the predicted bounding box and GroundTruth represents the actual bounding box; the larger the IoU value, the better the performance of the detector. IoU is 1 if the predicted and actual boxes completely overlap.
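For illustration, the IoU of two boxes can be computed as in the following sketch, which assumes the (x, y, w, h) convention with (x, y) being the box centre (an assumption made for the example):

```python
def iou(box_a, box_b):
    """Intersection over Union of two (cx, cy, w, h) boxes."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0   # 1.0 when the boxes coincide
```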
The clustering results of Table 1 were obtained according to the K-means method:
Table 1: K-means clustering results
K    Anchors
1    (47,86)
2    (20,39), (92,166)
3    (17,34), (63,106), (134,258)
4    (16,32), (55,79), (76,215), (199,231)
5    (14,28), (34,67), (81,98), (68,265), (198,228)
6    (14,28), (33,66), (76,95), (66,266), (149,156), (221,287)
7    (13,27), (29,53), (67,74), (39,179), (98,123), (85,297), (227,221)
8    (12,27), (30,43), (22,95), (65,73), (48,213), (97,122), (94,307), (230,216)
9    (12,26), (18,71), (31,43), (66,73), (35,151), (98,121), (61,260), (110,310), (238,212)
As shown in fig. 2, as the value of k increases, Avg IoU increases, rapidly at first and then more slowly, and finally tends to converge.
In summary, Avg IoU reaches its maximum of 66.91% at k = 9, so the invention selects the anchors at k = 9 as its anchor values, i.e. {(12,26), (18,71), (31,43), (66,73), (35,151), (98,121), (61,260), (110,310), (238,212)}.
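A minimal NumPy sketch of K-means anchor clustering in the spirit of the procedure above: ground-truth boxes are represented by width and height only, 1 − IoU is used as the distance, cluster centres are updated with the median, and the final Avg IoU is reported. This is a generic sketch under those assumptions, not the exact implementation of the invention, and the resulting anchors depend on the data and on initialization.

```python
import numpy as np

def wh_iou(wh, anchors):
    """IoU between boxes and anchors compared by width/height only (shared corner).

    wh: (N, 2) array of ground-truth widths/heights; anchors: (k, 2) array.
    Returns an (N, k) matrix of IoU values.
    """
    inter = (np.minimum(wh[:, None, 0], anchors[None, :, 0]) *
             np.minimum(wh[:, None, 1], anchors[None, :, 1]))
    union = (wh[:, None, 0] * wh[:, None, 1] +
             anchors[None, :, 0] * anchors[None, :, 1] - inter)
    return inter / union

def kmeans_anchors(wh, k=9, iters=300, seed=0):
    """Cluster widths/heights into k anchors using 1 - IoU as the distance."""
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), size=k, replace=False)].astype(float)
    for _ in range(iters):
        assign = wh_iou(wh, anchors).argmax(axis=1)          # nearest anchor per box
        new = np.array([np.median(wh[assign == i], axis=0) if np.any(assign == i)
                        else anchors[i] for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    avg_iou = wh_iou(wh, anchors).max(axis=1).mean()         # the Avg IoU index
    return anchors[np.argsort(anchors.prod(axis=1))], avg_iou
```

Using the median rather than the mean for the centre update makes the anchors less sensitive to outlier boxes; either choice is compatible with the Avg IoU criterion used above.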
(3) Improving the network structure of the original detection model to obtain an optimized YOLO v3 network:
According to the invention, Darknet-53 is selected as the basic network for image feature extraction. YOLO v3 upsamples the deep information of a convolution layer and splices it with shallower information through a concat function to realize feature fusion; 3 groups of feature information with different depths are fused to output 13×13, 26×26 and 52×52 feature maps, giving the FPN structure. On this basis, splicing in shallow-layer information increases the amount of feature information: the 11th layer of Darknet-53 is spliced onto the 52×52 feature map to obtain an improved 52×52 feature map, whose feature information consists of three parts: feature information from the downsampled 11th layer of Darknet-53, feature information from the 36th layer of Darknet-53, and feature information upsampled from the 26×26 feature map; the 36th layer of Darknet-53 is spliced onto the 26×26 feature map to obtain an improved 26×26 feature map, whose feature information consists of three parts: feature information from the downsampled 36th layer of Darknet-53, feature information from the 61st layer of Darknet-53, and feature information upsampled from the 13×13 feature map. The improved network structure is shown in fig. 3.
The output of the 11th layer of Darknet-53 is 104×104×128; after downsampling with a 3×3 convolution kernel, a sliding stride of 2 and 256 convolution kernels, the output is 52×52×256. The output of the 36th layer of Darknet-53 is 52×52×256, and the upsampled output from the 26×26 feature map is 52×52×128, so splicing the three together outputs 52×52×640.
Similarly, the output of the 36th layer of Darknet-53 is 52×52×256; after downsampling with a 3×3 convolution kernel, a sliding stride of 2 and 512 convolution kernels, the output is 26×26×512. The output of the 61st layer of Darknet-53 is 26×26×512, and the upsampled output from the 13×13 feature map is 26×26×256, so splicing the three together outputs 26×26×1280.
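The channel arithmetic of the two improved feature maps can be checked with the following PyTorch sketch; it is an illustration only (the invention is trained in the Darknet framework), and the random tensors stand in for the Darknet-53 layer outputs named above.

```python
import torch
import torch.nn as nn

# Stand-ins for the intermediate outputs (batch size 1, NCHW layout).
layer11 = torch.randn(1, 128, 104, 104)   # Darknet-53 layer 11: 104x104x128
layer36 = torch.randn(1, 256, 52, 52)     # Darknet-53 layer 36: 52x52x256
layer61 = torch.randn(1, 512, 26, 26)     # Darknet-53 layer 61: 26x26x512
from_26 = torch.randn(1, 128, 26, 26)     # route from the 26x26 branch (128 channels)
from_13 = torch.randn(1, 256, 13, 13)     # route from the 13x13 branch (256 channels)

down_11 = nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1)  # 3x3, stride 2, 256 kernels
down_36 = nn.Conv2d(256, 512, kernel_size=3, stride=2, padding=1)  # 3x3, stride 2, 512 kernels
up2 = nn.Upsample(scale_factor=2, mode="nearest")

# Improved 52x52 feature map: 256 + 256 + 128 = 640 channels.
feat_52 = torch.cat([down_11(layer11), layer36, up2(from_26)], dim=1)
# Improved 26x26 feature map: 512 + 512 + 256 = 1280 channels.
feat_26 = torch.cat([down_36(layer36), layer61, up2(from_13)], dim=1)

print(feat_52.shape)   # torch.Size([1, 640, 52, 52])
print(feat_26.shape)   # torch.Size([1, 1280, 26, 26])
```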
(4) Setting training parameters, and training the optimized YOLO v3 network by using a training set to obtain a target detection model:
The experimental development environment selected for the invention is as follows. CPU: Intel i9 9920X, 3.5 GHz; GPU: NVIDIA GeForce RTX 2080 Ti, 11 GB; RAM: 16 GB; deep learning network framework: Darknet-53. The invention sets 3 kinds of detection targets, with 4000 iterations per class, i.e. 12000 iterations in total. A model is saved every 1000 iterations, so 12 models are generated by the end of training. The input resolution is set to 416×416 and multi-scale training is turned on. The learning rate determines the update speed of the weights: if it is set too large the result overshoots the optimum, and if too small the descent is too slow, so a dynamically changing learning rate is set to obtain a better target detection model. When 0 < iteration < 9600, lr (learning rate) = 0.001; when 9600 < iteration < 10800, lr = 0.0001; when 10800 < iteration < 12000, lr = 0.00001; the learning rate thus decays by a factor of 100 over the whole training process. The key training parameter settings of the unimproved and improved YOLO v3 networks are shown in table 2:
table 2 key training parameter settings
In summary, the training parameters of the present invention are set as follows: batch is 512, subdivision is 256, and Max batches is 12000.
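For illustration, the dynamically changing learning rate described above reduces to a simple piecewise-constant schedule (a sketch; in the Darknet framework this is normally configured through the steps/scales entries of the .cfg file rather than in code):

```python
def learning_rate(iteration: int) -> float:
    """Piecewise-constant schedule: 1e-3 up to 9600, 1e-4 up to 10800, then 1e-5."""
    if iteration < 9600:
        return 1e-3
    if iteration < 10800:
        return 1e-4
    return 1e-5   # held until Max batches = 12000; total decay factor of 100
```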
(5) Inputting the video monitored in real time within the blind area range of the heavy truck into the target detection model for detection: the real-time video picture can be obtained by monitoring the blind area range of the heavy truck with electronic equipment such as a camera, and is input into the trained target detection model.
(6) Outputting detection results of vehicles, persons in a falling state and persons in a normal state within the blind area range of the heavy truck: the above steps yield a picture in which each detected target is framed by a detection box, with the name of the detected target displayed at the upper left corner of the box.
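A minimal sketch of steps (5) and (6) using the dnn module of OpenCV to run a trained Darknet model on a camera stream; the configuration/weight file names, camera index and thresholds are assumptions, and the drawing logic is reduced to the essentials.

```python
import cv2

# Assumed names of the trained model files produced by Darknet training.
net = cv2.dnn.readNetFromDarknet("yolov3-improved.cfg", "yolov3-improved.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

class_names = ["vehicle", "fallen person", "normal person"]   # indices 0, 1, 2
cap = cv2.VideoCapture(0)                                      # blind-area camera

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    classes, scores, boxes = model.detect(frame, confThreshold=0.5, nmsThreshold=0.4)
    for cls, score, box in zip(classes, scores, boxes):
        x, y, w, h = box
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        # Name of the detected target at the upper left corner of the box.
        cv2.putText(frame, class_names[int(cls)], (x, y - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imshow("blind area detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```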
In order to further verify the advantages of the method of the invention, the performance of the target detection model is evaluated on the test set; the experimental results are as follows:
Precision is recorded as the proportion of true targets remaining among all targets framed by the detector after false detections are removed; Recall is the proportion of correctly detected targets among all true targets in the test set:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
where TP is the number of positive samples correctly detected as positive by the model, i.e. the number of samples correctly detected as a vehicle, a person in a falling state or a person in a normal state; FP is the number of negative samples wrongly detected as positive by the model, i.e. the number of samples wrongly detected as a vehicle, a person in a falling state or a person in a normal state; and FN is the number of positive samples wrongly detected as negative by the model, i.e. the number of samples calibrated as a vehicle, a person in a falling state or a person in a normal state that are not detected. The closer Precision and Recall are to 1, the better the target detection model.
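The two indices follow directly from the counts defined above, as in the trivial sketch below (in practice TP, FP and FN are obtained by matching detections to ground-truth boxes with an IoU threshold, which is omitted here):

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Example: 95 correct detections, 5 false detections, 5 missed targets.
print(precision_recall(95, 5, 5))   # (0.95, 0.95)
```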
Table 3 lists the key metrics, including Precision, Recall and mAP, from 7000 to 12000 iterations, comparing results before and after the network structure improvement at different iteration counts. The experimental data show that, with Precision and Recall essentially unchanged, the YOLO v3 with the improved network structure has a higher mAP. In the experimental result at 7000 iterations, the YOLO v3 with the improved network structure has a higher Recall value, 21% higher than that of unimproved YOLO v3, and a 7.5% higher mAP. The best experimental result is obtained at 12000 iterations: the YOLO v3 with the improved network structure and the unimproved YOLO v3 obtain the same Recall, but Precision is improved from 93% to 95% and mAP from 85.03% to 87.24%. As can be seen from the mAP curve of fig. 4, the mAP of YOLO v3 after the network structure improvement is significantly larger than that of unimproved YOLO v3. Thus, the feature enhancement of the network structure improves the feature extraction efficiency of the target detection model to a certain extent.
Table 3: Comparison before and after the network structure improvement at different iteration counts
Table 4: Comparison of the various detection targets before and after the network structure improvement
As can be seen from Table 4, the YOLO v3 with the improved network structure has a higher mAP, and the AP values of all three detection targets are larger than those of unimproved YOLO v3, indicating that the feature enhancement improves the performance of the target detection model as a whole rather than only the detection capability for a single detection target.
Table 5 shows the comparison at different iteration counts before and after the anchors were improved. From 10000 iterations onward, the mAP of the improved YOLO v3 is greater than that of unimproved YOLO v3 while Precision and Recall remain essentially the same as those of unimproved YOLO v3. The best experimental result is obtained at 12000 iterations, where mAP = 86.31% for YOLO v3 after the anchor improvement, 1.28% higher than the mAP of unimproved YOLO v3. The anchors of unimproved YOLO v3 were obtained by clustering on the public COCO data. As can be seen from the mAP curve of fig. 5, the mAP of YOLO v3 after the anchor improvement is significantly larger than the mAP before the improvement.
Table 5: Comparison before and after the anchor improvement at different iteration counts
Table 6: Comparison of the various detection targets before and after the anchor improvement
As shown in Table 6, the anchor improvement significantly improves the detection ability for the detection target of a person in a falling state: AP = 89.26% for unimproved YOLO v3 versus AP = 95.80% for YOLO v3 after the anchor improvement, while the AP values of the other two detection targets are essentially the same. Thus, the anchor improvement has an obvious boosting effect on the localization and detection of persons in a falling state in the invention.
Table 7 shows the comparison at different iteration counts before and after the overall improvement, i.e. the joint improvement of the network structure and the anchors, in which the neural network structure is optimized and new anchors are obtained by K-means clustering. From 7000 to 12000 iterations, Precision and Recall before and after the improvement are essentially the same, but the mAP of the overall improved YOLO v3 is clearly higher than that of unimproved YOLO v3. At 12000 iterations, the mAP of the overall improved YOLO v3 reaches 87.82%, 2.79% higher than the 85.03% mAP of the unmodified YOLO v3. As can be seen from the mAP curve of fig. 6, the mAP of the overall improved YOLO v3 is significantly larger than that of unimproved YOLO v3.
Table 7: Comparison before and after the overall improvement at different iteration counts
Table 8: Comparison of the various detection targets before and after the overall improvement
Table 9: Comparison of parameters before and after the overall improvement
Tables 8 and 9 show that the overall improved YOLO v3 has a significantly better detection ability for the detection target of a person in a falling state, and its mAP improvement is also significant, rising from 85.03% to 87.82%, whereas the size of the target detection model increases by only 7 MB and the total BFLOPS by only about 3.5. The overall improved YOLO v3 therefore still has real-time detection performance, with a detection speed of 13.792 ms/frame (about 72 frames per second). The overall improvement greatly improves the detection performance of the whole target detection model.
As shown in fig. 7a to 14b, figs. 7a, 8a, 9a, 10a, 11a, 12a, 13a and 14a are schematic pictures of the results of detection using the prior art, and figs. 7b, 8b, 9b, 10b, 11b, 12b, 13b and 14b are schematic pictures of the results of detection using the method of the present invention. Comparison of the experimental results for the eight scenes shows that the prior art still produces some missed detections and false detections in actual detection. As shown in figs. 7b, 11b and 13b, the persons in a normal state and persons in a falling state that are not detected in figs. 7a, 11a and 13a are detected; as shown in fig. 8b, the vehicle and the person in a normal state missed at the detection position in fig. 8a are detected; in fig. 9a the umbrella in the picture is falsely detected as a person, whereas in fig. 9b it is not; and as shown in figs. 10b and 12b, the detection targets are framed more accurately and completely than by the detection boxes of figs. 10a and 12a.
In conclusion, the method of the invention is clearly superior to the prior art and has better detection capability; in particular, the detection capability for medium and small targets is significantly enhanced, the missed detections of the prior art are compensated for to a certain extent, detection is more accurate, and the detection boxes are positioned more precisely.

Claims (5)

1. The heavy truck blind area target detection method based on the improved YOLO v3 is characterized by comprising the following steps of:
(1) collecting mixed pictures, mainly of medium and small sizes, of vehicles, persons in a falling state and persons in a normal state under real road conditions, establishing a sample data set, preprocessing the sample data set, performing category calibration and position information extraction on the detection targets in the sample data set, and dividing the sample data set into a training set and a test set;
(2) performing cluster analysis on the training set, and selecting an anchor value;
(3) and improving the network structure of the original detection model to obtain an optimized YOLO v3 network:
selecting Darknet-53 as the basic network for image feature extraction, wherein YOLO v3 upsamples the deep information of a convolution layer and splices it with shallower information through a concat function to realize feature fusion, and 3 groups of feature information with different depths are fused to output 13×13, 26×26 and 52×52 feature maps, giving an FPN structure; on this basis, splicing in shallow-layer information increases the amount of feature information: the 11th layer of Darknet-53 is spliced onto the 52×52 feature map to obtain an improved 52×52 feature map, whose feature information consists of three parts: feature information from the downsampled 11th layer of Darknet-53, feature information from the 36th layer of Darknet-53, and feature information upsampled from the 26×26 feature map; the 36th layer of Darknet-53 is spliced onto the 26×26 feature map to obtain an improved 26×26 feature map, whose feature information consists of three parts: feature information from the downsampled 36th layer of Darknet-53, feature information from the 61st layer of Darknet-53, and feature information upsampled from the 13×13 feature map;
(4) setting training parameters, and training the optimized YOLO v3 network by using the training set to obtain a target detection model;
(5) inputting the video monitored in real time within the blind area range of the heavy truck into the target detection model for detection;
(6) outputting detection results of vehicles, persons in a falling state and persons in a normal state within the blind area range of the heavy truck.
2. The heavy truck blind area target detection method based on improved YOLO v3 according to claim 1, wherein the specific method for performing category calibration and position information extraction on the detection target in the sample data set in step (1) is as follows:
a, selecting different light factors, different shooting angles, different road environments and different resolutions for the sample data set;
b, adjusting the image size of the training set in the sample data set to uniform pixels;
c, performing category calibration on the detection targets in the sample data set, using 0, 1 and 2 to represent a vehicle, a person in a falling state and a person in a normal state respectively;
d, extracting position information of the sample data set, and representing the detection target as a four-dimensional vector { x, y, w, h }; wherein: x represents the coordinates of the detection target in the x-axis direction, y represents the coordinates of the detection target in the y-axis direction, w represents the width of the detection target, and h represents the height of the detection target;
and e, generating a labeling file.
3. The heavy truck blind area target detection method based on improved YOLO v3 according to claim 1, wherein the sample data set is divided into 80% training set and 20% test set.
4. The heavy truck blind area target detection method based on improved YOLO v3 according to claim 1, wherein in the step (2), the K-means algorithm is used to perform cluster analysis on the training set, and different anchor values are obtained by setting different numbers of cluster centres k; IoU is used as the clustering index, and by analysis of the Avg IoU the anchor values are set to {(12,26), (18,71), (31,43), (66,73), (35,151), (98,121), (61,260), (110,310), (238,212)}.
5. The heavy truck blind area target detection method based on improved YOLO v3 according to claim 1, wherein the training parameters in the step (4) are set as follows: batch is 512, subdivision is 256, and Max batches is 12000.
CN202010344037.1A 2020-04-27 2020-04-27 Heavy truck blind area target detection method based on improved YOLO v3 Active CN111738056B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010344037.1A CN111738056B (en) 2020-04-27 2020-04-27 Heavy truck blind area target detection method based on improved YOLO v3

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010344037.1A CN111738056B (en) 2020-04-27 2020-04-27 Heavy truck blind area target detection method based on improved YOLO v3

Publications (2)

Publication Number Publication Date
CN111738056A CN111738056A (en) 2020-10-02
CN111738056B true CN111738056B (en) 2023-11-03

Family

ID=72646899

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010344037.1A Active CN111738056B (en) 2020-04-27 2020-04-27 Heavy truck blind area target detection method based on improved YOLO v3

Country Status (1)

Country Link
CN (1) CN111738056B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113469953B (en) * 2021-06-10 2022-06-14 南昌大学 Transmission line insulator defect detection method based on improved YOLOv4 algorithm
CN113591575A (en) * 2021-06-29 2021-11-02 北京航天自动控制研究所 Target detection method based on improved YOLO v3 network
CN114373121A (en) * 2021-09-08 2022-04-19 武汉众智数字技术有限公司 Method and system for improving small target detection of yolov5 network
CN113989763B (en) * 2021-12-30 2022-04-15 江西省云眼大视界科技有限公司 Video structured analysis method and analysis system
CN114782923B (en) * 2022-05-07 2024-05-03 厦门瑞为信息技术有限公司 Detection system for dead zone of vehicle
CN115775381B (en) * 2022-12-15 2023-10-20 华洋通信科技股份有限公司 Mine electric locomotive road condition identification method under uneven illumination

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105336207A (en) * 2015-12-04 2016-02-17 黄左宁 Vehicle recorder and public security comprehensive monitoring system
CN109584558A (en) * 2018-12-17 2019-04-05 长安大学 A kind of traffic flow statistics method towards Optimization Control for Urban Traffic Signals
CN109684803A (en) * 2018-12-19 2019-04-26 西安电子科技大学 Man-machine verification method based on gesture sliding
CN109829429A (en) * 2019-01-31 2019-05-31 福州大学 Security protection sensitive articles detection method under monitoring scene based on YOLOv3
CN110210452A (en) * 2019-06-14 2019-09-06 东北大学 It is a kind of based on improve tiny-yolov3 mine truck environment under object detection method
CN110232406A (en) * 2019-05-28 2019-09-13 厦门大学 A kind of liquid crystal display panel CF image identification method based on statistical learning
CN110356325A (en) * 2019-09-04 2019-10-22 魔视智能科技(上海)有限公司 A kind of urban transportation passenger stock blind area early warning system
AU2019101133A4 (en) * 2019-09-30 2019-10-31 Bo, Yaxin MISS Fast vehicle detection using augmented dataset based on RetinaNet
CN110766098A (en) * 2019-11-07 2020-02-07 中国石油大学(华东) Traffic scene small target detection method based on improved YOLOv3
CN110807496A (en) * 2019-11-12 2020-02-18 智慧视通(杭州)科技发展有限公司 Dense target detection method
CN110889324A (en) * 2019-10-12 2020-03-17 南京航空航天大学 Thermal infrared image target identification method based on YOLO V3 terminal-oriented guidance

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10740607B2 (en) * 2017-08-18 2020-08-11 Autel Robotics Co., Ltd. Method for determining target through intelligent following of unmanned aerial vehicle, unmanned aerial vehicle and remote control

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105336207A (en) * 2015-12-04 2016-02-17 黄左宁 Vehicle recorder and public security comprehensive monitoring system
CN109584558A (en) * 2018-12-17 2019-04-05 长安大学 A kind of traffic flow statistics method towards Optimization Control for Urban Traffic Signals
CN109684803A (en) * 2018-12-19 2019-04-26 西安电子科技大学 Man-machine verification method based on gesture sliding
CN109829429A (en) * 2019-01-31 2019-05-31 福州大学 Security protection sensitive articles detection method under monitoring scene based on YOLOv3
CN110232406A (en) * 2019-05-28 2019-09-13 厦门大学 A kind of liquid crystal display panel CF image identification method based on statistical learning
CN110210452A (en) * 2019-06-14 2019-09-06 东北大学 It is a kind of based on improve tiny-yolov3 mine truck environment under object detection method
CN110356325A (en) * 2019-09-04 2019-10-22 魔视智能科技(上海)有限公司 A kind of urban transportation passenger stock blind area early warning system
AU2019101133A4 (en) * 2019-09-30 2019-10-31 Bo, Yaxin MISS Fast vehicle detection using augmented dataset based on RetinaNet
CN110889324A (en) * 2019-10-12 2020-03-17 南京航空航天大学 Thermal infrared image target identification method based on YOLO V3 terminal-oriented guidance
CN110766098A (en) * 2019-11-07 2020-02-07 中国石油大学(华东) Traffic scene small target detection method based on improved YOLOv3
CN110807496A (en) * 2019-11-12 2020-02-18 智慧视通(杭州)科技发展有限公司 Dense target detection method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Detection of Infrared Small Targets Using Feature Fusion Convolutional Network; KAIDI WANG et al.; IEEE Access; Vol. 7; pp. 146081-146092 *
Aerial-to-ground small target detection based on deep learning; Liang Hua et al.; Chinese Journal of Liquid Crystals and Displays; Vol. 33, No. 9; pp. 793-800 *
Research on small target detection algorithms for aerial images based on deep neural networks; Zhang Mintong; China Master's Theses Full-text Database, Information Science and Technology; No. 2; pp. I138-1566 *
Research on vehicle detection and tracking algorithms for blind areas; Liu Haiyang; China Master's Theses Full-text Database, Engineering Science and Technology II; No. 7; pp. C035-144 *

Also Published As

Publication number Publication date
CN111738056A (en) 2020-10-02

Similar Documents

Publication Publication Date Title
CN111738056B (en) Heavy truck blind area target detection method based on improved YOLO v3
CN111444809B (en) Power transmission line abnormal target detection method based on improved YOLOv3
CN112199993B (en) Method for identifying transformer substation insulator infrared image detection model in any direction based on artificial intelligence
Akagic et al. Pothole detection: An efficient vision based method using rgb color space image segmentation
CN109345547B (en) Traffic lane line detection method and device based on deep learning multitask network
CN111611861B (en) Image change detection method based on multi-scale feature association
CN114973002A (en) Improved YOLOv 5-based ear detection method
CN112330593A (en) Building surface crack detection method based on deep learning network
CN112634257B (en) Fungus fluorescence detection method
CN115272204A (en) Bearing surface scratch detection method based on machine vision
CN115995056A (en) Automatic bridge disease identification method based on deep learning
CN114596316A (en) Road image detail capturing method based on semantic segmentation
CN113313107A (en) Intelligent detection and identification method for multiple types of diseases on cable surface of cable-stayed bridge
CN110826364B (en) Library position identification method and device
CN114494845A (en) Artificial intelligence hidden danger troubleshooting system and method for construction project site
CN105787955A (en) Sparse segmentation method and device of strip steel defect
CN113762247A (en) Road crack automatic detection method based on significant instance segmentation algorithm
CN112257514B (en) Infrared vision intelligent detection shooting method for equipment fault inspection
CN115797314A (en) Part surface defect detection method, system, equipment and storage medium
CN114677670A (en) Automatic identification and positioning method for identity card tampering
CN110533698B (en) Foundation pit construction pile detection control method based on visual detection
CN117392043A (en) Steel plate surface defect video detection method and system based on deep learning
CN113963161A (en) System and method for segmenting and identifying X-ray image based on ResNet model feature embedding UNet
CN114694090A (en) Campus abnormal behavior detection method based on improved PBAS algorithm and YOLOv5
CN114648736B (en) Robust engineering vehicle identification method and system based on target detection

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant