CN112270252A - Multi-vehicle target identification method for improving YOLOv2 model - Google Patents


Info

Publication number
CN112270252A
Authority
CN
China
Prior art keywords
target
model
value
training
layers
Prior art date
Legal status
Pending
Application number
CN202011158555.0A
Other languages
Chinese (zh)
Inventor
李珣
时斌斌
聂婷婷
张玥
李林鹏
贠鑫
Current Assignee
Xian Polytechnic University
Original Assignee
Xian Polytechnic University
Priority date
Filing date
Publication date
Application filed by Xian Polytechnic University filed Critical Xian Polytechnic University
Priority to CN202011158555.0A priority Critical patent/CN112270252A/en
Publication of CN112270252A publication Critical patent/CN112270252A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/54 Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08 Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-vehicle target identification method for improving a YOLOv2 model. Sample data are first collected in an actual traffic environment and divided into training-set and test-set sample images at a ratio of 7:3. Data enhancement is then performed on the training-set sample images, including random scaling and adjustment of exposure and saturation, and the processed images are used as the input of the training model. Target-region feature vectors of the processed training set are extracted through an improved Darknet-19 network, and the training set is input into the Darknet-19 network model for training to obtain a detection and identification model. Finally, the test set is input into the model for testing to obtain the multi-target vehicle identification result. The invention solves the problems of low detection rate, poor robustness and unsatisfactory classification effect of existing road-vehicle multi-target detection and vehicle-type classification methods.

Description

Multi-vehicle target identification method for improving YOLOv2 model
Technical Field
The invention belongs to the technical field of image detection and classification, and particularly relates to a multi-vehicle target identification method for improving a YOLOv2 model.
Background
Image detection and image classification are important components of image processing and are widely applied in many fields, such as remote-sensing image identification, military and criminal investigation, modern biomedicine, and intelligent transportation. However, conventional target detection and identification methods, such as the Cascade classifier based on Haar features, are mainly aimed at detecting specific targets and are limited when handling multi-class targets; their region-selection process is complex and their detection and identification efficiency is low. The feature extraction used when selecting objects suffers from strong subjectivity, poor robustness and weak generalization capability, so accurate identification is difficult to achieve in practical applications. Compared with traditional methods, deep learning methods have obvious advantages, and vehicle detection and identification technologies based on deep learning have become a current research trend.
Current deep-learning-based detection algorithms fall broadly into three directions. The first extracts candidate regions and then classifies the corresponding regions with a deep network, for example RCNN, SPP-net, Fast-RCNN and R-FCN; the second is the regression-based approach, such as YOLO and SSD; the third includes the RRC method combined with an RNN and the Deformable CNN method combined with DPM. In practical applications, vehicle detection methods based on CNN, R-CNN and Fast-RCNN models cannot achieve real-time detection in terms of both precision and speed. YOLOv2 is a real-time object detection algorithm that follows the design concept of end-to-end training and real-time detection: it goes directly from the input image to the detection output, takes the target positions and their confidence scores as output, omits the step of generating candidate boxes, and thus greatly shortens detection time. The detection speed of YOLO can reach 45 frames per second, but its detection and identification precision is slightly lower than that of Fast-RCNN. To improve the detection and identification precision, the invention improves the network model on the basis of YOLOv2, raising the detection and identification precision and the robustness of the algorithm while keeping the original speed.
Disclosure of Invention
The invention aims to provide a multi-vehicle target identification method for improving a YOLOv2 model, solving the problems of low detection rate, poor robustness and unsatisfactory classification effect of existing road-vehicle multi-target detection and vehicle-type classification methods.
The technical scheme adopted by the invention is that the multi-vehicle target identification method for improving the YOLOv2 model is implemented according to the following steps:
step 1, collecting sample data in an actual traffic environment, and dividing the sample data into sample images of a training set and a test set according to a ratio of 7:3;
step 2, performing data enhancement on the sample images of the training set, including random scaling of the sample images and adjustment of exposure and saturation, so that the processed images are used as input of a training model;
step 3, extracting the target region characteristic vector of the training set processed in the step 2 through an improved Darknet-19 network;
step 4, inputting the training set in the step 2 into a Darknet-19 network model for training to obtain a detection and identification model;
and 5, inputting the test set obtained in the step 2 into the model obtained in the step 4 for testing to obtain a multi-target vehicle identification result.
The present invention is also characterized in that,
the step 1 is as follows:
step 1.1, shooting vehicle information in a real-time road traffic environment, extracting frames from the shot video into image format, and deleting pictures with poor image quality;
step 1.2, labeling the vehicles in the selected pictures with the LabelImg labeling tool, framing out the target areas, classifying the vehicles in the target areas and making labels, the labels being car, bus, van and truck; each picture generates an xml file, and finally the xml files are randomly distributed using Matlab to generate a training set, a test set and a verification set, forming a complete data set;
step 1.3, the data set comprises 3 folders, namely Annotations, ImageSets and JPEGImages; the XML files are stored in the Annotations folder, each XML file corresponds to one image and stores the position and category information of every marked target, with the same file name as the corresponding original image; text files named train.txt and test.txt are stored in the Main folder under the ImageSets folder, and their content is the names of the images to be used for training or testing; the JPEGImages folder stores the original images, which are named according to a uniform rule.
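To make the 7:3 split of step 1 and the directory layout of step 1.3 concrete, the following minimal Python sketch (an assumed re-implementation; the patent performs the random split with Matlab) writes train.txt and test.txt under ImageSets/Main:

```python
import os
import random

def split_dataset(root, train_ratio=0.7, seed=0):
    """Write ImageSets/Main/train.txt and test.txt from a random split of the
    images in JPEGImages (VOC-style layout: Annotations, ImageSets, JPEGImages)."""
    names = [os.path.splitext(f)[0]
             for f in os.listdir(os.path.join(root, "JPEGImages"))
             if f.lower().endswith((".jpg", ".png"))]
    random.seed(seed)
    random.shuffle(names)
    n_train = int(len(names) * train_ratio)
    os.makedirs(os.path.join(root, "ImageSets", "Main"), exist_ok=True)
    for split, subset in (("train", names[:n_train]), ("test", names[n_train:])):
        with open(os.path.join(root, "ImageSets", "Main", split + ".txt"), "w") as f:
            f.write("\n".join(subset))
```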
The step 3 is as follows:
step 3.1, adjusting the number of convolution layers, pooling layers and BN layers and the activation function of the Darknet-19 network to obtain the improved YOLOv2-S network, which comprises 20 convolution layers, 5 pooling layers, 20 batch normalization (BN) layers and Leaky-Linear activation functions;
step 3.2, extracting the feature vectors, specifically as follows:
(1) layers 1, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23, 24 and 31 are convolution layers; layers 2, 4, 8, 12 and 18 are maximum pooling layers; layers 26 and 29 are route layers; and layer 32 is the detection layer;
(2) the convolution kernels of layers 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 and 23 are of size 3 × 3 with depths of 32, 64, 128, 256, 512, 1024 and 1024, respectively; the convolution kernels of layers 6, 10, 14, 16, 20, 22, 24 and 27 are of size 1 × 1 with depths of 64, 128, 256, 512, 256, 1024 and 5030, respectively;
(3) the kernels of maximum pooling layers 2, 4, 8, 12 and 18 are of size 2 × 2 with a stride of 2;
(4) the route layers perform feature concatenation, i.e., features from several layers are fused and output together to the next layer; route layer 29 concatenates the outputs of layers 28 and 25 and outputs the feature vector (a minimal sketch of the convolution-BN-activation building block and the route concatenation is given after this list).
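As an illustration of the building blocks listed above, the following minimal sketch (written in PyTorch only for readability; the patent itself uses the Darknet framework, and the layer sizes shown are just the first few from item (2)) shows the convolution-BN-Leaky block, the 2 × 2 max pooling, and the route-style concatenation:

```python
import torch
import torch.nn as nn

def conv_bn_leaky(in_ch, out_ch, kernel_size):
    """Convolution + batch normalization + Leaky activation, the basic YOLOv2-S block."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, stride=1,
                  padding=kernel_size // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1, inplace=True),
    )

# First few layers: 3x3 conv (depth 32) -> 2x2 max-pool -> 3x3 conv (depth 64) -> 2x2 max-pool
stem = nn.Sequential(
    conv_bn_leaky(3, 32, 3),
    nn.MaxPool2d(kernel_size=2, stride=2),
    conv_bn_leaky(32, 64, 3),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

def route(*feature_maps):
    """A route layer concatenates feature maps from earlier layers along the channel axis."""
    return torch.cat(feature_maps, dim=1)
```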
The step 4 is as follows:
step 4.1, inputting a training set, wherein the process is as follows:
step 4.1.1, dividing the picture obtained in step 2 into s × s cells; if the center of a target to be identified falls into a cell, that cell is responsible for detecting the target; each cell then directly predicts the positions of the required B bounding boxes, and each bounding box yields 5 predicted values: (t_x, t_y), (t_w, t_h) and a Confidence.

The offsets of the center of each bounding box from the boundary of the cell containing it are σ(t_x) and σ(t_y); (t_w, t_h) are the width and height of the target relative to the whole image; the offset of the cell from the upper-left corner of the image is (c_x, c_y); and the width and height of the prior box corresponding to the cell are (p_w, p_h), where x and y refer to the cell coordinates and w and h to the width and height of the bounding box. The real position of the bounding box is then:

b_x = σ(t_x) + c_x

b_y = σ(t_y) + c_y

b_w = p_w · e^(t_w)

b_h = p_h · e^(t_h)

The Confidence of a candidate box expresses both the probability that the bounding box contains a target to be detected and the accuracy of its predicted position, as the product of that probability and the IOU between the predicted and real positions; the Confidence of a candidate box is calculated as follows:

Confidence = Pr(object) × IOU^truth_pred

where truth denotes the real box and pred the predicted box in the IOU, and Pr(object) is the probability that a target exists in the grid cell: if a target exists in a cell, Pr(object) = 1; if no target appears, Pr(object) = 0 and the Confidence is likewise 0;
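To make the decoding and Confidence formulas above concrete, a minimal sketch follows (function and variable names are illustrative, not taken from the patent):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Decode one prediction (t_x, t_y, t_w, t_h) for the cell at offset (c_x, c_y)
    with prior size (p_w, p_h), following the formulas above."""
    bx = sigmoid(tx) + cx        # box center x, in grid-cell units
    by = sigmoid(ty) + cy        # box center y, in grid-cell units
    bw = pw * np.exp(tw)         # box width
    bh = ph * np.exp(th)         # box height
    return bx, by, bw, bh

def confidence(p_object, iou_pred_truth):
    """Confidence = Pr(object) x IOU(pred, truth)."""
    return p_object * iou_pred_truth
```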
step 4.1.2, clustering the real target boxes of the targets to be recognized marked in the training set, using the area intersection-over-union (IOU) value as the evaluation index to obtain the initial candidate boxes of the predicted targets in the training set, and inputting these as initial parameters into the YOLOv2-S network model (a clustering sketch is given below); the specific steps are as follows:
the K-means method is used with the distance formula d(box, centroid) = 1 − IOU(box, centroid) to cluster the real target boxes of the training data set; here IOU(box, centroid) is the area intersection ratio between the predicted target box and the real target box, and the candidate boxes predicted with a threshold of 0.5 are taken as the initial target boxes;
the area intersection ratio IOU(box, centroid) is given by:

IOU(box, centroid) = area(box_pred ∩ box_truth) / area(box_pred ∪ box_truth)

where box_pred denotes the area of the predicted target box and box_truth denotes the area of the real target box; the ratio of their intersection to their union is the average intersection ratio between the real target boxes and the initial candidate boxes of the predicted targets;
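The IOU-based K-means clustering of step 4.1.2 can be sketched as follows (a minimal illustration that clusters only the box widths and heights; the helper names are hypothetical):

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IOU between (N, 2) width/height boxes and (K, 2) centroids,
    assuming all boxes share a common center."""
    inter = np.minimum(boxes[:, None, 0], centroids[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    union = boxes[:, None, 0] * boxes[:, None, 1] \
          + centroids[None, :, 0] * centroids[None, :, 1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster ground-truth box sizes with the distance d = 1 - IOU(box, centroid)."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(1.0 - iou_wh(boxes, centroids), axis=1)
        new_centroids = np.array([
            boxes[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids  # used as the initial anchor (prior) boxes
```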
step 4.1.3, when an object exists in a grid cell, the object class also needs to be predicted, expressed by the conditional probability Pr(class | object); the value obtained from class prediction is multiplied by the Confidence of the candidate box to obtain the confidence C(M) of a certain class M, as shown in the following formula:

C(M) = Pr(class_M | object) × Pr(object) × IOU^truth_pred = Pr(class_M) × IOU^truth_pred
step 4.2, performing 70000 iterations of training on the training set obtained in step 1 with the Darknet-19 network; the network input of the model is set to 416 × 416, the weight decay to 0.0005, the momentum to 0.9 and the learning rate to 0.001; training stops when the loss value output on the training data set falls below a threshold Q or the preset maximum number of iterations N is reached, giving the trained YOLOv2-S network model;
the loss function loss(object) is expressed as:

loss(object) = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} [ (x_i^j − x̂_i^j)² + (y_i^j − ŷ_i^j)² + (w_i^j − ŵ_i^j)² + (h_i^j − ĥ_i^j)² ]
  + Σ_{i=0}^{S²} Σ_{j=0}^{B} [ (p_w − ŵ_i^j)² + (p_h − ĥ_i^j)² ]
  + Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} ( IOU^truth_pred − Ĉ_i^j )²
  + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{noobj} 1_{IOU<thresh} ( Ĉ_i^j )²
  + Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} Σ_c ( p_i^j(c) − p̂_i^j(c) )²
In the loss function, the first term computes the coordinate loss of the predicted target's anchor, the third term computes its confidence loss, and the fifth term computes its category loss; the second term adds a constraint that encourages the prediction to regress directly to its own anchor box, and the fourth term is computed only for anchor boxes whose IOU is below the threshold, where:
λ_coord is the error coefficient of the predicted coordinates;
λ_noobj is the error coefficient of the confidence when no object is identified; S² is the number of grid cells into which the input image is divided; B is the number of target boxes predicted per grid cell;
x̂_i^j and ŷ_i^j are the abscissa and ordinate of the center point of the predicted target, and ŵ_i^j and ĥ_i^j are the width and height of the predicted target box;
1_{ij}^{obj} indicates that the jth candidate box in the ith grid cell is responsible for detecting the object, and 1_{ij}^{noobj} indicates that it is not;
x_i^j and y_i^j are the actual abscissa and ordinate of the center point of the target box;
Ĉ_i^j is the prediction confidence that a target exists in the jth candidate box of the ith grid cell;
p̂_i^j(c) is the predicted probability that the target in the jth candidate box of the ith grid cell belongs to a certain category, and p_i^j(c) is the corresponding true probability;
step 4.3, the training process consists of forward propagation and backward propagation, and the model is saved every 1000 iterations; optimization uses stochastic gradient descent with a momentum of 0.9, an initial learning rate of 0.001 and a decay coefficient of 0.0005; the learning rate is 0.001 for the first 10000 iterations, 0.0001 for iterations 10000-45000, and 0.00001 thereafter (the schedule is sketched below); finally, the network model for detection and identification is obtained.
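The piecewise learning-rate schedule of step 4.3 can be written out directly; the sketch below is a minimal illustration (in Darknet this schedule would normally be expressed through the configuration file rather than code):

```python
def learning_rate(iteration):
    """Learning-rate schedule described in step 4.3."""
    if iteration < 10000:
        return 0.001       # initial learning rate
    elif iteration < 45000:
        return 0.0001      # iterations 10000 - 45000
    else:
        return 0.00001     # remaining iterations
```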
The step 5 is as follows:
step 5.1, loading the network weight trained in the step 4, and inputting the test set obtained in the step 2 into the network trained in the step 4 to obtain a multi-scale feature map;
step 5.2, applying non-maximum suppression to retain the prior box with the highest confidence score, obtaining the finally identified detection boxes and the classification results of the multi-target vehicles (a sketch of this suppression step is given after step 5.4);
step 5.3, testing the existing native YOLOv2, YOLOv2-voc and YOLOv3 network models with the prepared data set;
and step 5.4, evaluating the performance of the obtained model with the evaluation indices Recall, Precision and F1, which are computed as follows:
Precision = Correct / Proposal

Recall = Correct / Total

F1 = 2 × Precision × Recall / (Precision + Recall)
where Total is the actual number of bounding boxes, i.e., the number of targets that should be detected; Correct is the number of correctly detected bounding boxes: after a picture is fed into the network, the network outputs a number of candidate bounding boxes, each with a confidence probability; for each box whose probability exceeds the set threshold, the IOU with the actual bounding box (the content of the txt files in labels) is computed, the box with the largest IOU is found, and if this maximum exceeds the preset IOU threshold the count is increased by 1; Proposal is the number of detected bounding boxes above the set threshold; Precision is the precision value; Recall is the recall rate, i.e., the ratio of the number of detected targets to the number of all targets in the validation set; and F1 is the F1 score (F1-Score, also called the balanced F score), defined as the harmonic mean of precision and recall; it takes both recall and precision into account, ranges between 0 and 1, and the higher the F1, the better the effect.
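For the suppression referenced in step 5.2, a minimal sketch of score-sorted non-maximum suppression is given below (the box format and the 0.45 overlap threshold are illustrative assumptions, not values fixed by the patent):

```python
import numpy as np

def iou_xyxy(a, b):
    """IOU between one box a and an array of boxes b, in (x1, y1, x2, y2) format."""
    x1 = np.maximum(a[0], b[:, 0]); y1 = np.maximum(a[1], b[:, 1])
    x2 = np.minimum(a[2], b[:, 2]); y2 = np.minimum(a[3], b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_thresh=0.45):
    """Keep the highest-scoring box and drop overlapping boxes above iou_thresh."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        order = rest[iou_xyxy(boxes[best], boxes[rest]) <= iou_thresh]
    return keep
```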
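The evaluation indices of step 5.4 follow directly from the counts defined above; the sketch below assumes Total, Correct and Proposal have already been accumulated over the test set:

```python
def evaluate(total, correct, proposal):
    """Precision, Recall and F1 from the counts defined in step 5.4."""
    precision = correct / proposal
    recall = correct / total
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example with the YOLOv2-S row of Table 1: Total=154, Correct=146, Proposal=148
print(evaluate(154, 146, 148))  # approximately (0.986, 0.948, 0.967)
```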
The method has the advantage that the multi-vehicle target identification method based on the improved YOLOv2 model realizes end-to-end detection and identification of multi-target vehicles in actual traffic scenes, has higher accuracy and robustness than traditional methods, and can identify multiple vehicle instance targets in an image sample at one time. The proposed multi-target vehicle identification method is improved on the basis of the basic Darknet-19 network model, increasing both the operation speed and the identification accuracy for multi-target vehicles. The invention provides an effective method for multi-target vehicle identification, and a large number of experiments show that, compared with existing multi-target vehicle identification methods, it has strong robustness and better identification performance.
Drawings
FIG. 1 is a general flow chart of a multi-vehicle object recognition method of the present invention that improves the YOLOv2 model;
FIG. 2 is a graph of regression target calculation in a multiple vehicle target identification method of the present invention with an improved YOLOv2 model;
FIG. 3 is a model structure diagram of a multi-vehicle object recognition method of the present invention with improved YOLOv2 model;
FIG. 4 is an experimental comparison of the multiple vehicle target recognition method of the present invention with an improved YOLOv2 model, wherein diagram (a) is the YOLOv2 model, diagram (b) is the YOLOv2-voc model, diagram (c) is the YOLOv3 model, and diagram (d) is the YOLOv2-voc_mul model;
FIG. 5 is a partial experimental result display diagram of a multi-vehicle target identification method of the invention with an improved YOLOv2 model.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention relates to a multi-vehicle target recognition method for improving a YOLOv2 model which, as shown in the flow chart of FIG. 1, is implemented in detail by the following steps:
step 1, collecting sample data in an actual traffic environment, and dividing the sample data into sample images of a training set and a test set according to a ratio of 7:3;
the step 1 is as follows:
step 1.1, shooting vehicle information in a real-time road traffic environment, extracting frames from the shot video into image format, and deleting pictures with poor image quality;
step 1.2, labeling the vehicles in the selected pictures with the LabelImg labeling tool, framing out the target areas, classifying the vehicles in the target areas and making labels, the labels being car, bus, van and truck; each picture generates an xml file, and finally the xml files are randomly distributed using Matlab to generate a training set, a test set and a verification set, forming a complete data set;
step 1.3, the data set comprises 3 folders, namely Annotations, ImageSets and JPEGImages; the XML files are stored in the Annotations folder, each XML file corresponds to one image and stores the position and category information of every marked target, with the same file name as the corresponding original image; text files named train.txt and test.txt are stored in the Main folder under the ImageSets folder, and their content is the names of the images to be used for training or testing; the JPEGImages folder stores the original images, which are named according to a uniform rule.
Step 2, performing data enhancement on the sample images of the training set, including random scaling of the sample images and adjustment of exposure and saturation, so that the processed images are used as input of a training model;
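As an illustration of the data enhancement in step 2, the following minimal sketch (OpenCV-based; the scaling range and the exposure/saturation gains are assumptions, since the patent does not fix them) applies random scaling and exposure/saturation jitter:

```python
import cv2
import numpy as np

def augment(image, rng=None):
    """Randomly scale an image and jitter its exposure and saturation."""
    rng = rng or np.random.default_rng()

    # random scaling (assumed range 0.8x - 1.2x)
    scale = rng.uniform(0.8, 1.2)
    h, w = image.shape[:2]
    image = cv2.resize(image, (int(w * scale), int(h * scale)))

    # exposure (value) and saturation jitter in HSV space (assumed range 0.7x - 1.3x)
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 1] *= rng.uniform(0.7, 1.3)   # saturation
    hsv[..., 2] *= rng.uniform(0.7, 1.3)   # exposure / brightness
    hsv = np.clip(hsv, 0, 255).astype(np.uint8)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```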
step 3, extracting the target region characteristic vector of the training set processed in the step 2 through an improved Darknet-19 network;
the step 3 is as follows:
step 3.1, adjusting the number of convolution layers, pooling layers and BN layers and the activation function of the Darknet-19 network to obtain the improved YOLOv2-S network, which comprises 20 convolution layers, 5 pooling layers, 20 batch normalization (BN) layers and Leaky-Linear activation functions;
step 3.2, extracting the feature vectors, specifically as follows:
(1) layers 1, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23, 24 and 31 are convolution layers; layers 2, 4, 8, 12 and 18 are maximum pooling layers; layers 26 and 29 are route layers; and layer 32 is the detection layer;
(2) the convolution kernels of layers 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 and 23 are of size 3 × 3 with depths of 32, 64, 128, 256, 512, 1024 and 1024, respectively; the convolution kernels of layers 6, 10, 14, 16, 20, 22, 24 and 27 are of size 1 × 1 with depths of 64, 128, 256, 512, 256, 1024 and 5030, respectively;
(3) the kernels of maximum pooling layers 2, 4, 8, 12 and 18 are of size 2 × 2 with a stride of 2;
(4) the route layers perform feature concatenation, i.e., features from several layers are fused and output together to the next layer; route layer 29 concatenates the outputs of layers 28 and 25 and outputs the feature vector.
As shown in fig. 2 to fig. 3, step 4, inputting the training set in step 2 into a Darknet-19 network model for training to obtain a model for detection and recognition;
the step 4 is as follows:
step 4.1, inputting a training set, wherein the process is as follows:
step 4.1.1, dividing the picture obtained in step 2 into s × s cells; if the center of a target to be identified falls into a cell, that cell is responsible for detecting the target; each cell then directly predicts the positions of the required B bounding boxes, and each bounding box yields 5 predicted values: (t_x, t_y), (t_w, t_h) and a Confidence.

The offsets of the center of each bounding box from the boundary of the cell containing it are σ(t_x) and σ(t_y); (t_w, t_h) are the width and height of the target relative to the whole image; the offset of the cell from the upper-left corner of the image is (c_x, c_y); and the width and height of the prior box corresponding to the cell are (p_w, p_h), where x and y refer to the cell coordinates and w and h to the width and height of the bounding box. The real position of the bounding box is then:

b_x = σ(t_x) + c_x

b_y = σ(t_y) + c_y

b_w = p_w · e^(t_w)

b_h = p_h · e^(t_h)

The Confidence of a candidate box expresses both the probability that the bounding box contains a target to be detected and the accuracy of its predicted position, as the product of that probability and the IOU between the predicted and real positions; the Confidence of a candidate box is calculated as follows:

Confidence = Pr(object) × IOU^truth_pred

where truth denotes the real box and pred the predicted box in the IOU, and Pr(object) is the probability that a target exists in the grid cell: if a target exists in a cell, Pr(object) = 1; if no target appears, Pr(object) = 0 and the Confidence is likewise 0;
step 4.1.2, clustering the real target boxes of the targets to be recognized marked in the training set, using the area intersection-over-union (IOU) value as the evaluation index to obtain the initial candidate boxes of the predicted targets in the training set, and inputting these as initial parameters into the YOLOv2-S network model; the specific steps are as follows:
the K-means method is used with the distance formula d(box, centroid) = 1 − IOU(box, centroid) to cluster the real target boxes of the training data set; here IOU(box, centroid) is the area intersection ratio between the predicted target box and the real target box, and the candidate boxes predicted with a threshold of 0.5 are taken as the initial target boxes;
the area intersection ratio IOU(box, centroid) is given by:

IOU(box, centroid) = area(box_pred ∩ box_truth) / area(box_pred ∪ box_truth)

where box_pred denotes the area of the predicted target box and box_truth denotes the area of the real target box; the ratio of their intersection to their union is the average intersection ratio between the real target boxes and the initial candidate boxes of the predicted targets;
step 4.1.3, when an object exists in a grid cell, the object class also needs to be predicted, expressed by the conditional probability Pr(class | object); the value obtained from class prediction is multiplied by the Confidence of the candidate box to obtain the confidence C(M) of a certain class M, as shown in the following formula:

C(M) = Pr(class_M | object) × Pr(object) × IOU^truth_pred = Pr(class_M) × IOU^truth_pred
step 4.2, performing 70000 iterations of training on the training set obtained in step 1 with the Darknet-19 network; the network input of the model is set to 416 × 416, gradient descent is adopted, the weight decay is set to 0.0005, the momentum to 0.9 and the learning rate to 0.001; training stops when the loss value output on the training data set falls below a threshold Q or the preset maximum number of iterations N is reached, giving the trained YOLOv2-S network model;
the loss function loss(object) is expressed as:

loss(object) = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} [ (x_i^j − x̂_i^j)² + (y_i^j − ŷ_i^j)² + (w_i^j − ŵ_i^j)² + (h_i^j − ĥ_i^j)² ]
  + Σ_{i=0}^{S²} Σ_{j=0}^{B} [ (p_w − ŵ_i^j)² + (p_h − ĥ_i^j)² ]
  + Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} ( IOU^truth_pred − Ĉ_i^j )²
  + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{noobj} 1_{IOU<thresh} ( Ĉ_i^j )²
  + Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} Σ_c ( p_i^j(c) − p̂_i^j(c) )²
In the loss function, the first term computes the coordinate loss of the predicted target's anchor, the third term computes its confidence loss, and the fifth term computes its category loss; the second term adds a constraint that encourages the prediction to regress directly to its own anchor box, and the fourth term is computed only for anchor boxes whose IOU is below the threshold, where:
λ_coord is the error coefficient of the predicted coordinates;
λ_noobj is the error coefficient of the confidence when no object is identified; S² is the number of grid cells into which the input image is divided; B is the number of target boxes predicted per grid cell;
x̂_i^j and ŷ_i^j are the abscissa and ordinate of the center point of the predicted target, and ŵ_i^j and ĥ_i^j are the width and height of the predicted target box;
1_{ij}^{obj} indicates that the jth candidate box in the ith grid cell is responsible for detecting the object, and 1_{ij}^{noobj} indicates that it is not;
x_i^j and y_i^j are the actual abscissa and ordinate of the center point of the target box;
Ĉ_i^j is the prediction confidence that a target exists in the jth candidate box of the ith grid cell;
p̂_i^j(c) is the predicted probability that the target in the jth candidate box of the ith grid cell belongs to a certain category, and p_i^j(c) is the corresponding true probability;
step 4.3, the training process consists of forward propagation and backward propagation, and the model is saved every 1000 iterations; optimization uses stochastic gradient descent with a momentum of 0.9, an initial learning rate of 0.001 and a decay coefficient of 0.0005; the learning rate is 0.001 for the first 10000 iterations, 0.0001 for iterations 10000-45000, and 0.00001 thereafter; finally, the network model for detection and identification is obtained.
And 5, inputting the test set in the step 2 into the model obtained in the step 4 for testing to obtain a multi-target vehicle identification result.
The step 5 is as follows:
step 5.1, loading the network weight trained in the step 4, and inputting the test set obtained in the step 2 into the network trained in the step 4 to obtain a multi-scale feature map;
step 5.2, applying non-maximum suppression to retain the prior box with the highest confidence score, obtaining the finally identified detection boxes and the classification results of the multi-target vehicles;
step 5.3, testing the existing native YOLOv2, YOLOv2-voc and YOLOv3 network models with the prepared data set;
and step 5.4, evaluating the performance of the obtained model with the evaluation indices Recall, Precision and F1, which are computed as follows:
Precision = Correct / Proposal

Recall = Correct / Total

F1 = 2 × Precision × Recall / (Precision + Recall)
where Total is the actual number of bounding boxes, i.e., the number of targets that should be detected; Correct is the number of correctly detected bounding boxes: after a picture is fed into the network, the network outputs a number of candidate bounding boxes, each with a confidence probability; for each box whose probability exceeds the set threshold, the IOU with the actual bounding box (the content of the txt files in labels) is computed, the box with the largest IOU is found, and if this maximum exceeds the preset IOU threshold the count is increased by 1; Proposal is the number of detected bounding boxes above the set threshold; Precision is the precision value; Recall is the recall rate, i.e., the ratio of the number of detected targets to the number of all targets in the validation set; and F1 is the F1 score (F1-Score, also called the balanced F score), defined as the harmonic mean of precision and recall; it takes both recall and precision into account, ranges between 0 and 1, and the higher the F1, the better the effect.
FIG. 4 shows the verification results of the 4 models on the training and verification sets, where (a) is the YOLOv2 model, (b) the YOLOv2-voc model, (c) the YOLOv3 model and (d) the YOLOv2-voc_mul model. It can be seen that the recall rates of the 4 models fluctuate greatly at first, but as the number of detected targets increases, the recall rate of the YOLOv2 model gradually stabilizes at 96%, that of YOLOv2-voc tends to 94.5%, that of the improved YOLOv2-voc_mul model stabilizes at 95.5%, and that of the YOLOv3 model fluctuates between 40% and 60%; this shows that the 3 YOLOv2-based models maintain good accuracy against a simple background, while the accuracy of YOLOv3 is low. The accuracy curve of the YOLOv2 model fluctuates strongly; the accuracy curve of the YOLOv2-voc model jumps as the number of targets increases and then gradually stabilizes at 98.6%; the improved YOLOv2-voc_mul model stabilizes at about 99.2% after a relatively small jump, keeping good accuracy and stability; the accuracy curve of the YOLOv3 model jumps strongly and its final accuracy fluctuates around 60%. Comparing the intersection-over-union curves of the 4 models, the IOU of the YOLOv2 model fluctuates between 0.75 and 0.83, i.e. the stability of its detections is low; the IOU of the YOLOv2-voc and YOLOv2-voc_mul models is improved over YOLOv2 and stays between 0.8 and 0.83; the curves show that, as the number of targets increases, the IOU of YOLOv2-voc_mul is similar to that of YOLOv2-voc and fluctuates around 0.83, while the IOU of YOLOv3 fluctuates between only 0.4 and 0.7 and is the least stable of the 4 models.
FIG. 5 shows part of the detection results of the YOLOv2-voc_mul model. The test results show that the different vehicle categories are detected and accurately classified as car, bus, van and truck.
Table 1 shows an evaluation index table of the multi-vehicle target identification method of the present invention, which improves the YOLOv2 model.
TABLE 1 evaluation index Table
Model Total Correct Proposal Precision(%) Recall(%) F1(%)
YOLOv2 154 147 152 96.71 95.45 96.07
YOLOv2-voc 154 143 147 97.28 92.86 95.01
YOLOv3 154 84 151 55.63 54.55 55.08
YOLOv2-S 154 146 148 98.62 94.81 96.67

Claims (5)

1. A multi-vehicle target identification method for improving a YOLOv2 model is characterized by comprising the following steps:
step 1, collecting sample data in an actual traffic environment, and dividing the sample data into sample images of a training set and a test set according to a ratio of 7:3;
step 2, performing data enhancement on the sample images of the training set, including random scaling of the sample images and adjustment of exposure and saturation, so that the processed images are used as input of a training model;
step 3, extracting the target region characteristic vector of the training set processed in the step 2 through an improved Darknet-19 network;
step 4, inputting the training set in the step 2 into a Darknet-19 network model for training to obtain a detection and identification model;
and 5, inputting the test set in the step 2 into the model obtained in the step 4 for testing to obtain a multi-target vehicle identification result.
2. The method for identifying multiple vehicle targets based on an improved YOLOv2 model according to claim 1, wherein the step 1 is as follows:
step 1.1, shooting vehicle information in a real-time road traffic environment, extracting frames from the shot video into image format, and deleting pictures with poor image quality;
step 1.2, labeling the vehicles in the selected pictures with the LabelImg labeling tool, framing out the target areas, classifying the vehicles in the target areas and making labels, the labels being car, bus, van and truck; each picture generates an xml file, and finally the xml files are randomly distributed using Matlab to generate a training set, a test set and a verification set, forming a complete data set;
step 1.3, the data set comprises 3 folders, namely Annotations, ImageSets and JPEGImages; the XML files are stored in the Annotations folder, each XML file corresponds to one image and stores the position and category information of every marked target, with the same file name as the corresponding original image; text files named train.txt and test.txt are stored in the Main folder under the ImageSets folder, and their content is the names of the images to be used for training or testing; the JPEGImages folder stores the original images, which are named according to a uniform rule.
3. The method for identifying multiple vehicle targets based on an improved YOLOv2 model according to claim 2, wherein the step 3 is as follows:
step 3.1, adjusting the number of convolution layers, pooling layers and BN layers and the activation function of the Darknet-19 network to obtain the improved YOLOv2-S network, which comprises 20 convolution layers, 5 pooling layers, 20 batch normalization (BN) layers and Leaky-Linear activation functions;
step 3.2, extracting the feature vectors, specifically as follows:
(1) layers 1, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23, 24 and 31 are convolution layers; layers 2, 4, 8, 12 and 18 are maximum pooling layers; layers 26 and 29 are route layers; and layer 32 is the detection layer;
(2) the convolution kernels of layers 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 and 23 are of size 3 × 3 with depths of 32, 64, 128, 256, 512, 1024 and 1024, respectively; the convolution kernels of layers 6, 10, 14, 16, 20, 22, 24 and 27 are of size 1 × 1 with depths of 64, 128, 256, 512, 256, 1024 and 5030, respectively;
(3) the kernels of maximum pooling layers 2, 4, 8, 12 and 18 are of size 2 × 2 with a stride of 2;
(4) the route layers perform feature concatenation, i.e., features from several layers are fused and output together to the next layer; route layer 29 concatenates the outputs of layers 28 and 25 and outputs the feature vector.
4. The method for identifying multiple vehicle targets based on the improved YOLOv2 model of claim 3, wherein the step 4 is as follows:
step 4.1, inputting a training set, wherein the process is as follows:
step 4.1.1, dividing the picture obtained in step 2 into s × s cells; if the center of a target to be identified falls into a cell, that cell is responsible for detecting the target; each cell then directly predicts the positions of the required B bounding boxes, and each bounding box yields 5 predicted values: (t_x, t_y), (t_w, t_h) and a Confidence;

the offsets of the center of each bounding box from the boundary of the cell containing it are σ(t_x) and σ(t_y); (t_w, t_h) are the width and height of the target relative to the whole image; the offset of the cell from the upper-left corner of the image is (c_x, c_y); and the width and height of the prior box corresponding to the cell are (p_w, p_h), where x and y refer to the cell coordinates and w and h to the width and height of the bounding box. The real position of the bounding box is then:

b_x = σ(t_x) + c_x

b_y = σ(t_y) + c_y

b_w = p_w · e^(t_w)

b_h = p_h · e^(t_h)

The Confidence of a candidate box expresses both the probability that the bounding box contains a target to be detected and the accuracy of its predicted position, as the product of that probability and the IOU between the predicted and real positions; the Confidence of a candidate box is calculated as follows:

Confidence = Pr(object) × IOU^truth_pred

where truth denotes the real box and pred the predicted box in the IOU, and Pr(object) is the probability that a target exists in the grid cell: if a target exists in a cell, Pr(object) = 1; if no target appears, Pr(object) = 0 and the Confidence is likewise 0;
step 4.1.2, clustering the real target boxes of the targets to be recognized marked in the training set, using the area intersection-over-union (IOU) value as the evaluation index to obtain the initial candidate boxes of the predicted targets in the training set, and inputting these as initial parameters into the YOLOv2-S network model; the specific steps are as follows:
the K-means method is used with the distance formula d(box, centroid) = 1 − IOU(box, centroid) to cluster the real target boxes of the training data set; here IOU(box, centroid) is the area intersection ratio between the predicted target box and the real target box, and the candidate boxes predicted with a threshold of 0.5 are taken as the initial target boxes;
the area intersection ratio IOU(box, centroid) is given by:

IOU(box, centroid) = area(box_pred ∩ box_truth) / area(box_pred ∪ box_truth)

where box_pred denotes the area of the predicted target box and box_truth denotes the area of the real target box; the ratio of their intersection to their union is the average intersection ratio between the real target boxes and the initial candidate boxes of the predicted targets;
step 4.1.3, when an object exists in a grid cell, the object class also needs to be predicted, expressed by the conditional probability Pr(class | object); the value obtained from class prediction is multiplied by the Confidence of the candidate box to obtain the confidence C(M) of a certain class M, as shown in the following formula:

C(M) = Pr(class_M | object) × Pr(object) × IOU^truth_pred = Pr(class_M) × IOU^truth_pred
step 4.2, performing 70000 iterations of training on the training set obtained in step 1 with the Darknet-19 network; the network input of the model is set to 416 × 416, gradient descent is adopted, the weight decay is set to 0.0005, the momentum to 0.9 and the learning rate to 0.001; training stops when the loss value output on the training data set falls below a threshold Q or the preset maximum number of iterations N is reached, giving the trained YOLOv2-S network model;
the loss function loss(object) is expressed as:

loss(object) = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} [ (x_i^j − x̂_i^j)² + (y_i^j − ŷ_i^j)² + (w_i^j − ŵ_i^j)² + (h_i^j − ĥ_i^j)² ]
  + Σ_{i=0}^{S²} Σ_{j=0}^{B} [ (p_w − ŵ_i^j)² + (p_h − ĥ_i^j)² ]
  + Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} ( IOU^truth_pred − Ĉ_i^j )²
  + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{noobj} 1_{IOU<thresh} ( Ĉ_i^j )²
  + Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} Σ_c ( p_i^j(c) − p̂_i^j(c) )²
wherein the first term of the loss function computes the coordinate loss of the predicted target's anchor, the third term computes its confidence loss, and the fifth term computes its category loss; the second term adds a constraint that encourages the prediction to regress directly to its own anchor box, and the fourth term is computed only for anchor boxes whose IOU is below the threshold, where:
λ_coord is the error coefficient of the predicted coordinates;
λ_noobj is the error coefficient of the confidence when no object is identified; S² is the number of grid cells into which the input image is divided; B is the number of target boxes predicted per grid cell;
x̂_i^j and ŷ_i^j are the abscissa and ordinate of the center point of the predicted target, and ŵ_i^j and ĥ_i^j are the width and height of the predicted target box;
1_{ij}^{obj} indicates that the jth candidate box in the ith grid cell is responsible for detecting the object, and 1_{ij}^{noobj} indicates that it is not;
x_i^j and y_i^j are the actual abscissa and ordinate of the center point of the target box;
Ĉ_i^j is the prediction confidence that a target exists in the jth candidate box of the ith grid cell;
p̂_i^j(c) is the predicted probability that the target in the jth candidate box of the ith grid cell belongs to a certain category, and p_i^j(c) is the corresponding true probability;
step 4.3, the training process consists of forward propagation and backward propagation, and the model is saved every 1000 iterations; optimization uses stochastic gradient descent with a momentum of 0.9, an initial learning rate of 0.001 and a decay coefficient of 0.0005; the learning rate is 0.001 for the first 10000 iterations, 0.0001 for iterations 10000-45000, and 0.00001 thereafter; finally, the network model for detection and identification is obtained.
5. The method for identifying multiple vehicle targets based on the improved YOLOv2 model of claim 4, wherein the step 5 is as follows:
step 5.1, loading the network weight trained in the step 4, and inputting the test set obtained in the step 2 into the network trained in the step 4 to obtain a multi-scale feature map;
step 5.2, applying non-maximum suppression to retain the prior box with the highest confidence score, obtaining the finally identified detection boxes and the classification results of the multi-target vehicles;
step 5.3, testing the existing native YOLOv2, YOLOv2-voc and YOLOv3 network models with the prepared data set;
and step 5.4, evaluating the performance of the obtained model with the evaluation indices Recall, Precision and F1, which are computed as follows:
Precision = Correct / Proposal

Recall = Correct / Total

F1 = 2 × Precision × Recall / (Precision + Recall)
where Total is the actual number of bounding boxes, i.e., the number of targets that should be detected; Correct is the number of correctly detected bounding boxes: after a picture is fed into the network, the network outputs a number of candidate bounding boxes, each with a confidence probability; for each box whose probability exceeds the set threshold, the IOU with the actual bounding box (the content of the txt files in labels) is computed, the box with the largest IOU is found, and if this maximum exceeds the preset IOU threshold the count is increased by 1; Proposal is the number of detected bounding boxes above the set threshold; Precision is the precision value; Recall is the recall rate, i.e., the ratio of the number of detected targets to the number of all targets in the validation set; and F1 is the F1 score (F1-Score, also called the balanced F score), defined as the harmonic mean of precision and recall; it takes both recall and precision into account, ranges between 0 and 1, and the higher the F1, the better the effect.
CN202011158555.0A 2020-10-26 2020-10-26 Multi-vehicle target identification method for improving YOLOv2 model Pending CN112270252A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011158555.0A CN112270252A (en) 2020-10-26 2020-10-26 Multi-vehicle target identification method for improving YOLOv2 model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011158555.0A CN112270252A (en) 2020-10-26 2020-10-26 Multi-vehicle target identification method for improving YOLOv2 model

Publications (1)

Publication Number Publication Date
CN112270252A true CN112270252A (en) 2021-01-26

Family

ID=74342539

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011158555.0A Pending CN112270252A (en) 2020-10-26 2020-10-26 Multi-vehicle target identification method for improving YOLOv2 model

Country Status (1)

Country Link
CN (1) CN112270252A (en)



Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019127838A1 (en) * 2017-12-29 2019-07-04 国民技术股份有限公司 Method and apparatus for realizing convolutional neural network, terminal, and storage medium
CN108520114A (en) * 2018-03-21 2018-09-11 华中科技大学 A kind of textile cloth defect detection model and its training method and application
CN109816024A (en) * 2019-01-29 2019-05-28 电子科技大学 A kind of real-time automobile logo detection method based on multi-scale feature fusion and DCNN
CN109886147A (en) * 2019-01-29 2019-06-14 电子科技大学 A kind of more attribute detection methods of vehicle based on the study of single network multiple-task
CN109829428A (en) * 2019-01-31 2019-05-31 兰州交通大学 Based on the video image pedestrian detection method and system for improving YOLOv2
CN110443208A (en) * 2019-08-08 2019-11-12 南京工业大学 A kind of vehicle target detection method, system and equipment based on YOLOv2
CN110929577A (en) * 2019-10-23 2020-03-27 桂林电子科技大学 Improved target identification method based on YOLOv3 lightweight framework
CN110751232A (en) * 2019-11-04 2020-02-04 哈尔滨理工大学 Chinese complex scene text detection and identification method
CN111428558A (en) * 2020-02-18 2020-07-17 东华大学 Vehicle detection method based on improved YOLOv3 method
CN111476756A (en) * 2020-03-09 2020-07-31 重庆大学 Method for identifying casting DR image loose defects based on improved YOLOv3 network model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JUN SANG et al.: "An Improved YOLOv2 for Vehicle Detection", Sensors, vol. 18, pages 1-15 *
MIGE_: "(5) Object Detection YOLOv2", pages 1-5, Retrieved from the Internet <URL: https://blog.csdn.net/MIge_/article/details/108680652> *
XUN LI et al.: "Multi-object Recognition Method Based on Improved YOLOv2 Model", Information Technology and Control, vol. 50, no. 1, pages 13-27 *
LI Xun et al.: "Multi-Object Recognition Method Based on Improved YOLOv2 Model" (in Chinese), Laser & Optoelectronics Progress, vol. 57, no. 10, pages 1-10 *

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113139945A (en) * 2021-02-26 2021-07-20 山东大学 Intelligent image detection method, equipment and medium for air conditioner outdoor unit based on Attention + YOLOv3
CN113780270A (en) * 2021-03-23 2021-12-10 京东鲲鹏(江苏)科技有限公司 Target detection method and device
CN112949750A (en) * 2021-03-25 2021-06-11 清华大学深圳国际研究生院 Image classification method and computer readable storage medium
CN112926681A (en) * 2021-03-29 2021-06-08 复旦大学 Target detection method and device based on deep convolutional neural network
CN112926681B (en) * 2021-03-29 2022-11-29 复旦大学 Target detection method and device based on deep convolutional neural network
CN113076858A (en) * 2021-03-30 2021-07-06 深圳技术大学 Vehicle information detection method based on deep learning, storage medium and terminal device
CN112990065A (en) * 2021-03-31 2021-06-18 上海海事大学 Optimized YOLOv5 model-based vehicle classification detection method
CN112990065B (en) * 2021-03-31 2024-03-22 上海海事大学 Vehicle classification detection method based on optimized YOLOv5 model
CN113283307A (en) * 2021-04-30 2021-08-20 北京雷石天地电子技术有限公司 Method and system for identifying object in video and computer storage medium
CN113134683A (en) * 2021-05-13 2021-07-20 兰州理工大学 Laser marking method and device based on machine learning
CN113343785A (en) * 2021-05-19 2021-09-03 山东大学 YOLO ground mark detection method and equipment based on perspective downsampling and storage medium
CN113298167A (en) * 2021-06-01 2021-08-24 北京思特奇信息技术股份有限公司 Character detection method and system based on lightweight neural network model
CN113538390B (en) * 2021-07-23 2023-05-09 仲恺农业工程学院 Quick identification method for shaddock diseases and insect pests
CN113537106B (en) * 2021-07-23 2023-06-02 仲恺农业工程学院 Fish ingestion behavior identification method based on YOLOv5
CN113537106A (en) * 2021-07-23 2021-10-22 仲恺农业工程学院 Fish feeding behavior identification method based on YOLOv5
CN113538389A (en) * 2021-07-23 2021-10-22 仲恺农业工程学院 Pigeon egg quality identification method
CN113538389B (en) * 2021-07-23 2023-05-09 仲恺农业工程学院 Pigeon egg quality identification method
CN113538390A (en) * 2021-07-23 2021-10-22 仲恺农业工程学院 Quick identification method for shaddock diseases and insect pests
CN113808200B (en) * 2021-08-03 2023-04-07 嘉洋智慧安全科技(北京)股份有限公司 Method and device for detecting moving speed of target object and electronic equipment
CN113808200A (en) * 2021-08-03 2021-12-17 嘉洋智慧安全生产科技发展(北京)有限公司 Method and device for detecting moving speed of target object and electronic equipment
CN113743233B (en) * 2021-08-10 2023-08-01 暨南大学 Vehicle model identification method based on YOLOv5 and MobileNet V2
CN113743233A (en) * 2021-08-10 2021-12-03 暨南大学 Vehicle model identification method based on YOLOv5 and MobileNet V2
CN113808080B (en) * 2021-08-12 2023-10-24 常州大学 Method for detecting number of interference fringes of glass panel of camera hole of mobile phone
CN113808080A (en) * 2021-08-12 2021-12-17 常州大学 Method for detecting number of interference fringes of glass panel of mobile phone camera hole
CN113850799A (en) * 2021-10-14 2021-12-28 长春工业大学 YOLOv 5-based trace DNA extraction workstation workpiece detection method
CN113850799B (en) * 2021-10-14 2024-06-07 长春工业大学 YOLOv 5-based trace DNA extraction workstation workpiece detection method
CN113963299A (en) * 2021-10-26 2022-01-21 大连民族大学 Table tennis ball detection method based on improved YOLO V4 algorithm
CN114387520A (en) * 2022-01-14 2022-04-22 华南农业大学 Precision detection method and system for intensive plums picked by robot
CN114387520B (en) * 2022-01-14 2024-05-14 华南农业大学 Method and system for accurate detection of dense plums for robot picking
CN114648513A (en) * 2022-03-29 2022-06-21 华南理工大学 Motorcycle detection method based on self-labeling data augmentation
CN114972807A (en) * 2022-05-17 2022-08-30 北京百度网讯科技有限公司 Method and device for determining image recognition accuracy, electronic equipment and medium
CN117612021A (en) * 2023-10-19 2024-02-27 广州大学 Remote sensing extraction method and system for agricultural plastic greenhouse

Similar Documents

Publication Publication Date Title
CN112270252A (en) Multi-vehicle target identification method for improving YOLOv2 model
CN111062413B (en) Road target detection method and device, electronic equipment and storage medium
CN110059554B (en) Multi-branch target detection method based on traffic scene
CN109902677B (en) Vehicle detection method based on deep learning
CN110188705B (en) Remote traffic sign detection and identification method suitable for vehicle-mounted system
CN109087510B (en) Traffic monitoring method and device
CN110348384B (en) Small target vehicle attribute identification method based on feature fusion
CN107784288B (en) Iterative positioning type face detection method based on deep neural network
CN111461083A (en) Rapid vehicle detection method based on deep learning
CN112016605B (en) Target detection method based on corner alignment and boundary matching of bounding box
CN111275044A (en) Weak supervision target detection method based on sample selection and self-adaptive hard case mining
CN108960074B (en) Small-size pedestrian target detection method based on deep learning
CN110288017B (en) High-precision cascade target detection method and device based on dynamic structure optimization
CN112084890B (en) Method for identifying traffic signal sign in multiple scales based on GMM and CQFL
CN109858327B (en) Character segmentation method based on deep learning
CN109087337B (en) Long-time target tracking method and system based on hierarchical convolution characteristics
CN111079540A (en) Target characteristic-based layered reconfigurable vehicle-mounted video target detection method
CN115170611A (en) Complex intersection vehicle driving track analysis method, system and application
CN114332921A (en) Pedestrian detection method based on improved clustering algorithm for Faster R-CNN network
CN114529581A (en) Multi-target tracking method based on deep learning and multi-task joint training
CN115620518A (en) Intersection traffic conflict discrimination method based on deep learning
CN116964588A (en) Target detection method, target detection model training method and device
CN114219936A (en) Object detection method, electronic device, storage medium, and computer program product
CN112819100A (en) Multi-scale target detection method and device for unmanned aerial vehicle platform
CN116311004A (en) Video moving target detection method based on sparse optical flow extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination