CN117593623A - Lightweight vehicle detection method based on improved YOLOv8n model - Google Patents

Lightweight vehicle detection method based on improved YOLOv8n model

Info

Publication number
CN117593623A
Authority
CN
China
Prior art keywords
model
yolov8n
improved
module
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311471722.0A
Other languages
Chinese (zh)
Inventor
魏巍 (Wei Wei)
刘雨修 (Liu Yuxiu)
云健 (Yun Jian)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Minzu University
Original Assignee
Dalian Minzu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Minzu University filed Critical Dalian Minzu University
Priority to CN202311471722.0A priority Critical patent/CN117593623A/en
Publication of CN117593623A publication Critical patent/CN117593623A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/52 Scale-space analysis, e.g. wavelet analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/763 Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/766 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08 Detecting or categorising vehicles
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a lightweight vehicle detection method based on an improved YOLOv8n model, which comprises the following steps: constructing a vehicle data set; taking YOLOv8n as the original model, introducing a GSConv module and a VoVGSCSP module into the original YOLOv8n model, adding an EMA attention mechanism, adding a multi-scale convolution MSC module, and setting a high-speed detection head DFast module to form an improved YOLOv8n model, which is taken as the lightweight vehicle detection network model; preprocessing the images derived from the vehicle data set; inputting the preprocessed vehicle data set into the improved YOLOv8n model and performing model training; and detecting actual vehicle images using the trained improved YOLOv8n model. By improving the YOLOv8n model, introducing the lightweight GSConv and VoVGSCSP modules, designing a brand-new high-speed detection head, adding an attention mechanism and designing a multi-scale convolution block, the invention improves the generalization capability and robustness of the model, improves the precision, greatly reduces the parameter quantity and FLOPs, and improves the speed.

Description

Lightweight vehicle detection method based on improved YOLOv8n model
Technical Field
The invention belongs to the technical field of target detection, relates to a lightweight detection method, and particularly relates to a lightweight vehicle detection method based on an improved YOLOv8n model.
Background
Vehicle detection is an important research topic in fields such as traffic monitoring, driver assistance systems and automatic driving, and is also one of the research hotspots in the field of computer vision. Autonomous driving in China has achieved leapfrog development and entered a new stage of high-quality development, so the importance of vehicle detection is self-evident. However, for vehicle-mounted edge computing platforms, a huge model can hardly meet real-time detection requirements and faces challenges such as a high computing load and a low detection rate. Lightweight vehicle detection is therefore an important link in the intelligent production process; many large enterprises regard it as a key technology for improving product quality, and scholars are likewise researching lightweight vehicle detection methods. Accordingly, rapid, accurate and lightweight detection of vehicles under the above conditions is of great research significance.
Currently, commonly used vehicle detection methods can be divided into two types: conventional methods and methods based on deep learning. Based on these two types, researchers have recently achieved a series of research results in vehicle detection. Before the large-scale application of deep learning, vehicle detection mainly relied on traditional methods that constructed detection models from hand-crafted features. With the rapid development of deep learning, which has a strong capability of learning image features together with strong generalization and high robustness, vehicle detection based on computer vision combined with deep learning is becoming the dominant approach in this field. This approach does not require manual feature extraction and can be divided into two categories. The first is the two-stage target detection methods and their representative networks, such as R-CNN, Fast R-CNN and Mask R-CNN, which use a selective search algorithm or a region proposal network (RPN) to extract region proposals and thereby detect targets. Although their detection accuracy is improved compared with conventional target detection methods, these methods are complex and time-consuming and are not suitable for real-time applications. The other category is single-stage target detection methods and their representative networks, such as the SSD series, the YOLO series and RetinaNet. These methods are more efficient in terms of detection speed than two-stage methods. Before the advent of the YOLOv4 and YOLOv5 algorithms, the YOLOv3 algorithm was widely used in vehicle detection and related tasks. PVIDNet, a priority vehicle image detection network based on YOLOv3, adds a modified version of DenseNet to reduce the number of model parameters and uses a Soft Root Sign (SRS) activation function to reduce the execution time of the model. On the basis of YOLOv3, an improved k-means clustering algorithm has also been proposed, which mitigates model instability caused by singular points by adding a parallel branch to the model backbone network and enhances the weak features of small-scale target detection.
While most of the above methods improve detection accuracy in some cases, they all suffer from high computational cost, large parameter counts and slow computation speed. Some lightweight models can effectively reduce model parameters but fail to balance accuracy and speed.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides a lightweight vehicle detection method based on an improved YOLOv8n model, which greatly reduces the parameter quantity and FLOPs and improves the speed while improving the precision, meeting the requirements of industrial environments and edge computing platforms for lightweight, rapid and real-time vehicle detection.
The technical solution adopted by the invention to solve the above technical problem is as follows:
a lightweight vehicle detection method based on an improved YOLOv8n model comprises the following steps:
constructing a vehicle data set;
building a lightweight vehicle detection network model: taking YOLOv8n as the original model, introducing a GSConv module and a VoVGSCSP module into the original YOLOv8n model, adding an EMA attention mechanism, adding a multi-scale convolution MSC module, and setting a high-speed detection head DFast module to form an improved YOLOv8n model, the improved YOLOv8n model being taken as the lightweight vehicle detection network model;
preprocessing an image derived from a vehicle dataset;
inputting the preprocessed vehicle data set into an improved YOLOv8n model, and performing model training;
the actual vehicle image is detected using the trained modified YOLOv8n model.
Based on this scheme, YOLOv8n is adopted as the original model, and the GSConv and VoVGSCSP modules are introduced to achieve model compression while maintaining detection precision and speed; the EMA attention mechanism and the multi-scale convolution MSC module are added to select key information in the vehicle detection task and suppress non-key information, improving detection accuracy; the high-speed detection head DFast module is set to solve the problem of slow regression of target detection boxes during training; the improved YOLOv8n model obtained after fusion achieves higher detection performance; and by preprocessing the images, the robustness and generalization capability of the model can be improved, so that the model better handles vehicle images in various actual scenes.
Further, the improved YOLOv8n model is obtained through the following steps:
configuring an environment, and inputting the constructed vehicle data set into an original YOLOv8n model;
configuring hyperparameters: setting the optimal combination of learning rate, batch sample number, iteration number, image channel number, image crop size and learning-rate momentum;
the following modifications are made to the original YOLOv8n model: a. embedding the multi-scale convolution MSC module into the C2f module in the original YOLOv8n model backbone network and introducing multiple convolutions with different kernel sizes; b. adding the EMA attention mechanism where the backbone network of the original YOLOv8n model outputs the feature detection layer; c. using the GSConv module and the VoVGSCSP module to replace the downsampling module and the C2f module of the neck of the original YOLOv8n model, respectively; d. setting the high-speed detection head DFast module, which performs target detection by adding a point-wise convolution PWConv after the partial convolution PConv.
Based on this scheme, embedding the multi-scale convolution MSC module into the backbone network of the original YOLOv8n model and introducing multiple convolutions with different kernel sizes helps the model better capture features at different scales and improves the detection of vehicles of different sizes; adding the EMA attention mechanism to the backbone network strengthens attention to key features, allowing the model to focus on the key parts of the vehicle and improving detection accuracy; replacing the downsampling module and the C2f module of the neck with the GSConv module and the VoVGSCSP module improves the efficiency and performance of feature extraction, so that vehicles and background are better distinguished; and setting the high-speed detection head DFast module with the point-wise convolution PWConv accelerates the target detection process and reduces computational complexity, giving better performance in real-time applications. Together, these improvements raise the performance of the improved YOLOv8n model in vehicle detection tasks, including higher accuracy, better robustness and better real-time performance, by increasing the perceptual capability of the model, optimizing feature extraction, enhancing the attention mechanism and improving detection speed.
Further, the multi-scale convolution MSC module divides the input channels into 3 heads and applies a different depth-separable convolution to each head: no operation is performed on the first head, the kernel sizes of the other heads are initialized with 3×3 and 5×5, and information is finally exchanged through a 1×1 convolution.
Based on this scheme, dividing the input channels into 3 heads and applying a different depth-separable convolution to each head lets each head attend to information at a different scale, enlarging the receptive field of the model so that it better understands object features at different scales and improving overall detection performance, particularly for vehicles of different sizes. Leaving the first head untouched and initializing the other kernel sizes with 3×3 and 5×5 introduces convolution kernels of different sizes, helping the model learn features at different scales; this multi-kernel initialization improves the flexibility of the model so that it adapts better to targets of different scales. The depth-separable convolutions help the model learn better feature representations while reducing the parameter count and maintaining effective feature extraction, so this design enables the model to capture key features in vehicle images more efficiently and improves detection accuracy.
Furthermore, the EMA attention mechanism retains the information on each channel while reducing computational overhead: part of the channels are reshaped into the batch dimension, and the channel dimension is divided into multiple sub-features, so that spatial semantic features are well distributed within each feature group.
Based on this scheme, each feature group focuses on a different type of information, which helps the model better understand and exploit the rich information in the input data; reshaping part of the channels into the batch dimension reduces computational overhead, since the model does not need to process the information of all channels at once, lowering computational complexity and improving efficiency.
Further, the GSConv module is a convolution module that mixes a standard convolution SC, a depth-separable convolution DWConv and a shuffle, using the shuffle to permeate the information generated by the standard convolution SC into every part of the information generated by DWConv.
Based on this scheme, by combining the standard convolution SC and the depth-separable convolution DWConv, the GSConv module allows information to communicate and fuse better between feature maps at different levels, helping to capture features of different scales and levels and improving the perception and understanding capability of the model. The depth-separable convolution DWConv typically has fewer parameters and lower computational cost but may limit the nonlinear modeling ability of the network; by introducing the standard convolution SC, the GSConv module increases the nonlinearity of the network and thus fits complex data distributions better. The GSConv module exploits the parameter efficiency of the depth-separable convolution while improving performance through the standard convolution; this design balances parameter quantity against performance, making the model better suited to deployment in resource-constrained environments. The shuffle operation permeates the information generated by the standard convolution SC uniformly into every part of the information generated by DWConv, improving the uniformity of the feature representation and reducing information fragmentation, thereby improving model performance.
Further, the high-speed detection head DFast module uses only one path at the input, extracts features with PConv plus PWConv, then separates into two branches, adjusts the channel number of each branch with a 1×1 convolution, and then calculates the bounding-box loss and the class loss respectively.
Based on this scheme, the DFast module adopts the lighter PConv and PWConv and has no complex multi-path structure during feature processing, which reduces the computational load and improves computational efficiency; PConv and PWConv help extract the useful information in the features, facilitating detection and classification; the separation and channel-number adjustment ensure that features are reasonably exchanged and coordinated between the branches, improving the quality of the feature expression; and by calculating the bounding-box loss and the class loss separately, the DFast module handles the different loss types more efficiently, letting each branch focus on its own task, which speeds up training and inference. The design of the DFast module suits different target detection tasks, including the calculation of bounding-box loss and class loss; this generality makes it a flexible component usable in various target detection architectures. By dividing the tasks into different branches, the DFast module reduces interference between tasks, helping to lower the risk of overfitting and thus improving generalization performance.
Further, the model training comprises the following specific steps: automatically generating prior boxes using the K-means clustering method; obtaining bounding-box sizes through box regression prediction; classifying the bounding boxes with a classifier to obtain the class probability corresponding to each bounding box; sorting the classification probabilities of the bounding boxes by non-maximum suppression to obtain the bounding-box prediction with the maximum confidence, with the confidence threshold set to 0.25 and the IoU threshold set to 0.7; calculating the loss value between the predicted value and the true value with the loss function; and performing back propagation according to the loss value until the preset number of iterations is reached, completing the network model training.
Based on this scheme, automatically generating prior boxes with the K-means clustering method adapts the model better to the sizes and shapes of different targets, improving generalization so that the model can detect diverse targets; bounding-box regression lets the model predict target box sizes more accurately, improving detection accuracy, especially for targets of different sizes; the classifier determines the class probability of each bounding box, allowing the model both to detect the presence of a target and to classify its type; non-maximum suppression sorts the classification probabilities and removes highly overlapping boxes, reducing duplicate detections and improving the quality of detection results; the confidence and IoU thresholds control the model output so that only sufficiently confident bounding boxes are retained, improving robustness and accuracy; and by computing the difference between predicted and true values with the loss function, the model can be back-propagated to update its parameters and reduce the loss.
further, the vehicle data set is built by extracting and integrating vehicle images with different models, angles and colors and images with different types of vehicles in the MS COCO data set.
Further, the image preprocessing specifically comprises: scaling the image down or up in equal proportion to 640 pixels, with the remainder filled with gray background.
The beneficial effects of the invention include:
the method is characterized in that the YOLOv8n model is subjected to light weight improvement aiming at vehicle detection, and comprises the steps of introducing GSconv and VoVGSCSP modules to realize model compression and keep detection precision and speed. An EMA attention mechanism is added, a new multi-scale convolution block (MSC) module is added, a brand new high-speed detection head (DFast) is designed, an improved YOLOv8n model is obtained after fusion, the problem of poor effect in the process of feature extraction and small target detection is solved, the detection precision of the model is improved, and the generalization capability and the robustness of the model are improved. The method greatly reduces the quantity of parameters and FLPs, improves the speed, meets the requirements of an industrial environment and an edge computing platform on light weight and rapid real-time detection of the vehicle detection, and lays a technical foundation for finally building a vehicle detection system.
Drawings
FIG. 1 is a flow chart of a lightweight vehicle detection method based on an improved YOLOv8n model of the present invention;
fig. 2 is a schematic diagram of an MSC module according to the present invention;
FIG. 3 is a schematic diagram of an EMA module for use with the present invention;
FIG. 4 is a schematic diagram of a GSConv module for use with the present invention;
FIG. 5 is a schematic diagram of a VoVGSCSP module used in the present invention;
FIG. 6 is a schematic diagram of a DFast module according to the present invention;
FIG. 7 is a schematic diagram of a PConv module for use with the present invention;
FIG. 8 is a schematic diagram of a comparison of the PConv+PWConv module with a conventional convolution;
FIG. 9 is a schematic diagram of an improved algorithm configuration of the present invention;
FIG. 10 is a graph comparing the results of the improved model of the present invention with the original model test.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without creative effort fall within the protection scope of the invention.
In addition, the technical features of the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Example 1
This embodiment provides a lightweight vehicle detection method based on the YOLOv8n model, which combines artificial intelligence technology with vehicle detection technology to realize lightweight vehicle detection. It aims to strengthen the adaptive feature fusion of the neural network during feature extraction and improve the detection effect; on the premise of improving detection precision, it reduces the parameter quantity and FLOPs while also improving detection speed, meeting the requirements of industrial real-time operation and edge devices.
The specific operation of the lightweight vehicle detection method based on the YOLOv8n model of the embodiment is as follows:
1. building a vehicle data set:
In this embodiment, the original vehicle data set is constructed from images of the MS COCO data set. The obtained vehicle image information is: image resolution 640×640, channel number 3. The images include buses, trucks and cars of different models, colors and angles.
Dividing the data set: to train the model and validate its vehicle detection performance, the constructed data set is divided into a training set and a validation set in a ratio of 9:1.
Reading data: the data set is read using the torch.utils.data module of the PyTorch library in Python.
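For illustration, a minimal sketch of this reading-and-splitting step is given below; the VehicleDataset wrapper and the samples list are hypothetical placeholders, since the text only specifies the torch.utils.data interface and the 9:1 split.

```python
import torch
from torch.utils.data import Dataset, DataLoader, random_split

class VehicleDataset(Dataset):
    """Hypothetical wrapper around the vehicle images extracted from MS COCO."""
    def __init__(self, samples):
        self.samples = samples          # list of (image_tensor, target) pairs
    def __len__(self):
        return len(self.samples)
    def __getitem__(self, idx):
        return self.samples[idx]

full_set = VehicleDataset(samples)      # 'samples' assembled beforehand
n_train = int(0.9 * len(full_set))      # 9:1 train/validation split
train_set, val_set = random_split(full_set, [n_train, len(full_set) - n_train])
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = DataLoader(val_set, batch_size=64)
```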
2. Building a lightweight vehicle detection network model based on an improved YOLOv8n model:
(1) Configuring the environment: the YOLOv8n model is obtained from its GitHub repository and a virtual YOLOv8 environment is created using Anaconda, with dependencies installed including matplotlib>=3.2, opencv-python>=4.6.0, Pillow>=7.2, PyYAML>=5.1, requests>=2.23.0, scipy>=1.4, torch>=1.7.0, torchvision>=0.8.1, tqdm>=4.64.0, tensorboard>=2.13.0, dvclive>=2.11.0, clearml, comet, pandas>=1.4, seaborn>=0.11.0, coremltools>=6.0, onnx>=1.12.0 and onnxsim>=0.4.1.
(2) Data input: inputting the constructed vehicle data set into a YOLOv8n network structure model, wherein the method comprises the steps of acquiring a vehicle image, a vehicle image label, pre-training weight of the YOLOv8n network structure model and configuration file of the YOLOv8n network structure model;
(3) Configuring hyperparameters: the optimal combination of learning rate, batch sample number, iteration number, image channel number, image crop size and learning-rate momentum is set, specifically: learning rate lr = 0.01, batch sample number batch = 64, iteration number epoch = 200, image channel number channels = 3, image crop size cropsize = 640×640, learning-rate momentum momentum = 0.937;
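Assuming the Ultralytics YOLOv8 training API (an assumption; the text does not name the exact interface), these hyperparameters map onto a training call roughly as follows, with vehicle.yaml as a hypothetical dataset configuration file:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.yaml")    # the improved model would point at a custom yaml
model.train(
    data="vehicle.yaml",        # hypothetical dataset configuration
    epochs=200,                 # iteration number
    batch=64,                   # batch sample number
    imgsz=640,                  # image crop size
    lr0=0.01,                   # initial learning rate
    momentum=0.937,             # learning-rate momentum
)
```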
(4) Model improvement: the original model adopted by the invention is YOLOv8n, and its network structure is improved in the following four places:
1) The multi-scale convolution MSC module is embedded in the C2f module of the YOLOv8n backbone network, introducing multiple convolutions with different kernel sizes so that various spatial features at multiple scales can be captured. To reduce redundant information during feature extraction, the invention proposes MSC (Multi-Scale Conv), a lightweight multi-scale convolution that reduces the parameter count and computation of the convolutional neural network. As shown in fig. 2, it introduces multiple convolutions of different kernel sizes, enabling it to capture various spatial features at multiple scales. In addition, MSC can use a large 5×5 convolution kernel to extend the receptive field, enhancing its ability to model long-range dependencies. MSC divides the input channels into 3 heads and applies a different depth-separable convolution to each head, thereby reducing parameter size and computational cost. To simplify the design, no operation is performed on the first head, and the kernel sizes of the others are initialized with 3×3 and 5×5. By adjusting the extent of the receptive field and the granularity of information, the method describes foreground and target objects more accurately and effectively filters background information. To enhance information interaction among the heads, PWConv is used to communicate the information of all feature maps, strengthening the final learned communication across the feature maps.
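A minimal PyTorch sketch of such an MSC block follows; it implements the description above (an identity first head, 3×3 and 5×5 depth-wise heads, and a 1×1 PWConv for cross-head interaction), while the exact layer arrangement in the patented model is an assumption:

```python
import torch
import torch.nn as nn

class MSC(nn.Module):
    """Multi-scale convolution sketch: split channels into 3 heads,
    leave head 1 untouched, apply 3x3 / 5x5 depth-wise convs to heads
    2 and 3, then exchange information with a 1x1 PWConv."""
    def __init__(self, channels):
        super().__init__()
        assert channels % 3 == 0, "channels must split evenly into 3 heads"
        c = channels // 3
        self.dw3 = nn.Conv2d(c, c, 3, padding=1, groups=c, bias=False)
        self.dw5 = nn.Conv2d(c, c, 5, padding=2, groups=c, bias=False)
        self.pw = nn.Conv2d(channels, channels, 1, bias=False)

    def forward(self, x):
        x1, x2, x3 = torch.chunk(x, 3, dim=1)
        y = torch.cat((x1, self.dw3(x2), self.dw5(x3)), dim=1)
        return self.pw(y)               # cross-head information interaction
```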
2) EMA is added in the backbone network where the feature detection layer is output. To select the key information in the current task and improve the efficiency and accuracy of image-information processing, EMA (Efficient Multi-Scale Attention) is introduced to improve the representational capacity of the convolutional neural network. As shown in fig. 3, EMA places a 3×3 kernel in parallel with the 1×1 branch, named the 3×3 branch, to aggregate multi-scale spatial structure information. Through feature grouping and the multi-scale structure, short-term and long-term dependencies are effectively established, yielding better performance.
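The sketch below follows the publicly released EMA reference design (grouped channels folded into the batch dimension, a pooled 1×1 branch in parallel with a 3×3 branch, and cross-spatial weighting); the grouping factor and other details are assumptions rather than values stated here:

```python
import torch
import torch.nn as nn

class EMA(nn.Module):
    def __init__(self, channels, factor=8):
        super().__init__()
        self.g = factor                                   # channel groups (assumed)
        self.softmax = nn.Softmax(dim=-1)
        self.agp = nn.AdaptiveAvgPool2d((1, 1))
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))     # pool along width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))     # pool along height
        c = channels // factor
        self.gn = nn.GroupNorm(c, c)
        self.conv1x1 = nn.Conv2d(c, c, 1)
        self.conv3x3 = nn.Conv2d(c, c, 3, padding=1)

    def forward(self, x):
        b, c, h, w = x.size()
        gx = x.reshape(b * self.g, c // self.g, h, w)     # groups -> batch dim
        x_h = self.pool_h(gx)                             # (bg, c/g, h, 1)
        x_w = self.pool_w(gx).permute(0, 1, 3, 2)         # (bg, c/g, w, 1)
        hw = self.conv1x1(torch.cat([x_h, x_w], dim=2))   # shared 1x1 branch
        x_h, x_w = torch.split(hw, [h, w], dim=2)
        x1 = self.gn(gx * x_h.sigmoid() * x_w.permute(0, 1, 3, 2).sigmoid())
        x2 = self.conv3x3(gx)                             # parallel 3x3 branch
        a1 = self.softmax(self.agp(x1).reshape(b * self.g, -1, 1).permute(0, 2, 1))
        a2 = self.softmax(self.agp(x2).reshape(b * self.g, -1, 1).permute(0, 2, 1))
        m = torch.matmul(a1, x2.reshape(b * self.g, c // self.g, -1)) + \
            torch.matmul(a2, x1.reshape(b * self.g, c // self.g, -1))
        weights = m.reshape(b * self.g, 1, h, w)          # cross-spatial weights
        return (gx * weights.sigmoid()).reshape(b, c, h, w)
```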
3) The downsampling and C2f modules of the YOLOv8n neck are replaced with GSConv and VoVGSCSP, respectively. To make the output of the DSC (depth-wise separable convolution) as close as possible to that of the SC (standard convolution), a new convolution mixing SC, DWConv and a shuffle is introduced, named GSConv. As shown in fig. 4, the information generated by the SC (a channel-dense convolution operation) is permeated into every part of the information generated by the DWConv using the shuffle. The shuffle is a uniform mixing strategy that allows the information of the SC to be completely mixed into the output of the DSC, exchanging local feature information uniformly across different channels. On the basis of GSConv, the GS bottleneck is introduced; fig. 5 (a) shows its structure. Fig. 5 (b) shows the cross-stage partial network (VoVGSCSP) module, designed with a one-shot aggregation method. At this neck stage, the concatenated feature maps processed by GSConv contain less redundant duplicate information and need no compression, and attention modules such as SPPF and EMA work better.
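A hedged PyTorch sketch of GSConv is shown below: half the output channels come from a standard convolution, the other half from a depth-wise convolution stacked on it, and a channel shuffle interleaves the two halves. The 5×5 depth-wise kernel follows the published slim-neck design and is an assumption here; VoVGSCSP then stacks GS bottlenecks built from this unit.

```python
import torch
import torch.nn as nn

def conv_bn_act(c1, c2, k, s=1, g=1):
    """Conv + BatchNorm + SiLU, the usual YOLO building block."""
    return nn.Sequential(
        nn.Conv2d(c1, c2, k, s, k // 2, groups=g, bias=False),
        nn.BatchNorm2d(c2), nn.SiLU())

class GSConv(nn.Module):
    def __init__(self, c1, c2, k=1, s=1):
        super().__init__()
        c_ = c2 // 2
        self.sc = conv_bn_act(c1, c_, k, s)          # standard convolution half
        self.dw = conv_bn_act(c_, c_, 5, 1, g=c_)    # depth-wise half (assumed 5x5)

    def forward(self, x):
        y = self.sc(x)
        y = torch.cat((y, self.dw(y)), dim=1)
        b, n, h, w = y.size()
        # channel shuffle: interleave SC and DWConv channels so SC
        # information permeates every part of the DWConv output
        return y.view(b, 2, n // 2, h, w).transpose(1, 2).reshape(b, n, h, w)
```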
4) A high-speed detection head module (DFast) is designed to detect targets using PConv (partial convolution) plus point-wise convolution (PWConv). The original YOLOv8n detection head has two routes that predict the bounding-box loss and the class loss respectively. Each route consists of two 3×3 convolutions with normalization and activation, and three detection heads of different scales detect the 80×80×64, 40×40×128 and 20×20×128 feature maps respectively, resulting in a large amount of computation and parameters. The invention proposes DFast, which uses the design concept of shared parameters: only one route is taken at the input, features are extracted with PConv plus PWConv, the path then separates, the channel numbers are adjusted with 1×1 convolutions, and the bounding-box loss and class loss are calculated respectively, as shown in fig. 6. PConv exploits redundancy in the feature map, applying a conventional Conv to only part of the input channels for spatial feature extraction while leaving the remaining channels untouched, as shown in fig. 7. Essentially, the FLOPs of PConv are lower than those of a conventional Conv but higher than those of DWConv.
FLOPS is an abbreviation of floating-point operations per second and measures hardware performance; FLOPs is an abbreviation of floating-point operations and measures algorithm/model complexity. Because PConv avoids frequent memory accesses, it makes better use of the computing power of the device, so the FLOPS of PConv is higher than that of DWConv. For sequential or regular memory access, the first or last consecutive cp channels are taken as representative of the entire feature map, and the input and output feature maps are assumed to have the same number of channels. For a feature map of height h and width w with kernel size k, the FLOPs of PConv are:

h × w × k² × cp²

With a ratio r = cp/c = 1/4, this is only (cp/c)² = 1/16 of the h × w × k² × c² FLOPs of a conventional Conv.
since only the cp channel is used for null feature extraction, the present invention avoids simply removing the remaining (c-cp) channels, otherwise PConv would degrade to a conventional Conv with fewer channels, which deviates from the goal of reducing redundancy. The present invention employs keeping the remaining channels unchanged, rather than deleting them, because they are useful to the subsequent PWConv layer, PWConv allows feature information to flow through all channels, and can fully and efficiently use information from all channels, so a point state convolution (PWConv) is further added to PConv so that their effective receptive field on the input feature map is in a T-shape Conv, which focuses more on the center position than conventional Conv that uniformly processes patches, while also approximating a conventional Conv. As shown in fig. 8. The convolution rationality is maintained with reduced computation and feature redundancy. The improved YOLOv8n network structure is schematically shown in fig. 9.
3. Image preprocessing:
The specific preprocessing step is as follows: the image is scaled down or up in equal proportion until its longer side is 640 pixels, and the remainder is filled with gray background.
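A sketch of this letterbox-style preprocessing is given below; the gray value 114 is the padding conventionally used by YOLO implementations and is an assumption, as the text only says "background gray":

```python
import cv2
import numpy as np

def letterbox(img, size=640, pad_value=114):
    """Scale the image so its longer side equals `size`, preserving the
    aspect ratio, then pad the remainder with gray."""
    h, w = img.shape[:2]
    r = size / max(h, w)
    nh, nw = round(h * r), round(w * r)
    resized = cv2.resize(img, (nw, nh))
    canvas = np.full((size, size, 3), pad_value, dtype=np.uint8)
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas
```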
4. Training using improved YOLOv8n model
(1) The image-data-enhanced and preprocessed vehicle data set is input into the improved YOLOv8n model with the set hyperparameters for training. The configuration used by the invention is: CPU: Intel Core i5-13600KF; CPU clock frequency: 3.50 GHz; memory: 64 GB; GPU: NVIDIA GeForce RTX 4090; video memory: 24 GB; the deep learning framework is PyTorch, and the development environment is PyTorch 2.0.1, Python 3.11, CUDA 11.8;
(2) Model training: the training set of the constructed vehicle data set is input into the improved YOLOv8n model for training. Prior boxes are generated automatically using the K-means clustering method; bounding-box sizes are obtained through box regression prediction; the bounding boxes are classified with a classifier to obtain the class probability corresponding to each bounding box; the classification probabilities of the bounding boxes are sorted by non-maximum suppression (NMS) to obtain the bounding-box prediction with the maximum confidence, with the confidence threshold set to 0.25 and the IoU threshold set to 0.7; the loss value between the predicted value and the true value is calculated with the loss function; and back propagation is performed according to the loss value until the preset number of iterations is reached, completing the training of the network model.
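The confidence filtering and NMS step with the thresholds stated above can be sketched as follows (boxes are assumed to be in (x1, y1, x2, y2) format; the helper itself is illustrative, not the patented implementation):

```python
import torch
from torchvision.ops import nms

def postprocess(boxes, scores, conf_thres=0.25, iou_thres=0.7):
    """Keep detections above the confidence threshold, then apply
    non-maximum suppression with the stated IoU threshold."""
    keep = scores > conf_thres
    boxes, scores = boxes[keep], scores[keep]
    idx = nms(boxes, scores, iou_thres)
    return boxes[idx], scores[idx]
```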
5. The improved YOLOv8n model is tested and evaluated:
(1) Model test: the performance changes caused by the network structure changes are verified through ablation experiments. The EMA+MSC structure added in the backbone network is called EMSC (Efficient Multi-Scale Attention Conv), and the GSConv and VoVGSCSP structures added in the neck network are called slimneck. Four configurations were trained: YOLOv8n, YOLOv8n-slimneck, YOLOv8n-slimneck-DFast and YOLOv8n-slimneck-DFast-EMSC. The experimental results are shown in fig. 10 and Table 1.
Table 1. Ablation experiment evaluation results of the invention
As can be seen from Table 1, after improving the neck network structure of YOLOv8n, the average detection processing time of the improved model increased slightly, and the FLOPs and model parameter count were reduced by 9.75% and 6.84%, respectively, compared with YOLOv8n, while the model accuracy remained nearly identical. After the DFast high-speed detection head was used, the FLOPs and model parameter count were reduced by 36.48% and 21.02%, respectively, compared with YOLOv8n, with almost unchanged accuracy. After the EMA and MSC modules replaced parts of the YOLOv8n backbone network, the average detection processing time increased slightly and the model parameter count decreased by 5.88%, while the model accuracy increased by 1.7%, showing that the EMA module enhances features in the YOLOv8n backbone network. Fig. 10 compares the performance of the proposed method with YOLOv8n in different scenarios: the left images are the detection results of YOLOv8n and the right images those of the proposed method. For simple scenes and large target vehicles, the method is hardly different from YOLOv8n. For severely occluded and densely distributed small blurry target vehicles, as shown in fig. 10 (c), the method detects a severely occluded vehicle that YOLOv8n cannot. In fig. 10 (d), the method detects a car that YOLOv8n misses. The results show that the method outperforms YOLOv8n.
6. Detecting a vehicle image by using the improved YOLOv8n model:
and inputting the vehicle image into the improved YOLOv8n model to finish the detection of the vehicle image.
The above examples are given by way of illustration only and do not limit the embodiments. Other variations or modifications based on the above description will be apparent to those of ordinary skill in the art; it is neither necessary nor possible to enumerate all embodiments here. Obvious variations or modifications derived therefrom remain within the protection scope of the invention.

Claims (9)

1. A lightweight vehicle detection method based on an improved YOLOv8n model is characterized by comprising the following steps:
constructing a vehicle data set;
building a lightweight vehicle detection network model: taking YOLOv8n as the original model, introducing a GSConv module and a VoVGSCSP module into the original YOLOv8n model, adding an EMA attention mechanism, adding a multi-scale convolution MSC module, and setting a high-speed detection head DFast module to form an improved YOLOv8n model, the improved YOLOv8n model being taken as the lightweight vehicle detection network model;
preprocessing an image derived from a vehicle dataset;
inputting the preprocessed vehicle data set into an improved YOLOv8n model, and performing model training;
the actual vehicle image is detected using the trained modified YOLOv8n model.
2. The lightweight vehicle detection method based on an improved YOLOv8n model of claim 1, wherein the improved YOLOv8n model is obtained through the following steps:
configuring an environment, and inputting the constructed vehicle data set into an original YOLOv8n model;
configuring hyperparameters: setting the optimal combination of learning rate, batch sample number, iteration number, image channel number, image crop size and learning-rate momentum;
the following modifications are made to the original YOLOv8n model: a. embedding the multi-scale convolution MSC module into the C2f module in the original YOLOv8n model backbone network and introducing multiple convolutions with different kernel sizes; b. adding the EMA attention mechanism where the backbone network of the original YOLOv8n model outputs the feature detection layer; c. using the GSConv module and the VoVGSCSP module to replace the downsampling module and the C2f module of the neck of the original YOLOv8n model, respectively; d. setting the high-speed detection head DFast module, which performs target detection by adding a point-wise convolution PWConv after the partial convolution PConv.
3. The method of claim 2, wherein the multi-scale convolution MSC module divides the input channels into 3 heads and applies a different depth-separable convolution to each head: no operation is performed on the first head, the kernel sizes of the other heads are initialized with 3×3 and 5×5, and information is finally exchanged through a 1×1 convolution.
4. The lightweight vehicle detection method based on the improved YOLOv8n model of claim 2, wherein the EMA attention mechanism retains the information on each channel while reducing computational overhead, reshapes part of the channels into the batch dimension, and divides the channel dimension into multiple sub-features so that spatial semantic features are well distributed within each feature group.
5. The lightweight vehicle detection method based on the improved YOLOv8n model of claim 2, wherein the GSConv module is a convolution module that mixes a standard convolution SC, a depth-separable convolution DWConv and a shuffle, the shuffle being used to permeate the information generated by the standard convolution SC into every part of the information generated by DWConv.
6. The method for lightweight vehicle detection based on the improved YOLOv8n model according to claim 2, wherein the high-speed detection head DFast module uses only one path at the input, extracts features using PConv plus PWConv, then separates into two branches, adjusts the channel number of each branch using a 1×1 convolution, and then calculates the bounding-box loss and the class loss respectively.
7. The lightweight vehicle detection method based on an improved YOLOv8n model according to claim 1 or 2, wherein the model training comprises the following specific steps: automatically generating prior boxes using the K-means clustering method; obtaining bounding-box sizes through box regression prediction; classifying the bounding boxes with a classifier to obtain the class probability corresponding to each bounding box; sorting the classification probabilities of the bounding boxes by non-maximum suppression to obtain the bounding-box prediction with the maximum confidence, with the confidence threshold set to 0.25 and the IoU threshold set to 0.7; calculating the loss value between the predicted value and the true value with a loss function; and performing back propagation according to the loss value until the preset number of iterations is reached, completing the network model training.
8. The method for lightweight vehicle detection based on the improved YOLOv8n model of claim 1, wherein the vehicle data set is constructed by extracting and integrating, from the MS COCO data set, vehicle images of different models, angles and colors and images containing different types of vehicles.
9. The lightweight vehicle detection method based on the improved YOLOv8n model of claim 1, wherein the image preprocessing specifically comprises: scaling the image down or up in equal proportion to 640 pixels and filling the remainder with gray background.
CN202311471722.0A 2023-11-07 2023-11-07 Lightweight vehicle detection method based on improved YOLOv8n model Pending CN117593623A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311471722.0A CN117593623A (en) 2023-11-07 2023-11-07 Lightweight vehicle detection method based on improved YOLOv8n model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311471722.0A CN117593623A (en) 2023-11-07 2023-11-07 Lightweight vehicle detection method based on improved YOLOv8n model

Publications (1)

Publication Number Publication Date
CN117593623A true CN117593623A (en) 2024-02-23

Family

ID=89917383

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311471722.0A Pending CN117593623A (en) 2023-11-07 2023-11-07 Lightweight vehicle detection method based on improved YOLOv8n model

Country Status (1)

Country Link
CN (1) CN117593623A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117893894A (en) * 2024-03-15 2024-04-16 吉林大学 Underwater target lightweight detection method and device based on infrared polarized image
CN117893894B (en) * 2024-03-15 2024-06-11 吉林大学 Underwater target lightweight detection method and device based on infrared polarized image

Similar Documents

Publication Publication Date Title
Chen et al. Embedded system real-time vehicle detection based on improved YOLO network
CN107145889B (en) Target identification method based on double CNN network with RoI pooling
CN110348384B (en) Small target vehicle attribute identification method based on feature fusion
CN109902806A (en) Method is determined based on the noise image object boundary frame of convolutional neural networks
Gong et al. Object detection based on improved YOLOv3-tiny
CN111292366B (en) Visual driving ranging algorithm based on deep learning and edge calculation
CN111680739A (en) Multi-task parallel method and system for target detection and semantic segmentation
CN117593623A (en) Lightweight vehicle detection method based on improved YOLOv8n model
CN112417973A (en) Unmanned system based on car networking
CN112101113B (en) Lightweight unmanned aerial vehicle image small target detection method
CN116682090A (en) Vehicle target detection method based on improved YOLOv3 algorithm
CN115909078A (en) Ship classification method based on HRRP and SAR data feature level fusion
CN115880562A (en) Lightweight target detection network based on improved YOLOv5
CN117975377A (en) High-precision vehicle detection method
CN118155147A (en) Light network model for vehicle detection
CN112132207A (en) Target detection neural network construction method based on multi-branch feature mapping
Liu et al. Object Detection of UAV Aerial Image based on YOLOv8
Pan et al. Traffic Light Detection for Self-Driving Vehicles Based on Deep Learning
Zhang et al. Traffic sign detection algorithm based on YOLOv5 combined with BIFPN and attention mechanism
Mohamed et al. Improving Vehicle Classification and Detection with Deep Neural Networks
Wang et al. Road traffic vehicle detection method using lightweight yolov5 and attention mechanism
CN113076898B (en) Traffic vehicle target detection method, device, equipment and readable storage medium
CN116895029B (en) Aerial image target detection method and aerial image target detection system based on improved YOLO V7
Zhang et al. Research on traffic target detection method based on improved yolov3
CN117710755B (en) Vehicle attribute identification system and method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination