CN115731533B - Vehicle-mounted target detection method based on improved YOLOv5 - Google Patents


Publication number
CN115731533B
CN115731533B
Authority
CN
China
Prior art keywords
feature
layer
yolov5
improved
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211506283.8A
Other languages
Chinese (zh)
Other versions
CN115731533A (en)
Inventor
张青春
蒋方呈
高峰
王文聘
张洪源
文张源
张宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaiyin Institute of Technology
Original Assignee
Huaiyin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaiyin Institute of Technology filed Critical Huaiyin Institute of Technology
Priority to CN202211506283.8A priority Critical patent/CN115731533B/en
Publication of CN115731533A publication Critical patent/CN115731533A/en
Application granted granted Critical
Publication of CN115731533B publication Critical patent/CN115731533B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The vehicle-mounted target detection method based on improved YOLOv5 improves the YOLOv5 network structure to realize obstacle detection on complex roads. The specific operation steps are as follows: Step 1: collect images in front of the vehicle through a camera. Step 2: extract key frames from the video streams acquired by the camera to obtain a picture data set for subsequent model training; preprocess the collected picture data set and divide it into a training set, a test set and a verification set in a suitable proportion. Step 3: configure the related environment, build the improved YOLOv5 network structure, and feed the processed picture training set, picture test set and picture verification set into the improved YOLOv5 for training; after training is completed, the best.pt model with the best detection effect is obtained. Step 4: feed the image to be detected into the best.pt model to obtain the detection result. The invention maintains high recognition accuracy for small targets and low-resolution inputs, improving the accuracy of target detection.

Description

Vehicle-mounted target detection method based on improved YOLOv5
Technical Field
The invention relates to the technical field of computer image processing, in particular to a vehicle-mounted target detection method based on improved YOLOv5.
Background
With the rapid development of the logistics industry and the rapid growth in travel demand, China's road transportation industry is developing quickly. As the transportation industry grows, road conditions become more complex and traffic accidents more frequent. At present, vehicle-mounted obstacle recognition systems mostly rely on lidar, ultrasonic sensors and similar equipment, which are costly, computationally demanding and inconvenient to deploy and use.
In the field of target detection, the current mainstream approach is to use a deep-learning neural network and give it target-recognition capability through training. Mainstream target detection networks come in two structures: one-stage networks represented by YOLO, and two-stage networks represented by Fast-RCNN. A two-stage network first extracts regions of interest from the input image to locate targets, then extracts features from each region of interest, and finally uses a classifier to identify the category of each region; its detection accuracy is higher but its detection speed is slower. A one-stage network integrates localization and classification into a single network, which greatly improves detection speed at the cost of some detection precision.
The YOLO network series is developing quickly, with large improvements in detection speed and accuracy, and is no longer weaker than two-stage networks. YOLOv5, as the latest version of the YOLO series, clearly outperforms earlier versions, but it still falls short on low-resolution inputs and small-target detection, and its detection precision is easily disturbed under complex conditions.
Disclosure of Invention
In order to solve the problem that YOLOv5 falls short on low-resolution and small-target detection and that its detection precision is easily disturbed under complex conditions, the invention provides a vehicle-mounted target detection method based on improved YOLOv5, which uses an SPD-Conv structure to replace the original Conv structure and improves the precision of identifying low-resolution and small targets; the above technical problems can thus be effectively solved.
The invention is realized by the following technical scheme:
The vehicle-mounted target detection method based on improved YOLOv5 improves the YOLOv5 network structure to realize obstacle detection on complex roads; the specific operation steps are as follows:
step 1: collecting a front image of a vehicle through a camera;
step 2: extracting key frames from the video streams acquired by the camera to obtain a picture data set for subsequent model training; preprocessing the collected picture data set and dividing it into a training set, a test set and a verification set in a suitable proportion;
step 3: configuring the related environment, building the improved YOLOv5 network structure, and feeding the processed picture training set, picture test set and picture verification set into the improved YOLOv5 for training; after training is completed, the best.pt model with the best detection effect is obtained;
the improved YOLOv5 network structure is built by improving YOLOv5 in the following respects: replacing the original neck network of YOLOv5 with the weighted bidirectional pyramid network BiFPN for feature extraction; introducing an attention mechanism into the backbone network by adding a CBAM module, combining attention over the two dimensions of feature channel and feature space; replacing the original CNN module with the SPD-Conv module to obtain the YOLOv5-SPD module, which is used to handle low-resolution and small targets; replacing the original IoU function with the EIoU loss function; and replacing the SiLU activation function with the Mish activation function;
step 4: feeding the image to be detected into the best.pt model to obtain the detection result.
Further, in step 3 a feature fusion neck network is introduced into the YOLOv5 backbone network, and feature extraction is performed by the weighted bidirectional pyramid network BiFPN. The specific operation is as follows: a CBAM convolutional attention module, which combines a channel attention mechanism and a spatial attention mechanism, is introduced into the Backbone network of the YOLOv5 network. After the Backbone network extracts features, the channel attention mechanism of the CBAM module applies global average pooling and global max pooling to each input feature layer separately, converting it into two 1×1 vectors; the two pooled results are passed through a shared fully connected layer and added, a sigmoid operation on the sum yields a weight for each feature channel, and multiplying these weights by the original feature layer gives the channel-attended features.
The spatial attention mechanism of the CBAM module takes the maximum value and the average value of each feature point across the channels of the input feature layer and stacks them, converting the single feature layer into 2 channels; a convolution then adjusts the channel number back to 1, a sigmoid operation on the result yields a weight for each feature point, and multiplying these weights by the feature points of the original feature layer gives the spatially attended features.
The attention mechanism highlights the key parts of the features while attending to both the positional information and the semantic information of the target. Attention is introduced in both the low-level and the high-level feature layers of the Backbone network, i.e. CBAM modules are added at the 6th, 11th, 16th and last layers, highlighting low-level and high-level feature information; the CBAM module introduced at the last layer of the Backbone network serves the subsequent Neck bottleneck structure.
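A minimal NumPy sketch of the channel and spatial attention described above may make the data flow concrete. It is an illustration only, not the patent's implementation: the MLP weights w1 and w2 are hypothetical, and the spatial branch reduces the 2-channel stack with a 1×1 weighted sum as a stand-in for CBAM's usual 7×7 convolution.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """Channel branch: global avg + max pooling, shared 2-layer MLP, sigmoid."""
    avg = x.mean(axis=(1, 2))          # (C,) pooled vector
    mx = x.max(axis=(1, 2))            # (C,) pooled vector
    # shared MLP applied to both pooled vectors, results added, then sigmoid
    att = sigmoid(w2 @ np.maximum(w1 @ avg, 0) + w2 @ np.maximum(w1 @ mx, 0))
    return x * att[:, None, None]      # re-weight each channel

def spatial_attention(x):
    """Spatial branch: per-pixel max and mean over channels, stacked to 2
    channels, reduced back to 1 channel (1x1 stand-in for the 7x7 conv)."""
    mx = x.max(axis=0)                 # (H, W)
    avg = x.mean(axis=0)               # (H, W)
    stacked = np.stack([mx, avg])      # (2, H, W)
    w = np.array([0.5, 0.5])           # hypothetical 1x1 conv weights
    att = sigmoid(np.tensordot(w, stacked, axes=1))   # (H, W) weights
    return x * att[None, :, :]

def cbam(x, w1, w2):
    # channel attention first, spatial attention second, as described above
    return spatial_attention(channel_attention(x, w1, w2))
```

Here cbam() preserves the (C, H, W) shape of the input feature layer while re-weighting it along both dimensions.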
Furthermore, the CBAM module is introduced into the last layer of the Backbone network to serve the subsequent Neck bottleneck structure. The specific operation is as follows: the bidirectional pyramid network BiFPN introduces a learnable weight for features of different scales so as to better balance feature information across scales; that is, a learnable weight is introduced for each input feature to control the contribution of each feature layer, and the fused output O is computed as follows:
O = Σ_i ( w_i / (ε + Σ_j w_j) ) · I_i
where w_i ≥ 0 is the learnable weight after the SiLU activation function, and ε = 0.0001 prevents numerical instability;
the feature layers are then fused in this weighted manner, specifically:
P_i^td = Conv( (w_1·P_i^in + w_2·Resize(P_{i+1}^in)) / (w_1 + w_2 + ε) )
P_i^out = Conv( (w_1'·P_i^in + w_2'·P_i^td + w_3'·Resize(P_{i-1}^out)) / (w_1' + w_2' + w_3' + ε) )
where P_i^td is the intermediate feature of layer P_i, P_i^out is the output feature of layer P_i, and Resize converts the P_{i-1} and P_{i+1} feature layers to the same size as P_i.
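The weighted fusion above can be sketched as BiFPN's fast normalized fusion; this is a minimal NumPy illustration under stated assumptions: resize_nearest is a hypothetical stand-in for the Resize operation, the non-negativity of the weights is enforced here with a simple clamp rather than the activation function named in the text, and the convolution applied after fusion is omitted.

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """O = sum_i w_i * I_i / (eps + sum_j w_j), with w_i kept >= 0."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # enforce w_i >= 0
    norm = w / (eps + w.sum())                             # normalize weights
    out = np.zeros_like(features[0], dtype=float)
    for wi, f in zip(norm, features):
        out = out + wi * f
    return out

def resize_nearest(f, shape):
    """Nearest-neighbour resize so a neighbouring pyramid level matches P_i."""
    h, w = shape
    ri = np.arange(h) * f.shape[0] // h
    ci = np.arange(w) * f.shape[1] // w
    return f[np.ix_(ri, ci)]
```

With equal weights the fusion reduces to a near-exact average of the inputs, and the learnable weights shift that balance during training.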
Further, in step 3 the SPD-Conv module replaces the original CNN module to obtain the YOLOv5-SPD module. The YOLOv5-SPD module comprises an SPD layer and a non-strided convolution layer. The SPD layer downsamples the original feature map: it slices the feature map proportionally into a series of sub-feature maps and concatenates them along the channel dimension to obtain an intermediate feature map, specifically:
f_{scale-1, scale-1} = X[scale-1 : m : scale, scale-1 : n : scale];
where X is the original feature map of size m × n and scale is the scaling factor; in general each sub-feature map is f_{x,y} = X[x : m : scale, y : n : scale] with x, y ∈ {0, …, scale-1};
the non-strided convolution layer uses stride-1 convolutions to retain as much discriminative feature information as possible, while adjusting the depth and width of the intermediate feature map to meet the depth and width requirements of the subsequent network; using the YOLOv5-SPD module in place of the original CNN for low-resolution and small targets improves the precision with which they are identified.
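The slicing-and-concatenation step of the SPD layer can be sketched directly with NumPy strided slicing; this is an illustrative sketch of the space-to-depth operation only (the non-strided convolution that follows is omitted):

```python
import numpy as np

def spd_layer(x, scale=2):
    """Space-to-depth: slice a (C, H, W) map into scale*scale sub-maps
    f_xy = X[:, x::scale, y::scale] and concatenate them on the channel
    axis, giving (C*scale^2, H/scale, W/scale) with no information loss."""
    subs = [x[:, i::scale, j::scale] for i in range(scale) for j in range(scale)]
    return np.concatenate(subs, axis=0)
```

With scale = 2, a (C, H, W) map becomes (4C, H/2, W/2) and every original value survives, which is the lossless-downsampling property the SPD layer relies on.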
Further, in step 3 the EIoU loss function replaces the original IoU function; EIoU splits the aspect-ratio loss term into the separate differences between the predicted width and height and those of the minimum enclosing box. Meanwhile, Focal Loss is introduced to reduce the optimization contribution to BBox regression of the many anchor boxes that overlap little with the target box, so that the regression process focuses more on high-quality boxes. The specific formula is:
E_loss = IoU_loss + dis_loss + asp_loss = 1 - IoU + ρ²(b, b^gt)/c² + ρ²(w, w^gt)/c_w² + ρ²(h, h^gt)/c_h²
where dis_loss is the center-point loss, asp_loss is the width-height loss, ρ²(b, b^gt) is the squared Euclidean distance between the center points of the predicted and ground-truth boxes, ρ²(w, w^gt) and ρ²(h, h^gt) are the squared differences between their widths and heights respectively, c is the diagonal length of the minimum enclosing region containing both boxes, and c_w and c_h are the width and height of that region.
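The loss above can be sketched in plain Python for a single box pair in (x1, y1, x2, y2) form; this is an illustrative sketch of the EIoU terms only, and the Focal-Loss weighting mentioned above is not included:

```python
def eiou_loss(box_p, box_g):
    """EIoU loss: (1 - IoU) + center-distance term + width/height terms."""
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    # intersection-over-union
    ix = max(0.0, min(px2, gx2) - max(px1, gx1))
    iy = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = ix * iy
    area_p = (px2 - px1) * (py2 - py1)
    area_g = (gx2 - gx1) * (gy2 - gy1)
    iou = inter / (area_p + area_g - inter)
    # smallest enclosing box: diagonal c^2, width c_w, height c_h
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2
    # squared distance between box centers
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 + \
           ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    # squared width and height differences, normalized by the enclosing box
    wp, hp = px2 - px1, py2 - py1
    wg, hg = gx2 - gx1, gy2 - gy1
    asp = (wp - wg) ** 2 / cw ** 2 + (hp - hg) ** 2 / ch ** 2
    return (1 - iou) + rho2 / c2 + asp
```

For identical boxes the loss is zero, and it grows as the boxes drift apart in position or shape.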
Further, in step 3 the Mish activation function replaces the SiLU activation function. Mish has a lower bound and gives small weight to the negative half-axis, which prevents the neuron-necrosis phenomenon and produces a stronger regularization effect; retaining a small amount of negative information avoids ReLU's Dying-ReLU phenomenon and favors better expression and information flow. The specific formula of the Mish activation function is:
Mish(x) = x · Tanh(Softplus(x));
where Tanh is the hyperbolic tangent function and Softplus(x) = ln(1 + e^x) is an activation function that can be regarded as a smooth approximation of ReLU.
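The Mish formula can be sketched and checked in a few lines of plain Python; the log1p form of Softplus is a numerical-stability detail of this sketch, not something taken from the patent:

```python
import math

def softplus(x):
    # numerically stable ln(1 + e^x)
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def mish(x):
    """Mish(x) = x * tanh(softplus(x)); smooth, lower-bounded, and keeps
    a small amount of negative information on the negative half-axis."""
    return x * math.tanh(softplus(x))
```

Unlike ReLU, mish(-1.0) is a small negative value rather than exactly zero, which is the retained negative information the text refers to.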
Further, before the processed picture training set, picture test set and picture verification set are fed into the improved YOLOv5 for training in step 3, the network training parameters need to be set, specifically: the number of iterations is set to 200, the batch size to 16, and the initial learning rate to 0.0001.
Further, in step 1 the images in front of the vehicle are collected by a camera mounted on the top of the vehicle; while the vehicle is running, the camera collects a video stream of the scene in front of the vehicle.
Further, in step 2 key frames are extracted from the video streams collected by the camera: the current frame is extracted from the video stream at intervals of 1 s as a key frame and stored in the picture data set.
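The 1 s sampling rule can be sketched as a small helper that computes which frame indices to keep; in practice the frames themselves would be read with a video library such as OpenCV's VideoCapture, which is assumed here rather than shown:

```python
def key_frame_indices(fps, total_frames, interval_s=1.0):
    """Indices of the frames to keep when sampling one key frame per
    interval_s seconds from a stream with the given frame rate."""
    step = max(1, round(fps * interval_s))   # frames between key frames
    return list(range(0, total_frames, step))
```

For a 30 fps stream, one frame is kept every 30 frames.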
Further, the preprocessing of the collected picture data set in step 2 specifically includes: removing pictures that contain no target, have blurred features or have cluttered backgrounds; labeling the screened pictures by marking the targets to be detected, such as culverts, height-limit bars, trees and other obstacles, with rectangular boxes, recording the target names and the box coordinates, and saving them as txt files; finally, dividing the picture data set into a training set, a test set and a verification set in the ratio 7:2:1.
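The 7:2:1 division can be sketched as a simple shuffled split; the file names and the fixed seed below are hypothetical illustration choices:

```python
import random

def split_dataset(files, ratios=(0.7, 0.2, 0.1), seed=0):
    """Shuffle and split a list of image files into training, test and
    verification subsets in the 7:2:1 ratio described above."""
    files = list(files)
    random.Random(seed).shuffle(files)       # deterministic shuffle
    n = len(files)
    n_train = int(n * ratios[0])
    n_test = int(n * ratios[1])
    train = files[:n_train]
    test = files[n_train:n_train + n_test]
    val = files[n_train + n_test:]           # remainder goes to verification
    return train, test, val
```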
Advantageous effects
Compared with the prior art, the vehicle-mounted target detection method based on the improved YOLOv5 has the following beneficial effects:
(1) In this technical scheme the vehicle-mounted camera and the embedded device are mounted directly on the vehicle, so no additional hardware is needed and hardware cost is saved. The camera collects images in front of the vehicle, the images are fed into the model, and the model judges whether an obstacle is present ahead, thereby monitoring the road conditions around the vehicle. In addition, the original neck network of YOLOv5 is replaced by the weighted bidirectional pyramid network BiFPN for feature extraction; an attention mechanism is introduced into the backbone network by adding a CBAM module, combining attention over the two dimensions of feature channel and feature space; the SPD-Conv module replaces the original CNN module to obtain the YOLOv5-SPD module, which handles low-resolution and small targets and improves the accuracy of identifying them; and the EIoU loss function replaces the original IoU function, which calculates the difference between the predicted box and the ground-truth box more effectively and improves model precision.
(2) In this technical scheme an improved BiFPN bidirectional weighted pyramid network is introduced at the Neck end, learnable weights are introduced for features of different scales to better balance feature information across scales, and the SiLU activation function is replaced by the Mish activation function, avoiding ReLU's Dying-ReLU phenomenon and favoring better expression and information flow.
(3) In this technical scheme the EIoU loss function replaces the original IoU function, solving the sample-imbalance problem in the bounding-box regression task: the optimization contribution to BBox regression of the many anchor boxes that overlap little with the target box is reduced, so the regression process focuses more on high-quality boxes.
Drawings
FIG. 1 is a schematic overall flow chart of the present invention.
FIG. 2 is a diagram of an improved YOLOv5 network architecture in accordance with the present invention.
FIG. 3 is a schematic diagram of a CBAM attention module according to the present invention.
FIG. 4 is a diagram of a BiFPN network architecture in accordance with the present invention.
FIG. 5 is a schematic diagram of a Yolov5-SPD module according to the present invention.
Detailed Description
The following describes the embodiments of the present invention clearly and completely with reference to the accompanying drawings; the embodiments described are evidently only some, not all, of the embodiments of the invention.
Example 1:
As shown in FIGS. 1-5, the vehicle-mounted target detection method based on improved YOLOv5 improves the YOLOv5 network structure to realize obstacle detection on complex roads; the specific operation steps are as follows:
step 1: collecting a front image of a vehicle through a camera;
the camera is arranged at the top of the vehicle and used for collecting images in front of the vehicle; during the running process of the vehicle, the camera can collect video streams in front of the vehicle.
Step 2: extract key frames from the video streams collected by the camera, i.e. extract the current frame at intervals of 1 s as a key frame and store it in the picture data set, obtaining a picture data set for subsequent model training. Preprocess the collected picture data set: remove pictures that contain no target, have blurred features or have cluttered backgrounds; label the screened pictures by marking the targets to be detected, such as culverts, height-limit bars, trees and other obstacles, with rectangular boxes, recording the target names and the box coordinates, and saving them as txt files; finally, divide the picture data set into a training set, a test set and a verification set in the ratio 7:2:1.
Step 3: configure the related environment, build the improved YOLOv5 network structure, and feed the processed picture training set, picture test set and picture verification set into the improved YOLOv5 for training; after training is completed, the best.pt model with the best detection effect is obtained. The procedure specifically comprises the following steps:
the first step: the improved Yolov5 network structure is built, the Yolov5 is improved, and the improvement points of the Yolov5 are as follows: and replacing the original neck network of Yolov5 with a weighted bidirectional pyramid network BiFPN for feature extraction.
Introducing a CBAM convolution attention module into a Backbone network of a Yolov5 network, wherein the CBAM convolution attention module combines a channel attention mechanism and a space attention mechanism; the method comprises the steps of extracting features by a Backbone network of a backhaul, respectively carrying out global average pooling and global maximum pooling on a single feature layer in input feature layers by a CBAM module on a attention mechanism of a channel, converting the single feature layer into two 1x1 forms, adding the results of the global average pooling and the global maximum pooling by using a full connection layer, carrying out sigmoid operation on the added results to obtain a weight of each feature channel, and multiplying the weight by the original feature layer to obtain the features of the channel.
The attention mechanism of the CBAM module for the space is characterized in that the maximum value and the average value of each feature point on an input feature layer are taken, the maximum value and the average value are stacked, a single feature layer is converted into 2 channels, the number of the channels is adjusted by convolution with the number of the channels being 1 again, the single feature layer is converted into 1 channel again, sigmoid operation is carried out on the processed feature points, the weight of each feature point is obtained, and the feature of the feature point can be obtained by multiplying the weight with the feature point on the original feature layer.
The attention mechanism highlights the key part in the characteristics, simultaneously focuses on the position information and semantic information of the target, introduces the attention mechanism in both the bottom characteristic layer and the high-level characteristic layer of the Backbone network of the Backbone, namely adds a CBAM module in the 6 th layer, the 11 th layer, the 16 th layer and the last layer, highlights the bottom and high-level characteristic information, and introduces the CBAM module in the last layer of the Backbone network of the Backbone to meet the requirement of a subsequent Neck bottleneck structure.
The second step: introduce an attention mechanism into the backbone network by adding a CBAM module, combining attention over the two dimensions of feature channel and feature space; a CBAM module is introduced into the last layer of the Backbone network to serve the subsequent Neck bottleneck structure, and the bidirectional pyramid network BiFPN introduces a learnable weight for features of different scales so as to better balance feature information across scales; that is, a learnable weight is introduced for each input feature to control the contribution of each feature layer, and the fused output O is computed as follows:
O = Σ_i ( w_i / (ε + Σ_j w_j) ) · I_i
where w_i ≥ 0 is the learnable weight after the SiLU activation function, and ε = 0.0001 prevents numerical instability;
the feature layers are then fused in this weighted manner, specifically:
P_i^td = Conv( (w_1·P_i^in + w_2·Resize(P_{i+1}^in)) / (w_1 + w_2 + ε) )
P_i^out = Conv( (w_1'·P_i^in + w_2'·P_i^td + w_3'·Resize(P_{i-1}^out)) / (w_1' + w_2' + w_3' + ε) )
where P_i^td is the intermediate feature of layer P_i, P_i^out is the output feature of layer P_i, and Resize converts the P_{i-1} and P_{i+1} feature layers to the same size as P_i.
The third step: replace the original CNN module with the SPD-Conv module to obtain the YOLOv5-SPD module, which is used to handle low-resolution and small targets. The YOLOv5-SPD module comprises an SPD layer and a non-strided convolution layer. The SPD layer downsamples the original feature map: it slices the feature map proportionally into a series of sub-feature maps and concatenates them along the channel dimension to obtain an intermediate feature map, specifically:
f_{scale-1, scale-1} = X[scale-1 : m : scale, scale-1 : n : scale];
where X is the original feature map of size m × n and scale is the scaling factor; in general each sub-feature map is f_{x,y} = X[x : m : scale, y : n : scale] with x, y ∈ {0, …, scale-1};
the non-strided convolution layer uses stride-1 convolutions to retain as much discriminative feature information as possible, while adjusting the depth and width of the intermediate feature map to meet the depth and width requirements of the subsequent network; using the YOLOv5-SPD module in place of the original CNN for low-resolution and small targets improves the precision with which they are identified.
The fourth step: replace the original IoU function with the EIoU loss function; EIoU splits the aspect-ratio loss term into the separate differences between the predicted width and height and those of the minimum enclosing box. Meanwhile, Focal Loss is introduced to reduce the optimization contribution to BBox regression of the many anchor boxes that overlap little with the target box, so that the regression process focuses more on high-quality boxes. The specific formula is:
E_loss = IoU_loss + dis_loss + asp_loss = 1 - IoU + ρ²(b, b^gt)/c² + ρ²(w, w^gt)/c_w² + ρ²(h, h^gt)/c_h²
where dis_loss is the center-point loss, asp_loss is the width-height loss, ρ²(b, b^gt) is the squared Euclidean distance between the center points of the predicted and ground-truth boxes, ρ²(w, w^gt) and ρ²(h, h^gt) are the squared differences between their widths and heights respectively, c is the diagonal length of the minimum enclosing region containing both boxes, and c_w and c_h are the width and height of that region.
The fifth step: replace the SiLU activation function with the Mish activation function. Mish has a lower bound and gives small weight to the negative half-axis, which prevents the neuron-necrosis phenomenon and produces a stronger regularization effect; retaining a small amount of negative information avoids ReLU's Dying-ReLU phenomenon and favors better expression and information flow. The specific formula of the Mish activation function is:
Mish(x) = x · Tanh(Softplus(x));
where Tanh is the hyperbolic tangent function and Softplus(x) = ln(1 + e^x) is an activation function that can be regarded as a smooth approximation of ReLU.
The sixth step: before the processed picture training set, picture test set and picture verification set are fed into the improved YOLOv5 for training, the network training parameters need to be set, specifically: the number of iterations is set to 200, the batch size to 16, and the initial learning rate to 0.0001.
Step 4: feed the image to be detected into the best.pt model to obtain the detection result.

Claims (5)

1. A vehicle-mounted target detection method based on improved YOLOv5, wherein the YOLOv5 network structure is improved to realize obstacle detection on complex roads, the specific operation steps being as follows:
step 1: collecting a front image of a vehicle through a camera;
step 2: extracting key frames from the video streams acquired by the camera to obtain a picture data set for subsequent model training; preprocessing the collected picture data set and dividing it into a training set, a test set and a verification set in a suitable proportion;
step 3: configuring the related environment, building the improved YOLOv5 network structure, and feeding the processed picture training set, picture test set and picture verification set into the improved YOLOv5 for training; after training is completed, the best.pt model with the best detection effect is obtained;
the improved Yolov5 network structure is built, the Yolov5 is improved, and the improvement points of the Yolov5 are as follows:
firstly, replacing an original neck network of Yolov5 with a weighted bidirectional pyramid network BiFPN to extract features; introducing an attention mechanism in a backbone network, adding a CBAM module, and combining the attention mechanisms of two dimensions of a feature channel and a feature space; introducing a feature fusion neck network into a Yolov5 backbone network, and extracting features by a weighted bidirectional pyramid network BiFPN, wherein the specific operation mode is as follows:
introducing a CBAM convolution attention module into a Backbone network of a Yolov5 network, wherein the CBAM convolution attention module combines a channel attention mechanism and a space attention mechanism; the method comprises the steps that a Backbone network of a backhaul extracts features, a CBAM module carries out global average pooling and global maximum pooling on single feature layers in input feature layers respectively by a focus mechanism of a channel, converts the single feature layers into two 1x1 forms, adds the results of the global average pooling and the global maximum pooling by using a full connection layer, carries out sigmoid operation on the added results to obtain a weight of each feature channel, and multiplies the weight by an original feature layer to obtain features of the channel;
the attention mechanism of the CBAM module for the space is characterized in that the maximum value and the average value of each feature point on an input feature layer are taken, the maximum value and the average value are stacked, a single feature layer is converted into 2 channels, the number of the channels is adjusted by convolution with the number of the channels being 1 again, the single feature layer is converted into 1 channel again, sigmoid operation is carried out on the processed feature points, the weight of each feature point is obtained, and the feature of the feature point can be obtained by multiplying the weight with the feature point on the original feature layer;
the attention mechanism highlights the key part in the characteristics, simultaneously focuses on the position information and semantic information of the target, introduces the attention mechanism in both the bottom characteristic layer and the high-level characteristic layer of the Backbone network of the Backbone, namely adds a CBAM module in the 6 th layer, the 11 th layer, the 16 th layer and the last layer, highlights the bottom and high-level characteristic information, introduces the CBAM module in the last layer of the Backbone network of the Backbone to meet the requirement of the subsequent Neck bottleneck structure, and specifically operates in the following modes: the bi-directional pyramid network BiFPN introduces a learnable weight to the features of different scales so as to better balance the feature information of the different scales; namely, introducing a learnable weight parameter O to the features of different scales to control the weight of each layer of features, wherein the specific distribution mode of O is as follows:
O = Σ_i ( w_i / (ε + Σ_j w_j) ) · I_i

wherein I_i is the i-th input feature, w_i ≥ 0 is the corresponding learnable weight, kept non-negative by passing it through the SiLU activation function, and ε = 0.0001 prevents numerical instability;
the feature layers are fused in a weighted manner, specifically:

P_i^td = Conv( (w_1 · P_i^in + w_2 · Resize(P_(i+1)^in)) / (w_1 + w_2 + ε) )

P_i^out = Conv( (w_1' · P_i^in + w_2' · P_i^td + w_3' · Resize(P_(i-1)^out)) / (w_1' + w_2' + w_3' + ε) )

wherein P_i^td is the intermediate feature of the P_i layer, P_i^out is the output feature of the P_i layer, and Resize converts the P_(i-1) and P_(i+1) feature layers to the same size as P_i;
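The fast normalized fusion above can be sketched as follows. This is an illustrative NumPy version under two stated assumptions: the input feature maps have already been resized to a common shape, and a final clamp guards against the small negative range of SiLU (the original BiFPN paper uses ReLU for this role; the patent names SiLU).

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def fast_normalized_fusion(inputs, raw_weights, eps=1e-4):
    """BiFPN fast normalized fusion: each input feature map is weighted by
    w_i / (eps + sum_j w_j). The learnable raw weights are passed through
    SiLU and clamped at zero so every normalized weight is non-negative."""
    w = np.maximum(silu(np.asarray(raw_weights, dtype=float)), 0.0)
    norm = w / (eps + w.sum())                       # normalized weights
    fused = sum(wi * f for wi, f in zip(norm, inputs))
    return fused, norm

# fuse three same-sized maps (e.g. P_i, resized P_(i-1), resized P_(i+1))
rng = np.random.default_rng(2)
maps = [rng.standard_normal((8, 8)) for _ in range(3)]
fused, norm = fast_normalized_fusion(maps, raw_weights=[1.0, 2.0, 0.5])
```

The normalized weights sum to just under 1 (the ε term keeps the denominator strictly positive), so the fusion is a stable convex-like combination of the inputs.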
secondly, an SPD-Conv module replaces the original CNN module to obtain the YOLOv5-SPD module, wherein the YOLOv5-SPD module comprises an SPD layer and a non-strided convolution layer; the SPD layer downsamples the original feature map by slicing a feature map proportionally into a series of sub-feature maps and concatenating the sub-feature maps along the channel dimension to obtain an intermediate feature map, specifically:
f_(a,b) = X[a : m : scale, b : n : scale], a, b ∈ {0, 1, …, scale−1};

wherein X is the original feature map of size m × n and scale is the scaling factor; each of the scale² sub-feature maps is X downsampled by a factor of scale;
the non-strided convolution layer uses stride-1 convolution to retain as much discriminative feature information as possible, while controlling the depth and width of the intermediate feature map to match the depth and width required by the subsequent network; replacing the original CNN with the YOLOv5-SPD module improves the accuracy of recognizing low-resolution and smaller targets;
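The SPD slicing step is pure array indexing and can be shown directly; this sketch implements only the space-to-depth rearrangement (the subsequent stride-1 convolution is omitted), with illustrative tensor sizes.

```python
import numpy as np

def space_to_depth(x, scale=2):
    """SPD layer: slice a (C, H, W) feature map into scale*scale sub-maps
    f[a, b] = X[:, a::scale, b::scale] and concatenate them along the
    channel axis. Spatial resolution is traded for channel depth without
    discarding any pixels, unlike strided convolution or pooling."""
    c, h, w = x.shape
    assert h % scale == 0 and w % scale == 0
    subs = [x[:, a::scale, b::scale] for a in range(scale) for b in range(scale)]
    return np.concatenate(subs, axis=0)     # (C*scale^2, H/scale, W/scale)

x = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
y = space_to_depth(x, scale=2)              # (2, 4, 4) -> (8, 2, 2)
```

Every element of the input reappears exactly once in the output, which is why the technique is suited to small, low-resolution targets: no information is thrown away before the convolution sees it.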
then, the EIoU loss function replaces the original IoU function; compared with CIoU, the EIoU loss splits the aspect-ratio loss term into separate differences between the predicted width and height and the width and height of the minimum enclosing box; meanwhile, Focal Loss is introduced to reduce the contribution to BBox regression of the many anchor boxes that overlap little with the target box, so that the regression process focuses more on high-quality boxes, the specific formula being:
E_loss = IoU_loss + dis_loss + asp_loss = 1 − IoU + ρ²(b, b_gt)/c² + ρ²(w, w_gt)/c_w² + ρ²(h, h_gt)/c_h²

wherein dis_loss is the center-point loss and asp_loss is the width-height loss; ρ²(b, b_gt) denotes the squared Euclidean distance between the center points of the predicted and real boxes, ρ²(w, w_gt) and ρ²(h, h_gt) denote the squared differences between the widths and heights of the predicted and real boxes respectively, c denotes the diagonal length of the minimum enclosing region containing both the predicted and real boxes, c_w denotes the width of that region, and c_h denotes its height;
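The three EIoU terms can be computed directly from box coordinates. The sketch below is a plain-Python illustration for axis-aligned boxes in (x1, y1, x2, y2) form; the Focal Loss reweighting mentioned above is not included.

```python
def eiou_loss(box_p, box_g):
    """EIoU loss = (1 - IoU) + center-distance term + width term + height
    term, the latter three normalized by the minimum enclosing box."""
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    # intersection over union
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    area_p = (px2 - px1) * (py2 - py1)
    area_g = (gx2 - gx1) * (gy2 - gy1)
    iou = inter / (area_p + area_g - inter)
    # minimum enclosing box: diagonal, width, height
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw * cw + ch * ch
    # squared center distance, squared width/height differences
    dx = (px1 + px2) / 2 - (gx1 + gx2) / 2
    dy = (py1 + py2) / 2 - (gy1 + gy2) / 2
    rho2_center = dx * dx + dy * dy
    rho2_w = ((px2 - px1) - (gx2 - gx1)) ** 2
    rho2_h = ((py2 - py1) - (gy2 - gy1)) ** 2
    return (1 - iou) + rho2_center / c2 + rho2_w / (cw * cw) + rho2_h / (ch * ch)

perfect = eiou_loss((0, 0, 4, 4), (0, 0, 4, 4))   # identical boxes
shifted = eiou_loss((1, 1, 5, 5), (0, 0, 4, 4))   # translated prediction
```

A perfectly matching box incurs zero loss, while a shifted box of the same size is penalized only through the IoU and center-distance terms, which is the separation of concerns EIoU is designed for.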
finally, the Mish activation function replaces the SiLU activation function; the Mish activation function is bounded below, assigns small weight to the negative half-axis, prevents neuron death, and produces a stronger regularization effect; it retains a small amount of negative information, avoiding the Dying ReLU phenomenon of ReLU and allowing better expressiveness and information flow; the specific formula of the Mish activation function is:
Mish(x)=x*Tanh(Softplus(x));
where Tanh is the hyperbolic tangent function and Softplus(x) = ln(1 + eˣ) is an activation function that can be regarded as a smooth approximation of ReLU;
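The formula above is short enough to verify numerically; the following pure-Python sketch evaluates Mish at a few points to show the properties claimed: zero at zero, small negative outputs for negative inputs (bounded below near −0.31), and near-identity behavior for large positive inputs.

```python
import math

def softplus(x):
    """Softplus(x) = ln(1 + e^x), a smooth approximation of ReLU."""
    return math.log1p(math.exp(x))

def mish(x):
    """Mish(x) = x * Tanh(Softplus(x)): smooth and non-monotonic, so small
    negative inputs keep a small negative output instead of being zeroed
    out as they would be under ReLU."""
    return x * math.tanh(softplus(x))

values = [mish(v) for v in (-2.0, 0.0, 2.0)]
```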
step 4: feeding the image to be detected into the best.pt model to obtain the detection result.
2. The improved YOLOv5-based vehicle-mounted object detection method of claim 1, wherein: before the processed picture training set, picture testing set and picture verification set are fed into the improved YOLOv5 for training, the network training parameters are set as follows: the number of iterations is set to 200, the batch size to 16, and the initial learning rate to 0.0001.
3. The improved YOLOv5-based vehicle-mounted object detection method of claim 1, wherein: the collection of the image in front of the vehicle by the camera in step 1 is performed as follows: the camera is mounted on the top of the vehicle to collect images in front of the vehicle; while the vehicle is running, the camera collects the video stream in front of the vehicle.
4. The improved YOLOv5-based vehicle-mounted object detection method of claim 1, wherein: in step 2, key frames are extracted from the video stream collected by the camera, i.e. the current frame is extracted from the video stream every 1 s as a key frame and saved into the picture data set.
5. The improved YOLOv5-based vehicle-mounted object detection method of claim 1, wherein: the preprocessing of the collected picture data set in step 2 specifically comprises: removing pictures that contain no target, have blurred features, or have cluttered backgrounds; labeling the screened pictures by marking the targets to be detected, such as culverts, height-limit bars, trees and other obstacles, with rectangular boxes, recording the target names and the coordinates of the rectangular boxes, and generating txt files for storage; and finally, dividing the picture data set into a training set, a testing set and a verification set in the ratio 7:2:1.
CN202211506283.8A 2022-11-29 2022-11-29 Vehicle-mounted target detection method based on improved YOLOv5 Active CN115731533B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211506283.8A CN115731533B (en) 2022-11-29 2022-11-29 Vehicle-mounted target detection method based on improved YOLOv5


Publications (2)

Publication Number Publication Date
CN115731533A CN115731533A (en) 2023-03-03
CN115731533B true CN115731533B (en) 2024-04-05

Family

ID=85299062

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211506283.8A Active CN115731533B (en) 2022-11-29 2022-11-29 Vehicle-mounted target detection method based on improved YOLOv5

Country Status (1)

Country Link
CN (1) CN115731533B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116416504B (en) * 2023-03-16 2024-02-06 北京瑞拓电子技术发展有限公司 Expressway foreign matter detection system and method based on vehicle cooperation
CN116452972B (en) * 2023-03-17 2024-06-21 兰州交通大学 Transformer end-to-end remote sensing image vehicle target detection method
CN116342596B (en) * 2023-05-29 2023-11-28 云南电网有限责任公司 YOLOv5 improved substation equipment nut defect identification detection method
CN116994243B (en) * 2023-07-31 2024-04-02 安徽省农业科学院农业经济与信息研究所 Lightweight agricultural pest detection method and system
CN117058526A (en) * 2023-10-11 2023-11-14 创思(广州)电子科技有限公司 Automatic cargo identification method and system based on artificial intelligence
CN118190168B (en) * 2023-10-19 2024-10-18 重庆大学 Temperature monitoring method and system for key area of high-speed vehicle
CN117668669B * 2024-02-01 2024-04-19 齐鲁工业大学(山东省科学院) Pipeline safety monitoring method and system based on improved YOLOv (en)
CN117876848B * 2024-03-13 2024-05-07 成都理工大学 Complex environment falling stone detection method based on improved YOLOv5 (en)
CN118674723A (en) * 2024-08-23 2024-09-20 南京华视智能科技股份有限公司 Method for detecting virtual edges of coated ceramic area based on deep learning

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113435425A (en) * 2021-08-26 2021-09-24 绵阳职业技术学院 Wild animal emergence and emergence detection method based on recursive multi-feature fusion
CN113989613A (en) * 2021-10-13 2022-01-28 上海海事大学 Light-weight high-precision ship target detection method coping with complex environment
CN114120019A (en) * 2021-11-08 2022-03-01 贵州大学 Lightweight target detection method
CN114548363A (en) * 2021-12-29 2022-05-27 淮阴工学院 Unmanned vehicle carried camera target detection method based on YOLOv5
CN114565959A (en) * 2022-02-18 2022-05-31 武汉东信同邦信息技术有限公司 Target detection method and device based on YOLO-SD-Tiny
CN114758288A (en) * 2022-03-15 2022-07-15 华北电力大学 Power distribution network engineering safety control detection method and device
CN114821032A (en) * 2022-03-11 2022-07-29 山东大学 Special target abnormal state detection and tracking method based on improved YOLOv5 network
CN114926722A (en) * 2022-04-19 2022-08-19 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Method and storage medium for detecting scale self-adaptive target based on YOLOv5
CN115311524A (en) * 2022-08-16 2022-11-08 盐城工学院 Small target detection algorithm fusing attention and multi-scale double pyramids
CN115331256A (en) * 2022-07-31 2022-11-11 南京邮电大学 People flow statistical method based on mutual supervision

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190339688A1 (en) * 2016-05-09 2019-11-07 Strong Force Iot Portfolio 2016, Llc Methods and systems for data collection, learning, and streaming of machine signals for analytics and maintenance using the industrial internet of things
CN109919251B (en) * 2019-03-21 2024-08-09 腾讯科技(深圳)有限公司 Image-based target detection method, model training method and device
CN112884064B (en) * 2021-03-12 2022-07-29 迪比(重庆)智能科技研究院有限公司 Target detection and identification method based on neural network
WO2022236824A1 (en) * 2021-05-14 2022-11-17 北京大学深圳研究生院 Target detection network construction optimization method, apparatus and device, and medium and product


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects; Raja Sunkara and Tie Luo; ECML PKDD 2022; pp. 5-6 *
Application of the improved YOLOv5 network in remote sensing image target detection; Zhou Huaping, Guo Wei; Remote Sensing Information; 2022-11-10; Vol. 37, No. 5; pp. 23-30 *
Remote sensing image target detection based on a dual attention mechanism; Zhou Xing, Chen Lifu; Computer and Modernization; 2020-08-15; No. 8; pp. 5-11 *

Also Published As

Publication number Publication date
CN115731533A (en) 2023-03-03

Similar Documents

Publication Publication Date Title
CN115731533B (en) Vehicle-mounted target detection method based on improved YOLOv5
CN112200161B (en) Face recognition detection method based on mixed attention mechanism
Dewi et al. Weight analysis for various prohibitory sign detection and recognition using deep learning
CN114202672A (en) Small target detection method based on attention mechanism
CN111126472A (en) Improved target detection method based on SSD
CN110929577A (en) Improved target identification method based on YOLOv3 lightweight framework
CN112232371B (en) American license plate recognition method based on YOLOv3 and text recognition
CN112528961B (en) Video analysis method based on Jetson Nano
CN113255589B (en) Target detection method and system based on multi-convolution fusion network
CN109670555B (en) Instance-level pedestrian detection and pedestrian re-recognition system based on deep learning
CN111126278A (en) Target detection model optimization and acceleration method for few-category scene
CN117079163A (en) Aerial image small target detection method based on improved YOLOX-S
CN112561801A (en) Target detection model training method based on SE-FPN, target detection method and device
Fan et al. A novel sonar target detection and classification algorithm
CN115861619A (en) Airborne LiDAR (light detection and ranging) urban point cloud semantic segmentation method and system of recursive residual double-attention kernel point convolution network
CN116665153A (en) Road scene segmentation method based on improved deep bv3+ network model
CN113903022A (en) Text detection method and system based on feature pyramid and attention fusion
CN112084897A (en) Rapid traffic large-scene vehicle target detection method of GS-SSD
CN116597411A (en) Method and system for identifying traffic sign by unmanned vehicle in extreme weather
CN111339950A (en) Remote sensing image target detection method
CN112785610B (en) Lane line semantic segmentation method integrating low-level features
CN117710841A (en) Small target detection method and device for aerial image of unmanned aerial vehicle
CN116311004B (en) Video moving target detection method based on sparse optical flow extraction
CN117237614A (en) Deep learning-based lake surface floater small target detection method
CN114782762B (en) Garbage image detection method and community garbage station

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant