CN115731533B - Vehicle-mounted target detection method based on improved YOLOv5 - Google Patents
Vehicle-mounted target detection method based on improved YOLOv5
- Publication number: CN115731533B
- Application number: CN202211506283.8A
- Authority: CN (China)
- Legal status: Active (assumed status; not a legal conclusion)
Classifications
- Y02: Technologies or applications for mitigation or adaptation against climate change
- Y02T10/00: Road transport of goods or passengers
- Y02T10/10: Internal combustion engine [ICE] based vehicles
- Y02T10/40: Engine management systems
Abstract
In the vehicle-mounted target detection method based on improved YOLOv5, the YOLOv5 network structure is improved to realize obstacle detection on complex roads. The specific operation steps are as follows. Step 1: collect images in front of the vehicle through a camera. Step 2: extract key frames from the captured video streams to obtain a picture data set for subsequent model training; preprocess the collected data set and divide it into a training set, a test set and a verification set in a suitable proportion. Step 3: configure the related environment, build the improved YOLOv5 network structure, and feed the processed picture training, test and verification sets into the improved YOLOv5 for training; after training is completed, the best.pt model with the best detection effect is obtained. Step 4: feed the image to be detected into the best.pt model to obtain the detection result. The invention maintains high recognition accuracy for small targets and low-resolution images, improving the accuracy of target detection.
Description
Technical Field
The invention relates to the technical field of computer image processing, in particular to a vehicle-mounted target detection method based on improved YOLOv5.
Background
With the rapid development of the logistics industry and the growth of people's travel demands, China's road transportation industry is developing quickly. As the transportation industry grows, road conditions become more complex and traffic accidents more frequent. At present, vehicle-mounted obstacle recognition systems mostly rely on lidar, ultrasonic sensors and similar equipment; such equipment is costly, computationally heavy, and inconvenient to deploy and use.
In the field of target detection, the current mainstream approach is to use a deep neural network and give it target recognition capability through training. Mainstream target detection networks come in two forms: one-stage networks represented by YOLO, and two-stage networks represented by Fast R-CNN. A two-stage network first extracts regions of interest from the input image to locate targets, then extracts features from each region of interest, and finally uses a classifier to identify the category of each region; its detection accuracy is higher but its detection speed is slower. A one-stage network performs localization and classification within a single network, which greatly improves detection speed at the cost of some detection accuracy.
The YOLO family has developed quickly, improving considerably in both detection speed and accuracy, and is no longer weaker than two-stage networks. YOLOv5, as the latest version of the YOLO series, improves performance markedly over previous versions, but it still falls short on low-resolution and small-target detection, and its detection accuracy is easily degraded under complex conditions.
Disclosure of Invention
In order to solve the problems that YOLOv5 falls short on low-resolution and small-target detection and that its detection accuracy is easily degraded under complex conditions, the invention provides a vehicle-mounted target detection method based on improved YOLOv5, which replaces the original Conv structure with an SPD-Conv structure and improves the accuracy of identifying low-resolution and small targets; the above technical problems can thereby be effectively solved.
The invention is realized by the following technical scheme:
In the vehicle-mounted target detection method based on improved YOLOv5, the YOLOv5 network structure is improved to realize obstacle detection on complex roads; the specific operation steps are as follows:
step 1: collecting a front image of a vehicle through a camera;
step 2: the video streams acquired by the cameras are respectively subjected to key frame extraction to acquire a picture data set for subsequent model training; preprocessing the collected picture data set, and dividing the picture data set into a training set, a testing set and a verification set according to a proper proportion;
step 3: configuring the related environment, building the improved YOLOv5 network structure, and feeding the processed picture training set, test set and verification set into the improved YOLOv5 for training; after training is completed, the best.pt model with the best detection effect is obtained;
the improved YOLOv5 network structure is built by modifying YOLOv5 in the following respects: replacing the original neck network of YOLOv5 with the weighted bidirectional feature pyramid network BiFPN for feature extraction; introducing an attention mechanism into the backbone network by adding a CBAM module, which combines attention over the two dimensions of feature channel and feature space; replacing the original CNN module with the SPD-Conv module to obtain the YOLOv5-SPD module, used to process low-resolution and small targets; replacing the original IoU function with the EIoU loss function; and replacing the SiLU activation function with the Mish activation function;
step 4: feeding the image to be detected into the best.pt model to obtain the detection result.
Further, in step 3, a feature fusion neck network is introduced into the YOLOv5 backbone, and feature extraction is performed by the weighted bidirectional feature pyramid network BiFPN; the specific operation is as follows: a CBAM convolutional attention module, which combines a channel attention mechanism and a spatial attention mechanism, is introduced into the Backbone network of YOLOv5. After the Backbone extracts features, the channel attention mechanism of the CBAM module applies global average pooling and global max pooling separately to each input feature layer, converting it into two 1x1 vectors; the pooled results are passed through a shared fully connected layer and added, and a sigmoid operation on the sum yields a weight for each feature channel, which is multiplied with the original feature layer to obtain the channel-attended features;
The spatial attention mechanism of the CBAM module takes the maximum value and the average value over the channels at each feature point of the input feature layer and stacks them, converting the feature layer into 2 channels; a convolution with 1 output channel then reduces it back to a single channel, and a sigmoid operation on the result yields a weight for each feature point, which is multiplied with the corresponding point of the original feature layer to obtain the spatially attended features;
The attention mechanism highlights the key parts of the features while attending to both the positional and semantic information of the target. It is introduced in both the low-level and high-level feature layers of the Backbone network: CBAM modules are added at the 6th, 11th, 16th and last layers to emphasize low-level and high-level feature information, and the CBAM module introduced at the last layer of the Backbone meets the requirement of the subsequent Neck bottleneck structure.
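The channel- and spatial-attention computations described above can be sketched in NumPy. This is a minimal illustration, not the patent's implementation: the shared two-layer MLP weights (w1, w2), the 1x1 spatial kernel, and all array shapes are assumptions for demonstration (CBAM itself typically uses a larger, e.g. 7x7, convolution for spatial attention).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    # feat: (C, H, W). Global average- and max-pool each channel to 1x1,
    # pass both through a shared 2-layer MLP (w1, w2), add, then sigmoid.
    avg = feat.mean(axis=(1, 2))   # (C,)
    mx = feat.max(axis=(1, 2))     # (C,)
    attn = sigmoid(w2 @ np.maximum(0, w1 @ avg)
                   + w2 @ np.maximum(0, w1 @ mx))   # (C,) channel weights
    return feat * attn[:, None, None]               # reweight each channel

def spatial_attention(feat, kernel):
    # Stack the per-position channel max and mean into 2 maps, then reduce
    # back to 1 map (here a 1x1 conv, i.e. a weighted sum) and apply sigmoid.
    mx = feat.max(axis=0)          # (H, W)
    avg = feat.mean(axis=0)        # (H, W)
    attn = sigmoid(kernel[0] * mx + kernel[1] * avg)  # (H, W) point weights
    return feat * attn[None, :, :]

def cbam(feat, w1, w2, kernel):
    # Channel attention first, then spatial attention, as in CBAM.
    return spatial_attention(channel_attention(feat, w1, w2), kernel)
```

The output keeps the input's shape, so the module can be dropped between backbone layers without changing the surrounding architecture.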
Furthermore, the CBAM module introduced at the last layer of the Backbone meets the requirement of the subsequent Neck bottleneck structure; the specific operation is as follows: the bidirectional feature pyramid network BiFPN introduces a learnable weight for features of different scales so as to better balance feature information across scales; that is, a learnable weight parameter O controls the weight of each layer of features, and O is distributed as:
O = Σ_i w_i / (ε + Σ_j w_j) · I_i
where w_i ≥ 0 is a weight passed through the SiLU activation function, and ε = 0.0001 prevents numerical instability;
the feature layers are fused in a weighted manner, specifically:
P_i^td = Conv( (w_1·P_i^in + w_2·Resize(P_{i+1}^in)) / (w_1 + w_2 + ε) )
P_i^out = Conv( (w_1'·P_i^in + w_2'·P_i^td + w_3'·Resize(P_{i-1}^out)) / (w_1' + w_2' + w_3' + ε) )
where P_i^td is the intermediate feature of layer i, P_i^out is the output feature of layer i, and Resize converts the P_{i-1} and P_{i+1} feature layers to the same dimensions as P_i.
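The fast normalized fusion above can be sketched in a few lines of Python, assuming the feature layers have already been resized to a common size and flattened to lists of floats; the helper name `fuse` is illustrative, and non-negativity of the weights is assumed to be enforced upstream (the patent passes them through SiLU):

```python
EPS = 1e-4  # the patent's epsilon, guarding against numerical instability

def fuse(weights, features):
    """Weighted BiFPN fusion: O = sum_i w_i / (EPS + sum_j w_j) * I_i,
    applied elementwise over same-sized, flattened feature layers."""
    total = sum(weights) + EPS
    return [sum(w * layer[i] for w, layer in zip(weights, features)) / total
            for i in range(len(features[0]))]
```

Because the weights are normalized by their own sum, the fused output stays on the same scale as the inputs regardless of how large the learned weights grow.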
Further, in step 3 the SPD-Conv module replaces the original CNN module to obtain the YOLOv5-SPD module; the YOLOv5-SPD module comprises an SPD layer and a non-strided convolution layer. The SPD layer downsamples the original feature map: it slices the feature map proportionally into a series of sub-feature maps and concatenates them along the channel dimension to obtain an intermediate feature map, specifically:
f_{x,y} = X[x : m : scale, y : n : scale], 0 ≤ x, y < scale;
where X is the original feature map of size m × n and scale is the scaling factor;
the non-strided convolution layer uses stride-1 convolution to retain as much discriminative feature information as possible, while adjusting the depth and width of the intermediate feature map to meet the depth and width requirements of the subsequent network. Using the YOLOv5-SPD module in place of the original CNN to process low-resolution and small targets improves the accuracy of identifying them.
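The SPD slicing rule above can be sketched in NumPy; `space_to_depth` is an assumed name, and the (H, W, C) layout is a convention chosen for illustration only:

```python
import numpy as np

def space_to_depth(x, scale=2):
    # x: (H, W, C) feature map; H and W assumed divisible by scale.
    # Slice x into scale*scale sub-maps f[a, b] = x[a::scale, b::scale]
    # and concatenate them along the channel axis, as in SPD-Conv.
    subs = [x[a::scale, b::scale, :]
            for a in range(scale) for b in range(scale)]
    return np.concatenate(subs, axis=-1)  # (H/scale, W/scale, C*scale^2)
```

Unlike strided convolution or pooling, no pixel is discarded: the spatial resolution drops by `scale` while every value moves into the channel dimension, which is what preserves information from small, low-resolution targets.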
Further, in step 3 the EIoU loss function replaces the original IoU function; EIoU splits the aspect-ratio loss term into separate differences between the predicted width and height and the width and height of the minimum enclosing box. Meanwhile, Focal Loss is introduced to reduce the contribution to BBox regression of the many anchor boxes that overlap little with the target box, so the regression process focuses more on high-quality boxes. The specific formula is:
E_loss = IoU_loss + dis_loss + asp_loss = 1 − IoU + ρ²(b, b^gt)/c² + ρ²(w, w^gt)/c_w² + ρ²(h, h^gt)/c_h²
where dis_loss is the center-point loss and asp_loss is the width-height loss; ρ²(b, b^gt) is the squared Euclidean distance between the center points of the predicted and ground-truth boxes; ρ²(w, w^gt) and ρ²(h, h^gt) are the squared differences of their widths and heights respectively; c is the diagonal length of the minimum enclosing region containing both boxes; and c_w and c_h are the width and height of that region.
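A minimal pure-Python sketch of the E_loss formula above, for boxes in (x1, y1, x2, y2) corner format; the Focal Loss reweighting mentioned in the text is omitted, and the helper name is illustrative:

```python
def eiou_loss(pred, gt):
    """EIoU = (1 - IoU) + center-distance term + width and height terms."""
    # Intersection over union
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter)
    # Minimum enclosing box of the two boxes
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2  # squared diagonal of the enclosing box
    # Squared distance between the two box centers
    d2 = (((pred[0] + pred[2]) - (gt[0] + gt[2])) / 2) ** 2 \
       + (((pred[1] + pred[3]) - (gt[1] + gt[3])) / 2) ** 2
    # Squared width and height differences
    dw2 = ((pred[2] - pred[0]) - (gt[2] - gt[0])) ** 2
    dh2 = ((pred[3] - pred[1]) - (gt[3] - gt[1])) ** 2
    return (1 - iou) + d2 / c2 + dw2 / cw ** 2 + dh2 / ch ** 2
```

The loss is zero only when the predicted box coincides with the ground truth, and, unlike plain IoU loss, it still gives a useful gradient when the boxes barely overlap.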
Further, in step 3 the Mish activation function replaces the SiLU activation function. The Mish function has a lower bound and assigns small weight to the negative half-axis, which prevents neuron necrosis and produces a stronger regularization effect; retaining a small amount of negative information avoids the Dying ReLU phenomenon of ReLU and benefits expressiveness and information flow. The specific formula of the Mish activation function is:
Mish(x) = x · Tanh(Softplus(x));
where Tanh is the hyperbolic tangent function and Softplus(x) = ln(1 + e^x) is an activation function that can be seen as a smooth approximation of ReLU.
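The Mish formula can be written directly from the definition above. A minimal sketch (note: math.exp overflows for very large positive x, so a production version would guard that case):

```python
import math

def softplus(x):
    # Softplus(x) = ln(1 + e^x), a smooth approximation of ReLU.
    return math.log1p(math.exp(x))

def mish(x):
    # Mish(x) = x * tanh(Softplus(x)): unbounded above, bounded below
    # (minimum around -0.31), and keeps a little negative information,
    # so negative inputs do not die as they do with ReLU.
    return x * math.tanh(softplus(x))
```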
Further, before the processed picture training, test and verification sets are fed into the improved YOLOv5 for training in step 3, the network training parameters need to be set as follows: the number of iterations is set to 200, the batch size to 16, and the initial learning rate to 0.0001.
Further, in the step 1, the front image of the vehicle is collected by the camera, and the specific operation mode is as follows: the camera is arranged at the top of the vehicle and used for collecting images in front of the vehicle; during the running process of the vehicle, the camera can collect video streams in front of the vehicle.
Further, in step 2, key frames are extracted from each video stream collected by the camera: the current frame is sampled from the video stream at intervals of 1 s, taken as a key frame, and stored in the picture data set.
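The 1 s sampling rule can be sketched as pure index arithmetic; reading and decoding the stream (e.g. with OpenCV) is omitted, and `keyframe_indices` is an illustrative helper, not code from the patent:

```python
def keyframe_indices(frame_count, fps, interval_s=1.0):
    """Indices of the frames kept when sampling one key frame every
    interval_s seconds from a stream of fps frames per second."""
    step = max(1, round(fps * interval_s))
    return list(range(0, frame_count, step))
```

In a capture loop, a frame would be written to the data set only when its index appears in this list.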
Further, the preprocessing of the collected picture data set in step 2 specifically includes: removing pictures that contain no targets, have blurred features, or have cluttered backgrounds; labeling the screened pictures by marking the targets to be detected, such as culverts, height-limit bars, trees and other obstacles, with rectangular boxes, recording the name of each target and the coordinates of its rectangular box, and saving them as txt files; finally, dividing the picture data set into a training set, a test set and a verification set in the ratio 7:2:1.
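The 7:2:1 division can be sketched as follows; the shuffling, the fixed seed, and the helper name are illustrative assumptions rather than details from the patent:

```python
import random

def split_dataset(paths, seed=0):
    """Shuffle image paths and split them 7:2:1 into
    (training set, test set, verification set)."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)
    n = len(paths)
    n_train = int(n * 0.7)
    n_test = int(n * 0.2)
    return (paths[:n_train],
            paths[n_train:n_train + n_test],
            paths[n_train + n_test:])
```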
Advantageous effects
Compared with the prior art, the vehicle-mounted target detection method based on the improved YOLOv5 has the following beneficial effects:
(1) In this technical scheme, the vehicle-mounted camera and embedded device are mounted directly on the vehicle, so no additional hardware is needed and hardware cost is saved. The camera collects images in front of the vehicle, the images are fed into the model, and the model judges whether an obstacle is present ahead, achieving the purpose of monitoring the road conditions around the vehicle. In addition, the original neck network of YOLOv5 is replaced with the weighted bidirectional feature pyramid network BiFPN for feature extraction; an attention mechanism is introduced into the backbone network by adding a CBAM module that combines attention over the feature-channel and feature-space dimensions; the SPD-Conv module replaces the original CNN module to obtain the YOLOv5-SPD module, used to process low-resolution and small targets and improve the accuracy of identifying them; and the EIoU loss function replaces the original IoU function, computing the difference between the predicted box and the ground-truth box more effectively and improving model accuracy.
(2) In this technical scheme, the improved BiFPN bidirectional weighted pyramid network is introduced at the Neck end, and learnable weights are introduced for features of different scales to better balance feature information across scales; the SiLU activation function is replaced by the Mish activation function, avoiding the Dying ReLU phenomenon of ReLU and benefiting expressiveness and information flow.
(3) In this technical scheme, the EIoU loss function replaces the original IoU function, alleviating sample imbalance in the bounding-box regression task: the contribution to BBox regression of the many anchor boxes that overlap little with the target box is reduced, so the regression process focuses more on high-quality boxes.
Drawings
FIG. 1 is a schematic overall flow chart of the present invention.
FIG. 2 is a diagram of an improved YOLOv5 network architecture in accordance with the present invention.
FIG. 3 is a schematic diagram of a CBAM attention module according to the present invention.
FIG. 4 is a diagram of a BiFPN network architecture in accordance with the present invention.
FIG. 5 is a schematic diagram of a Yolov5-SPD module according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments.
Example 1:
As shown in FIGS. 1-5, in the vehicle-mounted target detection method based on improved YOLOv5, the YOLOv5 network structure is improved to realize obstacle detection on complex roads; the specific operation steps are as follows:
step 1: collecting a front image of a vehicle through a camera;
the camera is arranged at the top of the vehicle and used for collecting images in front of the vehicle; during the running process of the vehicle, the camera can collect video streams in front of the vehicle.
Step 2: extract key frames from the video streams collected by the camera, i.e. sample the current frame from each stream at intervals of 1 s as a key frame and store it in the picture data set, obtaining the picture data set for subsequent model training. Preprocess the collected data set by removing pictures that contain no targets, have blurred features, or have cluttered backgrounds; label the screened pictures by marking the targets to be detected, such as culverts, height-limit bars, trees and other obstacles, with rectangular boxes, recording the name of each target and the coordinates of its rectangular box, and saving them as txt files; finally, divide the picture data set into a training set, a test set and a verification set in the ratio 7:2:1.
Step 3: configure the related environment, build the improved YOLOv5 network structure, and feed the processed picture training, test and verification sets into the improved YOLOv5 for training; after training is completed, the best.pt model with the best detection effect is obtained. The procedure specifically comprises the following steps:
the first step: the improved Yolov5 network structure is built, the Yolov5 is improved, and the improvement points of the Yolov5 are as follows: and replacing the original neck network of Yolov5 with a weighted bidirectional pyramid network BiFPN for feature extraction.
A CBAM convolutional attention module, which combines a channel attention mechanism and a spatial attention mechanism, is introduced into the Backbone network of YOLOv5. After the Backbone extracts features, the channel attention mechanism of the CBAM module applies global average pooling and global max pooling separately to each input feature layer, converting it into two 1x1 vectors; the pooled results are passed through a shared fully connected layer and added, and a sigmoid operation on the sum yields a weight for each feature channel, which is multiplied with the original feature layer to obtain the channel-attended features.
The spatial attention mechanism of the CBAM module takes the maximum value and the average value over the channels at each feature point of the input feature layer and stacks them, converting the feature layer into 2 channels; a convolution with 1 output channel then reduces it back to a single channel, and a sigmoid operation on the result yields a weight for each feature point, which is multiplied with the corresponding point of the original feature layer to obtain the spatially attended features.
The attention mechanism highlights the key parts of the features while attending to both the positional and semantic information of the target. It is introduced in both the low-level and high-level feature layers of the Backbone network: CBAM modules are added at the 6th, 11th, 16th and last layers to emphasize low-level and high-level feature information, and the CBAM module introduced at the last layer of the Backbone meets the requirement of the subsequent Neck bottleneck structure.
The second step: introduce an attention mechanism into the backbone network by adding a CBAM module that combines attention over the feature-channel and feature-space dimensions; the CBAM module introduced at the last layer of the Backbone meets the requirement of the subsequent Neck bottleneck structure. The bidirectional feature pyramid network BiFPN introduces a learnable weight for features of different scales so as to better balance feature information across scales; that is, a learnable weight parameter O controls the weight of each layer of features, and O is distributed as:
O = Σ_i w_i / (ε + Σ_j w_j) · I_i
where w_i ≥ 0 is a weight passed through the SiLU activation function, and ε = 0.0001 prevents numerical instability;
the feature layers are fused in a weighted manner, specifically:
P_i^td = Conv( (w_1·P_i^in + w_2·Resize(P_{i+1}^in)) / (w_1 + w_2 + ε) )
P_i^out = Conv( (w_1'·P_i^in + w_2'·P_i^td + w_3'·Resize(P_{i-1}^out)) / (w_1' + w_2' + w_3' + ε) )
where P_i^td is the intermediate feature of layer i, P_i^out is the output feature of layer i, and Resize converts the P_{i-1} and P_{i+1} feature layers to the same dimensions as P_i.
The third step: replace the original CNN module with the SPD-Conv module to obtain the YOLOv5-SPD module, used to process low-resolution and small targets; the YOLOv5-SPD module comprises an SPD layer and a non-strided convolution layer. The SPD layer downsamples the original feature map: it slices the feature map proportionally into a series of sub-feature maps and concatenates them along the channel dimension to obtain an intermediate feature map, specifically:
f_{x,y} = X[x : m : scale, y : n : scale], 0 ≤ x, y < scale;
where X is the original feature map of size m × n and scale is the scaling factor;
the non-strided convolution layer uses stride-1 convolution to retain as much discriminative feature information as possible, while adjusting the depth and width of the intermediate feature map to meet the depth and width requirements of the subsequent network. Using the YOLOv5-SPD module in place of the original CNN to process low-resolution and small targets improves the accuracy of identifying them.
Fourth step: replace the original IoU function with the EIoU loss function. EIoU splits the aspect-ratio loss term into separate differences between the predicted width and height and the width and height of the minimum enclosing box. Meanwhile, Focal Loss is introduced to reduce the contribution to BBox regression of the many anchor boxes that overlap little with the target box, so the regression process focuses more on high-quality boxes. The specific formula is:
E_loss = IoU_loss + dis_loss + asp_loss = 1 − IoU + ρ²(b, b^gt)/c² + ρ²(w, w^gt)/c_w² + ρ²(h, h^gt)/c_h²
where dis_loss is the center-point loss and asp_loss is the width-height loss; ρ²(b, b^gt) is the squared Euclidean distance between the center points of the predicted and ground-truth boxes; ρ²(w, w^gt) and ρ²(h, h^gt) are the squared differences of their widths and heights respectively; c is the diagonal length of the minimum enclosing region containing both boxes; and c_w and c_h are the width and height of that region.
Fifth step: replace the SiLU activation function with the Mish activation function. The Mish function has a lower bound and assigns small weight to the negative half-axis, which prevents neuron necrosis and produces a stronger regularization effect; retaining a small amount of negative information avoids the Dying ReLU phenomenon of ReLU and benefits expressiveness and information flow. The specific formula of the Mish activation function is:
Mish(x) = x · Tanh(Softplus(x));
where Tanh is the hyperbolic tangent function and Softplus(x) = ln(1 + e^x) is an activation function that can be seen as a smooth approximation of ReLU.
Sixth step: before the processed picture training, test and verification sets are fed into the improved YOLOv5 for training, the network training parameters need to be set as follows: the number of iterations is set to 200, the batch size to 16, and the initial learning rate to 0.0001.
Step 4: feed the image to be detected into the best.pt model to obtain the detection result.
Claims (5)
1. A vehicle-mounted target detection method based on improved YOLOv5, in which the YOLOv5 network structure is improved to realize obstacle detection on complex roads; the specific operation steps are as follows:
step 1: collecting a front image of a vehicle through a camera;
step 2: the video streams acquired by the cameras are respectively subjected to key frame extraction to acquire a picture data set for subsequent model training; preprocessing the collected picture data set, and dividing the picture data set into a training set, a testing set and a verification set according to a proper proportion;
step 3: configuring the related environment, building the improved YOLOv5 network structure, and feeding the processed picture training set, test set and verification set into the improved YOLOv5 for training; after training is completed, the best.pt model with the best detection effect is obtained;
the improved Yolov5 network structure is built, the Yolov5 is improved, and the improvement points of the Yolov5 are as follows:
firstly, replacing the original neck network of YOLOv5 with the weighted bidirectional feature pyramid network BiFPN for feature extraction; introducing an attention mechanism into the backbone network by adding a CBAM module that combines attention over the feature-channel and feature-space dimensions; introducing a feature fusion neck network into the YOLOv5 backbone, with feature extraction performed by the weighted bidirectional feature pyramid network BiFPN, in the following specific manner:
introducing a CBAM convolution attention module into a Backbone network of a Yolov5 network, wherein the CBAM convolution attention module combines a channel attention mechanism and a space attention mechanism; the method comprises the steps that a Backbone network of a backhaul extracts features, a CBAM module carries out global average pooling and global maximum pooling on single feature layers in input feature layers respectively by a focus mechanism of a channel, converts the single feature layers into two 1x1 forms, adds the results of the global average pooling and the global maximum pooling by using a full connection layer, carries out sigmoid operation on the added results to obtain a weight of each feature channel, and multiplies the weight by an original feature layer to obtain features of the channel;
the attention mechanism of the CBAM module for the space is characterized in that the maximum value and the average value of each feature point on an input feature layer are taken, the maximum value and the average value are stacked, a single feature layer is converted into 2 channels, the number of the channels is adjusted by convolution with the number of the channels being 1 again, the single feature layer is converted into 1 channel again, sigmoid operation is carried out on the processed feature points, the weight of each feature point is obtained, and the feature of the feature point can be obtained by multiplying the weight with the feature point on the original feature layer;
the attention mechanism highlights the key parts of the features while attending to both the positional and semantic information of the target; it is introduced in both the low-level and the high-level feature layers of the Backbone network, i.e., a CBAM module is added at the 6th, 11th, 16th and last layers to emphasize low-level and high-level feature information, and the CBAM module introduced at the last layer of the Backbone also serves the subsequent Neck bottleneck structure; the specific operation is as follows: the bidirectional feature pyramid network BiFPN introduces learnable weights for features of different scales so as to better balance feature information across scales; that is, the fused output O is controlled by a learnable weight for each input feature layer, distributed as:
O = Σ_i ( w_i / (ε + Σ_j w_j) ) · I_i;
where I_i is the i-th input feature layer, w_i ≥ 0 is its weight after passing through the SiLU activation function, and ε = 0.0001 prevents numerical instability;
the feature layers are fused in a weighted manner, specifically:
P_i^td = Conv( (w_1·P_i^in + w_2·Resize(P_{i+1}^in)) / (w_1 + w_2 + ε) );
P_i^out = Conv( (w_1'·P_i^in + w_2'·P_i^td + w_3'·Resize(P_{i-1}^out)) / (w_1' + w_2' + w_3' + ε) );
where P_i^td is the intermediate feature of layer P_i, P_i^out is the output feature of layer P_i, and Resize converts the P_{i-1} and P_{i+1} feature layers to the same size as P_i;
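The fast normalized fusion above can be sketched as follows; the names `raw_w` and `fast_normalized_fusion` are illustrative, and the SiLU activation of the weights follows the text (the original BiFPN uses ReLU to keep them non-negative):

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def fast_normalized_fusion(feats, raw_w, eps=1e-4):
    """Fast normalized fusion: O = sum_i w_i * I_i / (eps + sum_j w_j),
    with each learnable weight kept non-negative. feats: list of
    equally sized feature maps, already Resized to the target scale;
    raw_w: the raw learnable weight parameters, one per input."""
    w = np.maximum(silu(np.asarray(raw_w, dtype=float)), 0.0)
    return sum(wi * f for wi, f in zip(w, feats)) / (eps + w.sum())
```

With equal raw weights the fusion reduces to (nearly) a plain average of the inputs, the small deviation coming from the stabilizing epsilon.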
secondly, an SPD-Conv module replaces the original CNN module, yielding a YOLOv5-SPD module that comprises an SPD layer and a non-strided convolution layer; the SPD layer downsamples the original feature map by slicing a feature map proportionally into a series of sub-feature maps and concatenating the sub-feature maps along the channel dimension to obtain an intermediate feature map, specifically:
f_{0,0} = X[0:m:scale, 0:n:scale], …, f_{scale-1,scale-1} = X[scale-1:m:scale, scale-1:n:scale];
where X is the original feature map of size m × n and scale is the scaling factor;
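A minimal NumPy sketch of the SPD slicing for an arbitrary `scale`; the channel-first layout and the function name `spd` are assumptions:

```python
import numpy as np

def spd(x, scale=2):
    """Space-to-depth layer: slice the (C, H, W) feature map X into
    scale**2 sub-maps f[i, j] = X[:, i::scale, j::scale] and concatenate
    them along the channel axis. For scale=2 this halves the spatial
    size and quadruples the channels, losing no information to strided
    convolution or pooling."""
    subs = [x[:, i::scale, j::scale]
            for i in range(scale) for j in range(scale)]
    return np.concatenate(subs, axis=0)  # (C*scale**2, H/scale, W/scale)
```

Unlike stride-2 convolution or max pooling, every input value survives into the output, which is why a stride-1 convolution afterwards can still discriminate fine detail.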
the non-strided convolution layer uses stride-1 convolution to retain as much discriminative feature information as possible, while adjusting the depth and width of the intermediate feature map to meet the requirements of the subsequent network; replacing the original CNN with the YOLOv5-SPD module when processing low-resolution and smaller targets improves the accuracy of identifying them;
then, the EIoU loss function replaces the original IoU function; EIoU splits the aspect-ratio loss term of CIoU into separate differences between the predicted width and height and the width and height of the minimum enclosing box; meanwhile, Focal Loss is introduced so that the many anchor boxes that overlap little with the target box contribute less to the optimization of BBox regression, making the regression process focus more on high-quality boxes; the specific formula is:
E_loss = IoU_loss + dis_loss + asp_loss = 1 − IoU + ρ²(b, b^gt)/c² + ρ²(w, w^gt)/c_w² + ρ²(h, h^gt)/c_h²;
where dis_loss is the center-point loss and asp_loss is the width-height loss; ρ²(b, b^gt) denotes the squared Euclidean distance between the center points of the predicted and ground-truth boxes, and ρ²(w, w^gt) and ρ²(h, h^gt) denote the squared differences between the widths and heights of the predicted and ground-truth boxes, respectively; c is the diagonal length of the minimum enclosing region containing both boxes, c_w its width, and c_h its height;
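A pure-Python sketch of the EIoU loss for two axis-aligned boxes in (x1, y1, x2, y2) form; the Focal Loss weighting is omitted, and the small epsilons guarding against division by zero are an implementation assumption:

```python
def eiou_loss(pred, gt):
    """EIoU = 1 - IoU + center-distance term + width/height terms,
    all normalized by the smallest box enclosing both inputs."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    # IoU term
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    area_p = (px2 - px1) * (py2 - py1)
    area_g = (gx2 - gx1) * (gy2 - gy1)
    iou = inter / (area_p + area_g - inter + 1e-9)
    # smallest enclosing box
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    # center-point distance term: rho^2(b, b_gt) / c^2
    pbx, pby = (px1 + px2) / 2, (py1 + py2) / 2
    gbx, gby = (gx1 + gx2) / 2, (gy1 + gy2) / 2
    dis = ((pbx - gbx) ** 2 + (pby - gby) ** 2) / (cw ** 2 + ch ** 2 + 1e-9)
    # width/height terms: rho^2(w, w_gt)/c_w^2 + rho^2(h, h_gt)/c_h^2
    asp = ((px2 - px1) - (gx2 - gx1)) ** 2 / (cw ** 2 + 1e-9) \
        + ((py2 - py1) - (gy2 - gy1)) ** 2 / (ch ** 2 + 1e-9)
    return (1.0 - iou) + dis + asp
```

The loss vanishes for a perfect match and grows monotonically as the boxes drift apart, which is the behavior the regression head is trained against.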
finally, the Mish activation function replaces the SiLU activation function; Mish is bounded below and assigns small weights on the negative half-axis, which prevents neurons from dying and produces a stronger regularization effect; because a small amount of negative information is retained, the Dying ReLU phenomenon is avoided, supporting better expressiveness and information flow; the specific formula of the Mish activation function is:
Mish(x)=x*Tanh(Softplus(x));
where Tanh is the hyperbolic tangent function and Softplus(x) = ln(1 + e^x) is an activation function that can be regarded as a smooth approximation of ReLU;
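A direct transcription of the formula in pure Python; the helper names `softplus` and `mish` are illustrative:

```python
import math

def softplus(x):
    """Softplus(x) = ln(1 + e^x), a smooth approximation of ReLU."""
    return math.log1p(math.exp(x))

def mish(x):
    """Mish(x) = x * tanh(softplus(x)): smooth and non-monotonic, with
    a lower bound near -0.31; small negative inputs yield small negative
    outputs instead of being zeroed as ReLU would do."""
    return x * math.tanh(softplus(x))
```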
step 4: and placing the image to be detected into a best. Pt model to obtain a detection result.
2. The improved YOLOv5-based vehicle-mounted object detection method of claim 1, wherein: before the processed picture training set, picture test set and picture verification set are put into the improved YOLOv5 for training, the network training parameters are set as follows: the number of iterations is set to 200, the batch size to 16, and the initial learning rate to 0.0001.
3. The improved YOLOv5-based vehicle-mounted object detection method of claim 1, wherein: collecting the image in front of the vehicle through the camera in step 1 specifically comprises: the camera is mounted on the top of the vehicle to collect images of the scene ahead; while the vehicle is running, the camera collects a video stream of the road in front of the vehicle.
4. The improved YOLOv5-based vehicle-mounted object detection method of claim 1, wherein: step 2 extracts key frames from the video stream collected by the camera, i.e., the current frame is extracted from the video stream every 1 s as a key frame and saved into the picture data set.
5. The improved YOLOv5-based vehicle-mounted object detection method of claim 1, wherein: the preprocessing of the collected picture data set in step 2 specifically comprises: removing pictures that contain no target, have blurred features, or have cluttered backgrounds; labeling the screened pictures by marking the targets to be detected, such as culverts, height-limit bars, trees and other obstacles, with rectangular boxes, recording the target names and the coordinates of the rectangular boxes, and saving them as txt files; and finally, dividing the picture data set into a training set, a test set and a verification set at a ratio of 7:2:1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211506283.8A CN115731533B (en) | 2022-11-29 | 2022-11-29 | Vehicle-mounted target detection method based on improved YOLOv5 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115731533A CN115731533A (en) | 2023-03-03 |
CN115731533B true CN115731533B (en) | 2024-04-05 |
Family
ID=85299062
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211506283.8A Active CN115731533B (en) | 2022-11-29 | 2022-11-29 | Vehicle-mounted target detection method based on improved YOLOv5 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115731533B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116416504B (en) * | 2023-03-16 | 2024-02-06 | 北京瑞拓电子技术发展有限公司 | Expressway foreign matter detection system and method based on vehicle cooperation |
CN116452972B (en) * | 2023-03-17 | 2024-06-21 | 兰州交通大学 | Transformer end-to-end remote sensing image vehicle target detection method |
CN116342596B (en) * | 2023-05-29 | 2023-11-28 | 云南电网有限责任公司 | YOLOv5 improved substation equipment nut defect identification detection method |
CN116994243B (en) * | 2023-07-31 | 2024-04-02 | 安徽省农业科学院农业经济与信息研究所 | Lightweight agricultural pest detection method and system |
CN117058526A (en) * | 2023-10-11 | 2023-11-14 | 创思(广州)电子科技有限公司 | Automatic cargo identification method and system based on artificial intelligence |
CN118190168B (en) * | 2023-10-19 | 2024-10-18 | 重庆大学 | Temperature monitoring method and system for key area of high-speed vehicle |
CN117668669B (en) * | 2024-02-01 | 2024-04-19 | 齐鲁工业大学(山东省科学院) | Pipeline safety monitoring method and system based on improvement YOLOv (YOLOv) |
CN117876848B (en) * | 2024-03-13 | 2024-05-07 | 成都理工大学 | Complex environment falling stone detection method based on improvement yolov5 |
CN118674723A (en) * | 2024-08-23 | 2024-09-20 | 南京华视智能科技股份有限公司 | Method for detecting virtual edges of coated ceramic area based on deep learning |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113435425A (en) * | 2021-08-26 | 2021-09-24 | 绵阳职业技术学院 | Wild animal emergence and emergence detection method based on recursive multi-feature fusion |
CN113989613A (en) * | 2021-10-13 | 2022-01-28 | 上海海事大学 | Light-weight high-precision ship target detection method coping with complex environment |
CN114120019A (en) * | 2021-11-08 | 2022-03-01 | 贵州大学 | Lightweight target detection method |
CN114548363A (en) * | 2021-12-29 | 2022-05-27 | 淮阴工学院 | Unmanned vehicle carried camera target detection method based on YOLOv5 |
CN114565959A (en) * | 2022-02-18 | 2022-05-31 | 武汉东信同邦信息技术有限公司 | Target detection method and device based on YOLO-SD-Tiny |
CN114758288A (en) * | 2022-03-15 | 2022-07-15 | 华北电力大学 | Power distribution network engineering safety control detection method and device |
CN114821032A (en) * | 2022-03-11 | 2022-07-29 | 山东大学 | Special target abnormal state detection and tracking method based on improved YOLOv5 network |
CN114926722A (en) * | 2022-04-19 | 2022-08-19 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Method and storage medium for detecting scale self-adaptive target based on YOLOv5 |
CN115311524A (en) * | 2022-08-16 | 2022-11-08 | 盐城工学院 | Small target detection algorithm fusing attention and multi-scale double pyramids |
CN115331256A (en) * | 2022-07-31 | 2022-11-11 | 南京邮电大学 | People flow statistical method based on mutual supervision |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190339688A1 (en) * | 2016-05-09 | 2019-11-07 | Strong Force Iot Portfolio 2016, Llc | Methods and systems for data collection, learning, and streaming of machine signals for analytics and maintenance using the industrial internet of things |
CN109919251B (en) * | 2019-03-21 | 2024-08-09 | 腾讯科技(深圳)有限公司 | Image-based target detection method, model training method and device |
CN112884064B (en) * | 2021-03-12 | 2022-07-29 | 迪比(重庆)智能科技研究院有限公司 | Target detection and identification method based on neural network |
WO2022236824A1 (en) * | 2021-05-14 | 2022-11-17 | 北京大学深圳研究生院 | Target detection network construction optimization method, apparatus and device, and medium and product |
Non-Patent Citations (3)
Title |
---|
"No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects"; Raja Sunkara and Tie Luo; ECML PKDD 2022; pp. 5-6 *
"Application of the Improved YOLOv5 Network in Remote Sensing Image Target Detection"; Zhou Huaping, Guo Wei; Remote Sensing Information; 2022-11-10; Vol. 37, No. 5; pp. 23-30 *
"Remote Sensing Image Target Detection Based on a Dual Attention Mechanism"; Zhou Xing, Chen Lifu; Computer and Modernization; 2020-08-15; No. 08; pp. 5-11 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||