CN116129327A - Infrared vehicle detection method based on improved YOLOv7 algorithm
Infrared vehicle detection method based on improved YOLOv7 algorithm
- Publication number: CN116129327A
- Application number: CN202310175297.4A
- Authority: CN (China)
- Prior art keywords: convolution; convolution block; layer; channels; block
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06N3/08: Neural networks; learning methods
- G06V10/7715: Feature extraction, e.g. by transforming the feature space
- G06V10/82: Image or video recognition or understanding using neural networks
- G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- Y02T10/40: Engine management systems
Abstract
The invention discloses an infrared vehicle detection method based on an improved YOLOv7 algorithm, which comprises the following steps. Step 1: collect vehicle videos on a traffic road, and perform frame extraction and image preprocessing to obtain an infrared vehicle image dataset. Step 2: construct a new backbone feature extraction network, Conv31, containing 31 convolution blocks. Step 3: connect the new backbone feature extraction network with the original YOLOv7 prediction network to form a new network model, Conv31-YOLOv7. Step 4: feed the training dataset obtained in step 1 into the Conv31-YOLOv7 model of step 3 and train it with mini-batch stochastic gradient descent to obtain a trained infrared vehicle detection model. Step 5: feed infrared vehicle video of the traffic road, acquired in real time by infrared thermal imaging equipment, frame by frame into the trained model to obtain the real-time position information, scale information and confidence of each vehicle. The invention significantly improves detection accuracy while maintaining a high detection speed.
Description
Technical Field
The invention belongs to the technical field of vehicle detection, and particularly relates to an infrared vehicle detection method based on an improved YOLOv7 algorithm.
Background
Infrared target detection automatically extracts the position information of targets from infrared images. Owing to the advantages of infrared thermal imaging, it can be applied to vehicle detection on traffic roads and remains effective at night, under strong light and in extreme weather, so breakthroughs in this technology have important theoretical significance and practical value in fields such as automatic driving and intelligent transportation.
Conventional infrared vehicle detection methods generally extract target features with techniques such as the histogram of oriented gradients, and then classify those features with a classifier such as a support vector machine trained on positive and negative samples. Such methods suffer from low detection speed that cannot meet real-time requirements, limited application scenarios, poor robustness and weak generalization.
In recent years, with the rapid development of artificial intelligence, infrared vehicle detection methods based on convolutional neural networks have been widely applied. These methods automatically abstract and extract image features through the convolutional neural network, and offer higher detection accuracy and stronger robustness.
Current deep-learning-based target detection algorithms fall into two main classes. Two-stage detection algorithms divide detection into two stages: the first stage generates candidate regions of the image to be detected, and the second stage classifies and regresses the generated candidate regions to obtain the final detection result. Because the first stage is time-consuming, these algorithms achieve high overall accuracy but low detection speed and generally cannot meet real-time requirements; representative algorithms include R-CNN and Fast R-CNN. Single-stage detection algorithms unify the two-stage process into an end-to-end regression, merging region selection and detection judgment into one step; their accuracy is lower but their detection speed is high, and representative algorithms include YOLO and SSD.
Deep-learning-based target detection algorithms perform well on visible-light images. In infrared scenes, however, the image is single-channel and its features are not salient, which makes feature extraction for infrared vehicle targets difficult; as a result, the detection accuracy of current mainstream target detection algorithms is generally low and hard to reconcile with practical requirements.
Disclosure of Invention
To overcome the above defects in the prior art, the invention aims to provide an infrared vehicle detection method based on an improved YOLOv7 algorithm that significantly improves detection accuracy while maintaining a high detection speed.
To achieve the above purpose, the invention adopts the following technical scheme:
An infrared vehicle detection method based on an improved YOLOv7 algorithm comprises the following steps:
step 1: collecting vehicle videos on a traffic road, and performing frame extraction and image preprocessing to obtain an infrared vehicle image dataset;
step 2: improving the backbone feature extraction network of the YOLOv7 algorithm, i.e. discarding the backbone feature extraction network in the YOLOv7 algorithm and constructing a new backbone feature extraction network Conv31, containing 31 convolution blocks, to replace it;
step 3: connecting the new backbone feature extraction network with the original YOLOv7 prediction network to form a new network model, Conv31-YOLOv7;
step 4: feeding the training dataset obtained in step 1 into the network model Conv31-YOLOv7 of step 3 and training with mini-batch stochastic gradient descent to obtain a trained infrared vehicle detection model;
step 5: feeding infrared vehicle video of the traffic road, acquired in real time by infrared thermal imaging equipment, frame by frame into the trained infrared vehicle detection model to obtain the real-time position information, scale information and confidence of each vehicle.
The frame extraction and image preprocessing of step 1 specifically include:
(1.1) acquiring infrared vehicle video at an intersection, reading the first 10000 frames of the video, setting the output image resolution to 640 × 640, and outputting each frame in sequence in an image format to obtain 10000 infrared vehicle images; labeling the position information of the vehicle targets in these images to produce an infrared vehicle image dataset containing 10000 infrared vehicle images with a resolution of 640 × 640;
(1.2) dividing the infrared vehicle image dataset into a training dataset and a test dataset in a 9:1 ratio, i.e. randomly selecting 9000 infrared images from the dataset to form the training dataset and using the remaining 1000 infrared images as the test dataset.
Step 2 specifically comprises the following steps:
(2.1) discarding the backbone feature extraction network in the YOLOv7 algorithm and constructing a new backbone feature extraction network Conv31 to replace it, wherein Conv31 comprises 31 convolution blocks with the following structures:
1st convolution block: a convolution layer with 1 input channel, 32 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with negative slope 0.1;
2nd convolution block: a convolution layer with 32 input channels, 16 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 16 channels; and a LeakyReLU activation layer with negative slope 0.1;
3rd convolution block: a convolution layer with 16 input channels, 32 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with negative slope 0.1;
4th convolution block: a convolution layer with 32 input channels, 32 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with negative slope 0.1;
5th and 7th convolution blocks: a convolution layer with 32 input channels, 64 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with negative slope 0.1;
6th convolution block: a convolution layer with 64 input channels, 32 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with negative slope 0.1;
8th convolution block: a convolution layer with 64 input channels, 64 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with negative slope 0.1;
9th and 11th convolution blocks: a convolution layer with 64 input channels, 128 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with negative slope 0.1;
10th convolution block: a convolution layer with 128 input channels, 64 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with negative slope 0.1;
12th convolution block: a convolution layer with 128 input channels, 128 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with negative slope 0.1;
13th and 15th convolution blocks: a convolution layer with 128 input channels, 256 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with negative slope 0.1;
14th convolution block: a convolution layer with 256 input channels, 128 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with negative slope 0.1;
16th, 19th, 21st and 23rd convolution blocks: a convolution layer with 256 input channels, 512 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with negative slope 0.1;
17th, 20th and 22nd convolution blocks: a convolution layer with 512 input channels, 256 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with negative slope 0.1;
18th convolution block: a convolution layer with 256 input channels, 256 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with negative slope 0.1;
24th, 27th, 29th and 31st convolution blocks: a convolution layer with 512 input channels, 1024 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 1024 channels; and a LeakyReLU activation layer with negative slope 0.1;
25th, 28th and 30th convolution blocks: a convolution layer with 1024 input channels, 512 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with negative slope 0.1;
26th convolution block: a convolution layer with 512 input channels, 512 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with negative slope 0.1;
(2.2) connecting the 31 convolution blocks in sequence to obtain the new backbone feature extraction network Conv31 with the following structure:
1st convolution block -> 2nd -> 3rd -> 4th -> 5th -> 6th -> 7th -> 8th -> 9th -> 10th -> 11th -> 12th -> 13th -> 14th -> 15th -> 16th -> 17th -> 18th -> 19th -> 20th -> 21st -> 22nd -> 23rd -> 24th -> 25th -> 26th -> 27th -> 28th -> 29th -> 30th -> 31st convolution block;
(2.3) replacing the backbone feature extraction network in the YOLOv7 algorithm with Conv31.
Step 3 specifically comprises the following steps:
connecting the 16th convolution block of the new backbone feature extraction network Conv31 obtained in step 2 with the 1st prediction branch of the YOLOv7 prediction network;
connecting the 24th convolution block of Conv31 with the 2nd prediction branch of the YOLOv7 prediction network;
connecting the 31st convolution block of Conv31 with the 3rd prediction branch of the YOLOv7 prediction network.
The modules inside the YOLOv7 prediction network are connected as follows:
the modules of the 1st prediction branch are connected as:
16th convolution block -> branch convolution block 1 -> Multi_Concat_Block1 -> RepConv1 -> detection head 1;
the modules of the 2nd prediction branch are connected as:
24th convolution block -> branch convolution block 2 -> Multi_Concat_Block2 -> Multi_Concat_Block3 -> RepConv2 -> detection head 2;
the modules of the 3rd prediction branch are connected as:
31st convolution block -> Multi_Concat_Block4 -> RepConv3 -> detection head 3;
the prediction branches are interconnected as:
31st convolution block -> upsampling convolution block 1 -> upsampling layer 1 -> Multi_Concat_Block2 -> upsampling convolution block 2 -> upsampling layer 2 -> Multi_Concat_Block1;
Multi_Concat_Block1 -> TransitionBlock1 -> Multi_Concat_Block2 -> TransitionBlock2 -> Multi_Concat_Block4.
Step 4 specifically comprises the following steps:
(4.1) setting training parameters: the number of training epochs is 200, the number of infrared vehicle images selected per training step (the batch size) is 16, the learning rate is 0.001, and the confidence threshold and the IOU ignore threshold are both set to 0.5;
(4.2) inputting the 9000 infrared vehicle images of the training set into the model Conv31-YOLOv7, 16 at a time, to obtain the offset values $(t_x, t_y, t_w, t_h)$ and the target confidence $p$, where $t_x$ is the offset of the target bounding box relative to the label box in the x direction, $t_y$ the offset in the y direction, $t_w$ the offset in width, and $t_h$ the offset in height;
(4.3) converting the offset values $(t_x, t_y, t_w, t_h)$ into the position and the width and height of the prediction box through the following coordinate offset formulas:
$$b_x = \sigma(t_x) + c_x, \qquad b_y = \sigma(t_y) + c_y, \qquad b_w = p_w e^{t_w}, \qquad b_h = p_h e^{t_h}$$
where $\sigma$ denotes the sigmoid function, $b_x, b_y$ are the position of the prediction box, $c_x, c_y$ are the position of the grid cell containing the target, $b_w, b_h$ are the width and height of the prediction box, and $p_w, p_h$ are the width and height of the prior box;
(4.4) substituting the position, width, height and target confidence of the prediction box $(b_x, b_y, b_w, b_h, p)$, together with the position, width, height and confidence of the label box, into the loss function to compute the loss value, and updating the network weights with mini-batch stochastic gradient descent;
(4.5) repeating (4.2)-(4.4) until the loss value stabilizes and no longer decreases, then stopping training to obtain the trained infrared vehicle detection model.
Step 5 specifically comprises the following step:
acquiring infrared vehicle video of the traffic road in real time with infrared thermal imaging equipment and feeding it frame by frame into the trained infrared vehicle detection model to obtain the real-time position information and confidence of each vehicle.
The invention has the following beneficial effects:
The method replaces the backbone feature extraction network of the YOLOv7 algorithm with the new backbone feature extraction network Conv31. Conv31 contains 31 convolution blocks, each containing one convolution layer, for 31 convolution layers in total; stacking convolution blocks greatly increases the number of convolution layers, deepens the network, strengthens its feature extraction capability and thereby effectively improves detection accuracy. The 16th, 24th and 31st convolution blocks of Conv31 extract shallow, middle and deep features of infrared vehicle targets, respectively; connecting these 3 convolution blocks to the 3 prediction branches fuses multi-scale features and improves the network model's ability to detect infrared vehicle targets of different scales, further improving detection accuracy. Test results show that, compared with other vehicle detection methods based on convolutional neural networks, the proposed method significantly improves detection accuracy while maintaining a high detection speed.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention.
FIG. 2 is a diagram of the Conv31-YOLOv7 network constructed in the present invention.
Fig. 3 is a schematic diagram of the detection of the present invention in a practical scenario.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
As shown in fig. 1:
step 1: an infrared vehicle dataset is constructed.
(1.1) collecting infrared vehicle videos on an intersection, reading the first 10000 frames of the videos, setting the resolution of an image to be output to 640 x 640, outputting each frame in an image format in sequence to obtain 10000 infrared vehicle images, marking the position information of a vehicle target of the obtained infrared vehicle images, and manufacturing an infrared vehicle image data set, wherein the data set has 10000 infrared images with the resolution of 640 x 640 in total.
And (1.2) dividing the infrared vehicle image dataset into a training dataset and a test dataset according to the proportion of 9:1, namely randomly selecting 9000 infrared images from the dataset to form the training dataset, and forming the test dataset by the rest 1000 infrared images.
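By way of illustration only (this sketch is not part of the patent disclosure), the frame extraction and 9:1 split could be implemented as follows; the file paths, function names and the use of OpenCV are assumptions:

```python
import os
import random

import cv2  # OpenCV is an assumed dependency for frame extraction


def extract_frames(video_path, out_dir, num_frames=10000, size=(640, 640)):
    """Read the first num_frames frames of the video and save them as 640x640 images."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    for i in range(num_frames):
        ok, frame = cap.read()
        if not ok:
            break  # video shorter than num_frames
        frame = cv2.resize(frame, size)
        cv2.imwrite(os.path.join(out_dir, f"{i:05d}.png"), frame)
    cap.release()


def split_dataset(image_names, train_ratio=0.9, seed=0):
    """Randomly split image names 9:1 into training and test sets."""
    names = list(image_names)
    random.Random(seed).shuffle(names)
    k = int(len(names) * train_ratio)  # 9000 of 10000 images
    return names[:k], names[k:]
```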
Step 2: a new backbone feature extraction network is constructed.
Constructing the new backbone feature extraction network amounts to improving the backbone feature extraction network of the existing YOLOv7 algorithm. The network model of the YOLOv7 algorithm comprises a backbone feature extraction network and a prediction network; only the backbone feature extraction network is improved in this step, implemented as follows:
(2.1) Discard the backbone feature extraction network in the YOLOv7 algorithm and construct a new backbone feature extraction network Conv31 to replace it. Conv31 comprises 31 convolution blocks with the following structures:
1st convolution block: a convolution layer with 1 input channel, 32 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with negative slope 0.1.
2nd convolution block: a convolution layer with 32 input channels, 16 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 16 channels; and a LeakyReLU activation layer with negative slope 0.1.
3rd convolution block: a convolution layer with 16 input channels, 32 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with negative slope 0.1.
4th convolution block: a convolution layer with 32 input channels, 32 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with negative slope 0.1.
5th and 7th convolution blocks: a convolution layer with 32 input channels, 64 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with negative slope 0.1.
6th convolution block: a convolution layer with 64 input channels, 32 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with negative slope 0.1.
8th convolution block: a convolution layer with 64 input channels, 64 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with negative slope 0.1.
9th and 11th convolution blocks: a convolution layer with 64 input channels, 128 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with negative slope 0.1.
10th convolution block: a convolution layer with 128 input channels, 64 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with negative slope 0.1.
12th convolution block: a convolution layer with 128 input channels, 128 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with negative slope 0.1.
13th and 15th convolution blocks: a convolution layer with 128 input channels, 256 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with negative slope 0.1.
14th convolution block: a convolution layer with 256 input channels, 128 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with negative slope 0.1.
16th, 19th, 21st and 23rd convolution blocks: a convolution layer with 256 input channels, 512 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with negative slope 0.1.
17th, 20th and 22nd convolution blocks: a convolution layer with 512 input channels, 256 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with negative slope 0.1.
18th convolution block: a convolution layer with 256 input channels, 256 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with negative slope 0.1.
24th, 27th, 29th and 31st convolution blocks: a convolution layer with 512 input channels, 1024 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 1024 channels; and a LeakyReLU activation layer with negative slope 0.1.
25th, 28th and 30th convolution blocks: a convolution layer with 1024 input channels, 512 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with negative slope 0.1.
26th convolution block: a convolution layer with 512 input channels, 512 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with negative slope 0.1.
(2.2) Connect the 31 convolution blocks in sequence to obtain the new backbone feature extraction network Conv31 with the following structure:
1st convolution block -> 2nd -> 3rd -> 4th -> 5th -> 6th -> 7th -> 8th -> 9th -> 10th -> 11th -> 12th -> 13th -> 14th -> 15th -> 16th -> 17th -> 18th -> 19th -> 20th -> 21st -> 22nd -> 23rd -> 24th -> 25th -> 26th -> 27th -> 28th -> 29th -> 30th -> 31st convolution block.
(2.3) Replace the backbone feature extraction network in the YOLOv7 algorithm with Conv31 (an illustrative code sketch of Conv31 follows).
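For illustration, a minimal PyTorch sketch of Conv31 is given below (not part of the patent text; the helper and class names are assumed). Each tuple follows the enumeration above as (input channels, output channels, kernel size, stride, padding), and features are tapped at blocks 16, 24 and 31 for the three prediction branches described in step 3:

```python
import torch
import torch.nn as nn


def conv_block(c_in, c_out, k, s, p):
    """One block as enumerated above: convolution -> batch norm -> LeakyReLU(0.1)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=k, stride=s, padding=p, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1, inplace=True),
    )


class Conv31(nn.Module):
    """Sketch of the 31-block backbone; one (c_in, c_out, k, s, p) tuple per block."""

    CFG = [
        (1, 32, 3, 1, 1), (32, 16, 1, 1, 0), (16, 32, 3, 1, 1), (32, 32, 1, 2, 0),        # blocks 1-4
        (32, 64, 3, 1, 1), (64, 32, 1, 1, 0), (32, 64, 3, 1, 1), (64, 64, 1, 2, 0),       # blocks 5-8
        (64, 128, 3, 1, 1), (128, 64, 1, 1, 0), (64, 128, 3, 1, 1), (128, 128, 1, 2, 0),  # blocks 9-12
        (128, 256, 3, 1, 1), (256, 128, 1, 1, 0), (128, 256, 3, 1, 1),                    # blocks 13-15
        (256, 512, 3, 1, 1), (512, 256, 1, 1, 0), (256, 256, 1, 2, 0),                    # blocks 16-18
        (256, 512, 3, 1, 1), (512, 256, 1, 1, 0), (256, 512, 3, 1, 1),                    # blocks 19-21
        (512, 256, 1, 1, 0), (256, 512, 3, 1, 1),                                         # blocks 22-23
        (512, 1024, 3, 1, 1), (1024, 512, 1, 1, 0), (512, 512, 1, 2, 0),                  # blocks 24-26
        (512, 1024, 3, 1, 1), (1024, 512, 1, 1, 0), (512, 1024, 3, 1, 1),                 # blocks 27-29
        (1024, 512, 1, 1, 0), (512, 1024, 3, 1, 1),                                       # blocks 30-31
    ]

    def __init__(self):
        super().__init__()
        self.blocks = nn.ModuleList(conv_block(*cfg) for cfg in self.CFG)

    def forward(self, x):
        taps = {}
        for i, block in enumerate(self.blocks, start=1):
            x = block(x)
            if i in (16, 24, 31):  # shallow / middle / deep feature taps
                taps[i] = x
        return taps[16], taps[24], taps[31]
```

With a 640 × 640 single-channel input, the five stride-2 blocks (4, 8, 12, 18, 26) yield feature maps of 80 × 80, 40 × 40 and 20 × 20 at the three taps, matching YOLOv7's three detection scales.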
Step 3: a new network model Conv31-YOLOv7 was constructed.
Referring to fig. 2, the new backbone feature extraction network and the YOLOv7 prediction network are connected according to the following structural relationship to form the new network model Conv31-YOLOv7:
The 16th convolution block of Conv31 is connected to the 1st prediction branch of the YOLOv7 prediction network.
The 24th convolution block of Conv31 is connected to the 2nd prediction branch of the YOLOv7 prediction network.
The 31 st convolution block in the new backbone feature extraction network Conv31 is connected to the 3 rd prediction branch of the YOLOv7 prediction network.
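Continuing the sketch above (again an illustrative assumption, not the patent's code), the three taps could feed the YOLOv7 prediction branches through a thin wrapper; Yolov7Head here stands in for the unmodified YOLOv7 prediction network:

```python
import torch.nn as nn


class Conv31YOLOv7(nn.Module):
    """Assumed wrapper: Conv31 taps feeding YOLOv7's three prediction branches."""

    def __init__(self, head):
        super().__init__()
        self.backbone = Conv31()  # the sketch defined above
        self.head = head          # stand-in for the original YOLOv7 prediction network

    def forward(self, x):
        f16, f24, f31 = self.backbone(x)  # 16th / 24th / 31st convolution blocks
        return self.head(f16, f24, f31)   # 1st / 2nd / 3rd prediction branches
```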
Step 4: the new network model Conv31-YOLOv7 was trained.
(4.1) Set training parameters: the number of training epochs is 200, the number of infrared vehicle images selected per training step (the batch size) is 16, the learning rate is 0.001, and the confidence threshold and the IOU ignore threshold are both set to 0.5.
(4.2) Input the 9000 infrared vehicle images of the training set into the model Conv31-YOLOv7, 16 at a time, to obtain the offset values $(t_x, t_y, t_w, t_h)$ and the target confidence $p$, where $t_x$ is the offset of the target bounding box relative to the label box in the x direction, $t_y$ the offset in the y direction, $t_w$ the offset in width, and $t_h$ the offset in height.
(4.3) Convert the offset values $(t_x, t_y, t_w, t_h)$ into the position and the width and height of the prediction box through the following coordinate offset formulas:
$$b_x = \sigma(t_x) + c_x, \qquad b_y = \sigma(t_y) + c_y, \qquad b_w = p_w e^{t_w}, \qquad b_h = p_h e^{t_h}$$
where $\sigma$ denotes the sigmoid function, $b_x, b_y$ are the position of the prediction box, $c_x, c_y$ are the position of the grid cell containing the target, $b_w, b_h$ are the width and height of the prediction box, and $p_w, p_h$ are the width and height of the prior box.
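A direct code transcription of the coordinate offset formulas above, assuming the standard YOLO-style decoding with sigmoid and exponential (illustrative sketch):

```python
import torch


def decode_boxes(t, cx, cy, pw, ph):
    """Turn offsets (tx, ty, tw, th) into a box (bx, by, bw, bh) per the formulas above."""
    tx, ty, tw, th = t.unbind(-1)
    bx = torch.sigmoid(tx) + cx  # b_x = sigma(t_x) + c_x
    by = torch.sigmoid(ty) + cy  # b_y = sigma(t_y) + c_y
    bw = pw * torch.exp(tw)      # b_w = p_w * exp(t_w)
    bh = ph * torch.exp(th)      # b_h = p_h * exp(t_h)
    return torch.stack((bx, by, bw, bh), dim=-1)
```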
(4.4) Substitute the position, width, height and target confidence of the prediction box $(b_x, b_y, b_w, b_h, p)$, together with the position, width, height and confidence of the label box, into the loss function to compute the loss value, and update the network weights with mini-batch stochastic gradient descent.
(4.5) Repeat (4.2)-(4.4) until the loss value stabilizes and no longer decreases; then stop training to obtain the trained infrared vehicle detection model (an illustrative training-loop sketch follows).
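The training procedure of step 4 amounts to a standard mini-batch SGD loop. The sketch below is an illustrative assumption: train_loader and loss_fn stand in for the labeled dataset of step 1 and the YOLOv7 loss function:

```python
import torch


def train(model, train_loader, loss_fn, epochs=200, lr=0.001, device="cuda"):
    """Mini-batch SGD as in step 4: 200 epochs, batches of 16 images, learning rate 0.001."""
    model.to(device).train()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for epoch in range(epochs):
        running = 0.0
        for images, targets in train_loader:  # loader assumed to yield batches of 16
            preds = model(images.to(device))
            loss = loss_fn(preds, targets)    # compares predicted boxes with label boxes
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            running += loss.item()
        print(f"epoch {epoch + 1}/{epochs}: mean loss {running / len(train_loader):.4f}")
```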
Step 5: and carrying out infrared vehicle detection by using the trained model.
Acquire infrared vehicle video of the traffic road in real time with infrared thermal imaging equipment and feed it frame by frame into the trained infrared vehicle detection model to obtain the real-time position information and confidence of each vehicle (an illustrative sketch follows).
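Frame-by-frame inference on a live infrared stream might be sketched as follows; the video source index, grayscale conversion and normalization are assumptions, with the single-channel input matching the 1-input-channel first block of Conv31:

```python
import cv2
import torch


def run_realtime(model, source=0, device="cuda"):
    """Feed live infrared frames to the trained model, one frame at a time."""
    model.to(device).eval()
    cap = cv2.VideoCapture(source)  # assumed index/URL of the thermal camera
    with torch.no_grad():
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)           # single-channel input
            gray = cv2.resize(gray, (640, 640))
            x = torch.from_numpy(gray).float()[None, None] / 255.0   # shape (1, 1, 640, 640)
            detections = model(x.to(device))  # positions, scales, confidences per frame
            # ...draw prediction boxes and confidences on the frame here...
    cap.release()
```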
The effect of the invention is further illustrated by the following simulation experiments and measured data:
1. Simulation and measurement environment
The simulations and measurements use the Windows 10 operating system with an NVIDIA GeForce GTX 2060 GPU for acceleration; the deep learning framework is PyTorch 1.8.1.
2. Simulation content
Simulation 1: train other convolutional-neural-network-based target detection models with the same training set and parameters as the present invention to obtain their respective trained infrared vehicle detection models.
Feed the 1000 test-set images, 1 image at a time, into the trained model of the present invention to measure the infrared vehicle detection accuracy at an IOU threshold of 0.5 and the detection speed.
Test the other methods on the same test set to measure their accuracy at an IOU threshold of 0.5 and their detection speed.
A simulation experiment compares the proposed method with the infrared vehicle detection method based on YOLOv7; the results are shown in Table 1:
TABLE 1
Method | Detection speed (images/s) | Accuracy (IOU threshold 0.5)
---|---|---
YOLOv7 | 33 | 91.89%
Proposed method | 31 | 94.36%
Compared with YOLOv7, the proposed method detects 31 images per second versus 33 images per second for YOLOv7, a slight decrease in detection speed. Its infrared vehicle detection accuracy at an IOU threshold of 0.5 is 94.36%, versus 91.89% for YOLOv7, a marked improvement in detection accuracy while a high detection speed is maintained.
3. Measured results
Infrared vehicle video of a traffic road is acquired in real time with infrared thermal imaging equipment and fed frame by frame into the trained infrared vehicle detection model to obtain the real-time position information and confidence of each vehicle, as shown in fig. 3.
The large rectangular box in fig. 3 is the prediction box surrounding the vehicle in the infrared image, and the small rectangular box above it shows the confidence of the vehicle target.
The foregoing detailed description is merely illustrative of preferred embodiments of the invention and is not intended to limit its scope; various modifications and improvements can be made by those skilled in the art without departing from the spirit of the invention, whose scope is defined by the claims.
Claims (6)
1. An infrared vehicle detection method based on an improved YOLOv7 algorithm, characterized by comprising the following steps:
step 1: collecting vehicle videos on a traffic road, and performing frame extraction and image preprocessing to obtain an infrared vehicle image dataset;
step 2: improving the backbone feature extraction network of the YOLOv7 algorithm, i.e. discarding the backbone feature extraction network in the YOLOv7 algorithm and constructing a new backbone feature extraction network Conv31, containing 31 convolution blocks, to replace it;
step 3: connecting the new backbone feature extraction network with the original YOLOv7 prediction network to form a new network model Conv31-YOLOv7;
step 4: feeding the training dataset obtained in step 1 into the network model Conv31-YOLOv7 of step 3 and training with mini-batch stochastic gradient descent to obtain a trained infrared vehicle detection model;
step 5: feeding infrared vehicle video of the traffic road, acquired in real time by infrared thermal imaging equipment, frame by frame into the trained infrared vehicle detection model to obtain the real-time position information, scale information and confidence of each vehicle.
2. The infrared vehicle detection method based on the improved YOLOv7 algorithm according to claim 1, characterized in that the frame extraction and image preprocessing of step 1 specifically include:
(1.1) acquiring infrared vehicle video at an intersection, reading the first 10000 frames of the video, setting the output image resolution to 640 × 640, and outputting each frame in sequence in an image format to obtain 10000 infrared vehicle images; labeling the position information of the vehicle targets in these images to produce an infrared vehicle image dataset containing 10000 infrared vehicle images with a resolution of 640 × 640;
(1.2) dividing the infrared vehicle image dataset into a training dataset and a test dataset in a 9:1 ratio, i.e. randomly selecting 9000 infrared images from the dataset to form the training dataset and using the remaining 1000 infrared images as the test dataset.
3. The infrared vehicle detection method based on the improved YOLOv7 algorithm according to claim 1, characterized in that step 2 specifically comprises:
(2.1) discarding the backbone feature extraction network in the YOLOv7 algorithm and constructing a new backbone feature extraction network Conv31 to replace it, wherein Conv31 comprises 31 convolution blocks with the following structures:
1st convolution block: a convolution layer with 1 input channel, 32 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with negative slope 0.1;
2nd convolution block: a convolution layer with 32 input channels, 16 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 16 channels; and a LeakyReLU activation layer with negative slope 0.1;
3rd convolution block: a convolution layer with 16 input channels, 32 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with negative slope 0.1;
4th convolution block: a convolution layer with 32 input channels, 32 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with negative slope 0.1;
5th and 7th convolution blocks: a convolution layer with 32 input channels, 64 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with negative slope 0.1;
6th convolution block: a convolution layer with 64 input channels, 32 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with negative slope 0.1;
8th convolution block: a convolution layer with 64 input channels, 64 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with negative slope 0.1;
9th and 11th convolution blocks: a convolution layer with 64 input channels, 128 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with negative slope 0.1;
10th convolution block: a convolution layer with 128 input channels, 64 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with negative slope 0.1;
12th convolution block: a convolution layer with 128 input channels, 128 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with negative slope 0.1;
13th and 15th convolution blocks: a convolution layer with 128 input channels, 256 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with negative slope 0.1;
14th convolution block: a convolution layer with 256 input channels, 128 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with negative slope 0.1;
16th, 19th, 21st and 23rd convolution blocks: a convolution layer with 256 input channels, 512 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with negative slope 0.1;
17th, 20th and 22nd convolution blocks: a convolution layer with 512 input channels, 256 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with negative slope 0.1;
18th convolution block: a convolution layer with 256 input channels, 256 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with negative slope 0.1;
24th, 27th, 29th and 31st convolution blocks: a convolution layer with 512 input channels, 1024 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 1024 channels; and a LeakyReLU activation layer with negative slope 0.1;
25th, 28th and 30th convolution blocks: a convolution layer with 1024 input channels, 512 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with negative slope 0.1;
26th convolution block: a convolution layer with 512 input channels, 512 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with negative slope 0.1;
(2.2) connecting the 31 convolution blocks in sequence to obtain the new backbone feature extraction network Conv31 with the following structure:
1st convolution block -> 2nd -> 3rd -> 4th -> 5th -> 6th -> 7th -> 8th -> 9th -> 10th -> 11th -> 12th -> 13th -> 14th -> 15th -> 16th -> 17th -> 18th -> 19th -> 20th -> 21st -> 22nd -> 23rd -> 24th -> 25th -> 26th -> 27th -> 28th -> 29th -> 30th -> 31st convolution block;
(2.3) replacing the backbone feature extraction network in the YOLOv7 algorithm with Conv31.
4. The infrared vehicle detection method based on the improved YOLOv7 algorithm according to claim 1, characterized in that step 3 specifically comprises:
connecting the 16th convolution block of the new backbone feature extraction network Conv31 obtained in step 2 with the 1st prediction branch of the YOLOv7 prediction network;
connecting the 24th convolution block of Conv31 with the 2nd prediction branch of the YOLOv7 prediction network;
connecting the 31st convolution block of Conv31 with the 3rd prediction branch of the YOLOv7 prediction network.
5. The infrared vehicle detection method based on the improved YOLOv7 algorithm according to claim 4, characterized in that the modules inside the YOLOv7 prediction network are connected as follows:
the modules of the 1st prediction branch are connected as:
16th convolution block -> branch convolution block 1 -> Multi_Concat_Block1 -> RepConv1 -> detection head 1;
the modules of the 2nd prediction branch are connected as:
24th convolution block -> branch convolution block 2 -> Multi_Concat_Block2 -> Multi_Concat_Block3 -> RepConv2 -> detection head 2;
the modules of the 3rd prediction branch are connected as:
31st convolution block -> Multi_Concat_Block4 -> RepConv3 -> detection head 3;
the prediction branches are interconnected as:
31st convolution block -> upsampling convolution block 1 -> upsampling layer 1 -> Multi_Concat_Block2 -> upsampling convolution block 2 -> upsampling layer 2 -> Multi_Concat_Block1;
Multi_Concat_Block1 -> TransitionBlock1 -> Multi_Concat_Block2 -> TransitionBlock2 -> Multi_Concat_Block4.
6. The infrared vehicle detection method based on the improved YOLOv7 algorithm according to claim 1, characterized in that step 4 specifically comprises:
(4.1) setting training parameters: the number of training epochs is 200, the number of infrared vehicle images selected per training step (the batch size) is 16, the learning rate is 0.001, and the confidence threshold and the IOU ignore threshold are both set to 0.5;
(4.2) inputting the 9000 infrared vehicle images of the training set into the model Conv31-YOLOv7, 16 at a time, to obtain the offset values $(t_x, t_y, t_w, t_h)$ and the target confidence $p$, where $t_x$ is the offset of the target bounding box relative to the label box in the x direction, $t_y$ the offset in the y direction, $t_w$ the offset in width, and $t_h$ the offset in height;
(4.3) converting the offset values $(t_x, t_y, t_w, t_h)$ into the position and the width and height of the prediction box through the following coordinate offset formulas:
$$b_x = \sigma(t_x) + c_x, \qquad b_y = \sigma(t_y) + c_y, \qquad b_w = p_w e^{t_w}, \qquad b_h = p_h e^{t_h}$$
where $\sigma$ denotes the sigmoid function, $b_x, b_y$ are the position of the prediction box, $c_x, c_y$ are the position of the grid cell containing the target, $b_w, b_h$ are the width and height of the prediction box, and $p_w, p_h$ are the width and height of the prior box;
(4.4) substituting the position, width, height and target confidence of the prediction box $(b_x, b_y, b_w, b_h, p)$, together with the position, width, height and confidence of the label box, into the loss function to compute the loss value, and updating the network weights with mini-batch stochastic gradient descent;
(4.5) repeating (4.2)-(4.4) until the loss value stabilizes and no longer decreases, then stopping training to obtain the trained infrared vehicle detection model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202310175297.4A | 2023-02-28 | 2023-02-28 | Infrared vehicle detection method based on improved YOLOv7 algorithm
Publications (1)
Publication Number | Publication Date
---|---
CN116129327A | 2023-05-16
Family
ID: 86297468
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202310175297.4A (pending) | Infrared vehicle detection method based on improved YOLOv7 algorithm | 2023-02-28 | 2023-02-28
Country Status (1)
Country | Link
---|---
CN | CN116129327A (en)
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN116485802A | 2023-06-26 | 2023-07-25 | 广东电网有限责任公司湛江供电局 | Insulator flashover defect detection method, device, equipment and storage medium
CN116485802B | 2023-06-26 | 2024-01-26 | 广东电网有限责任公司湛江供电局 | Insulator flashover defect detection method, device, equipment and storage medium
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination