CN116129327A - Infrared vehicle detection method based on improved YOLOv7 algorithm - Google Patents


Info

Publication number
CN116129327A
Authority
CN
China
Prior art keywords: convolution, convolution block, layer, channels, block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310175297.4A
Other languages
Chinese (zh)
Inventor
徐一铭
姬红兵
张文博
李林
臧博
龙璐岚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202310175297.4A
Publication of CN116129327A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 - Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses an infrared vehicle detection method based on an improved YOLOv7 algorithm, which comprises the following steps. Step 1: collect vehicle videos on a traffic road, and perform frame extraction and image preprocessing to obtain an infrared vehicle image data set. Step 2: construct a new backbone feature extraction network, Conv31, containing 31 convolution blocks. Step 3: connect the new backbone feature extraction network to the original YOLOv7 prediction network to form a new network model, Conv31-YOLOv7. Step 4: feed the training data set obtained in step 1 into the Conv31-YOLOv7 network model of step 3 and train it with a mini-batch stochastic gradient descent algorithm to obtain a trained infrared vehicle detection model. Step 5: feed infrared vehicle video of the traffic road, acquired in real time by infrared thermal imaging equipment, frame by frame into the trained infrared vehicle detection model to obtain real-time position information, scale information, and confidence for each vehicle. The invention markedly improves detection accuracy while maintaining a high detection speed.

Description

Infrared vehicle detection method based on improved YOLOv7 algorithm
Technical Field
The invention belongs to the technical field of vehicle detection, and particularly relates to an infrared vehicle detection method based on an improved YOLOv7 algorithm.
Background
Infrared target detection refers to automatically extracting the position information of targets from infrared images. Owing to the advantages of infrared thermal imaging, infrared target detection can be applied to vehicle detection on traffic roads and can cope with night, strong-light, and extreme-weather conditions, so breakthroughs in this technology have important theoretical significance and practical value in fields such as autonomous driving and intelligent transportation.
Conventional infrared vehicle detection methods generally extract target features with methods such as the histogram of oriented gradients and then classify those features with a classifier such as a support vector machine trained on positive and negative samples. Such methods suffer from low detection speed that cannot meet real-time requirements, limited application scenarios, poor robustness, and weak generalization ability.
In recent years, with the rapid development of artificial intelligence technology, infrared vehicle detection methods based on convolutional neural networks have been widely applied. These methods automatically abstract and extract image features through a convolutional neural network and offer higher detection accuracy and stronger robustness.
Current deep-learning-based target detection algorithms fall mainly into two categories. The first is two-stage detection algorithms, which split detection into two stages: the first stage generates candidate regions of the image to be detected, and the second stage classifies and regresses the generated candidate regions to obtain the final detection result. Because the first stage is time-consuming, such algorithms achieve high overall detection accuracy but low detection speed and generally cannot meet real-time requirements; representative algorithms include R-CNN and Fast R-CNN. The second is single-stage detection algorithms, which unify the two-stage detection process into an end-to-end regression, merging region selection and detection judgment into one step; their detection accuracy is lower but their detection speed is high, and representative algorithms include YOLO and SSD.
Deep-learning-based target detection algorithms perform well on visible-light images. In infrared target detection scenes, however, the infrared image is a single-channel image with inconspicuous features, which makes feature extraction for infrared vehicle targets difficult; as a result, the detection accuracy of current mainstream target detection algorithms is generally low and can hardly meet practical requirements.
Disclosure of Invention
In order to overcome the above defects in the prior art, the invention aims to provide an infrared vehicle detection method based on an improved YOLOv7 algorithm that markedly improves detection accuracy while maintaining a high detection speed.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
An infrared vehicle detection method based on an improved YOLOv7 algorithm comprises the following steps:
step 1: collecting vehicle videos on a traffic road, and performing frame extraction and image preprocessing to obtain an infrared vehicle image data set;
step 2: improving the backbone feature extraction network of the YOLOv7 algorithm, i.e., discarding the backbone feature extraction network of the YOLOv7 algorithm and constructing a new backbone feature extraction network, Conv31, containing 31 convolution blocks to replace it;
step 3: connecting the new backbone feature extraction network to the original YOLOv7 prediction network to form a new network model, Conv31-YOLOv7;
step 4: feeding the training data set obtained in step 1 into the Conv31-YOLOv7 network model of step 3 and training it with a mini-batch stochastic gradient descent algorithm to obtain a trained infrared vehicle detection model;
step 5: feeding infrared vehicle video of the traffic road, acquired in real time by infrared thermal imaging equipment, frame by frame into the trained infrared vehicle detection model to obtain real-time position information, scale information, and confidence for each vehicle.
The frame extraction and image preprocessing of step 1 specifically include:
(1.1) acquiring infrared vehicle video at an intersection, reading the first 10000 frames of the video, setting the output image resolution to 640×640, and writing each frame out in sequence as an image to obtain 10000 infrared vehicle images; then annotating the position information of the vehicle targets in the obtained infrared vehicle images to produce an infrared vehicle image data set containing 10000 infrared vehicle images with a resolution of 640×640;
(1.2) dividing the infrared vehicle image data set into a training data set and a test data set at a ratio of 9:1, i.e., randomly selecting 9000 infrared images from the data set to form the training data set, with the remaining 1000 infrared images forming the test data set.
Step 2 specifically comprises:
(2.1) discarding the backbone feature extraction network of the YOLOv7 algorithm and constructing a new backbone feature extraction network, Conv31, to replace it, where Conv31 comprises 31 convolution blocks with the following structures:
1st convolution block: a convolution layer with 1 input channel, 32 output channels, a 3×3 kernel, a stride of 1, and a padding of 1; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
2nd convolution block: a convolution layer with 32 input channels, 16 output channels, a 1×1 kernel, a stride of 1, and a padding of 0; a batch normalization layer with 16 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
3rd convolution block: a convolution layer with 16 input channels, 32 output channels, a 3×3 kernel, a stride of 1, and a padding of 1; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
4th convolution block: a convolution layer with 32 input channels, 32 output channels, a 1×1 kernel, a stride of 2, and a padding of 0; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
5th and 7th convolution blocks: each comprises a convolution layer with 32 input channels, 64 output channels, a 3×3 kernel, a stride of 1, and a padding of 1; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
6th convolution block: a convolution layer with 64 input channels, 32 output channels, a 1×1 kernel, a stride of 1, and a padding of 0; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
8th convolution block: a convolution layer with 64 input channels, 64 output channels, a 1×1 kernel, a stride of 2, and a padding of 0; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
9th and 11th convolution blocks: each comprises a convolution layer with 64 input channels, 128 output channels, a 3×3 kernel, a stride of 1, and a padding of 1; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
10th convolution block: a convolution layer with 128 input channels, 64 output channels, a 1×1 kernel, a stride of 1, and a padding of 0; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
12th convolution block: a convolution layer with 128 input channels, 128 output channels, a 1×1 kernel, a stride of 2, and a padding of 0; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
13th and 15th convolution blocks: each comprises a convolution layer with 128 input channels, 256 output channels, a 3×3 kernel, a stride of 1, and a padding of 1; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
14th convolution block: a convolution layer with 256 input channels, 128 output channels, a 1×1 kernel, a stride of 1, and a padding of 0; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
16th, 19th, 21st, and 23rd convolution blocks: each comprises a convolution layer with 256 input channels, 512 output channels, a 3×3 kernel, a stride of 1, and a padding of 1; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
17th, 20th, and 22nd convolution blocks: each comprises a convolution layer with 512 input channels, 256 output channels, a 1×1 kernel, a stride of 1, and a padding of 0; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
18th convolution block: a convolution layer with 256 input channels, 256 output channels, a 1×1 kernel, a stride of 2, and a padding of 0; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
24th, 27th, 29th, and 31st convolution blocks: each comprises a convolution layer with 512 input channels, 1024 output channels, a 3×3 kernel, a stride of 1, and a padding of 1; a batch normalization layer with 1024 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
25th, 28th, and 30th convolution blocks: each comprises a convolution layer with 1024 input channels, 512 output channels, a 1×1 kernel, a stride of 1, and a padding of 0; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
26th convolution block: a convolution layer with 512 input channels, 512 output channels, a 1×1 kernel, a stride of 2, and a padding of 0; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
(2.2) sequentially connecting the 31 convolution blocks to obtain the new backbone feature extraction network Conv31 with the following structure:
1st convolution block -> 2nd convolution block -> 3rd convolution block -> 4th convolution block -> 5th convolution block -> 6th convolution block -> 7th convolution block -> 8th convolution block -> 9th convolution block -> 10th convolution block -> 11th convolution block -> 12th convolution block -> 13th convolution block -> 14th convolution block -> 15th convolution block -> 16th convolution block -> 17th convolution block -> 18th convolution block -> 19th convolution block -> 20th convolution block -> 21st convolution block -> 22nd convolution block -> 23rd convolution block -> 24th convolution block -> 25th convolution block -> 26th convolution block -> 27th convolution block -> 28th convolution block -> 29th convolution block -> 30th convolution block -> 31st convolution block.
(2.3) replacing the backbone feature extraction network of the YOLOv7 algorithm with Conv31.
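Each convolution block above follows the same convolution, batch normalization, LeakyReLU pattern, differing only in channel counts, kernel size, stride, and padding. The following PyTorch sketch of one such block is an illustration of that pattern only; the class and argument names are illustrative and not taken from the patent:

```python
import torch.nn as nn

class ConvBlock(nn.Module):
    """One Conv31 block: convolution, batch normalization, LeakyReLU(0.1)."""
    def __init__(self, in_ch, out_ch, kernel_size, stride, padding):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              stride=stride, padding=padding, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(negative_slope=0.1)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

# e.g. the 1st convolution block: 1 input channel, 32 output channels,
# 3x3 kernel, stride 1, padding 1
block1 = ConvBlock(1, 32, kernel_size=3, stride=1, padding=1)
```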
Step 3 specifically comprises:
connecting the 16th convolution block of the new backbone feature extraction network Conv31 obtained in step 2 to the 1st prediction branch of the YOLOv7 prediction network;
connecting the 24th convolution block of Conv31 to the 2nd prediction branch of the YOLOv7 prediction network;
connecting the 31st convolution block of Conv31 to the 3rd prediction branch of the YOLOv7 prediction network.
The modules inside the YOLOv7 prediction network are connected as follows:
the modules of the 1st prediction branch are connected as:
16th convolution block -> branch convolution block 1 -> Multi_Concat_Block1 -> RepConv1 -> detection head 1;
the modules of the 2nd prediction branch are connected as:
24th convolution block -> branch convolution block 2 -> Multi_Concat_Block2 -> Multi_Concat_Block3 -> RepConv2 -> detection head 2;
the modules of the 3rd prediction branch are connected as:
31st convolution block -> Multi_Concat_Block4 -> RepConv3 -> detection head 3;
the prediction branches are connected to each other as:
31st convolution block -> upsampling convolution block 1 -> upsampling layer 1 -> Multi_Concat_Block2 -> upsampling convolution block 2 -> upsampling layer 2 -> Multi_Concat_Block1;
Multi_Concat_Block1 -> TransitionBlock1 -> Multi_Concat_Block2 -> TransitionBlock2 -> Multi_Concat_Block4.
Step 4 specifically comprises:
(4.1) setting the training parameters: the number of training epochs is 200, the number of infrared vehicle images selected per iteration is 16, the learning rate is 0.001, and the confidence threshold and the IOU ignore threshold are both 0.5;
(4.2) inputting the 9000 infrared vehicle images of the training set into the Conv31-YOLOv7 model 16 at a time, and obtaining the offset values (t_x, t_y, t_w, t_h) and the target confidence p output by the model, where t_x is the offset of the target bounding box relative to the label box in the x direction, t_y is the offset in the y direction, t_w is the offset relative to the label box width, and t_h is the offset relative to the label box height;
(4.3) converting the offset values (t_x, t_y, t_w, t_h) into the position, width, and height of the prediction box through the following coordinate offset formulas:
b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^(t_w)
b_h = p_h · e^(t_h)
where b_x, b_y are the position of the prediction box, c_x, c_y are the position of the label box, b_w, b_h are the width and height of the prediction box, and p_w, p_h are the width and height of the label box;
(4.4) substituting the position, width, height, and target confidence of the prediction box (b_x, b_y, b_w, b_h, p), together with the position, width, height, and target confidence of the label box, into the loss function to calculate the loss value, and updating the weights from the loss value using a mini-batch stochastic gradient descent algorithm;
(4.5) repeating (4.2)-(4.4) until the loss value stabilizes and no longer decreases, then stopping training to obtain the trained infrared vehicle detection model.
Step 5 specifically comprises:
acquiring infrared vehicle video of the traffic road in real time with infrared thermal imaging equipment and feeding it frame by frame into the trained infrared vehicle detection model to obtain the real-time position information and confidence of each vehicle.
The invention has the following beneficial effects:
The method replaces the backbone feature extraction network of the YOLOv7 algorithm with the new backbone feature extraction network Conv31. Conv31 comprises 31 convolution blocks, each containing one convolution layer, for 31 convolution layers in total; stacking convolution blocks greatly increases the number of convolution layers, which deepens the network, strengthens its feature extraction capability, and thus effectively improves detection accuracy. The 16th, 24th, and 31st convolution blocks of Conv31 extract shallow, middle, and deep features of infrared vehicle targets, respectively; connecting these 3 convolution blocks to the 3 prediction branches fuses multi-scale features and improves the network model's ability to detect infrared vehicle targets of different scales, further improving detection accuracy. Test results show that, compared with other vehicle detection methods based on convolutional neural networks, the method of the invention markedly improves detection accuracy while maintaining a high detection speed.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention.
FIG. 2 is a diagram of the Conv31-YOLOv7 network constructed in the present invention.
Fig. 3 is a schematic diagram of the detection of the present invention in a practical scenario.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
As shown in fig. 1:
step 1: an infrared vehicle dataset is constructed.
(1.1) Collect infrared vehicle video at an intersection, read the first 10000 frames of the video, set the output image resolution to 640×640, and write each frame out in sequence as an image to obtain 10000 infrared vehicle images; then annotate the position information of the vehicle targets in the obtained infrared vehicle images to produce an infrared vehicle image data set containing 10000 infrared images with a resolution of 640×640.
(1.2) Divide the infrared vehicle image data set into a training data set and a test data set at a ratio of 9:1, i.e., randomly select 9000 infrared images from the data set to form the training data set, with the remaining 1000 infrared images forming the test data set.
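A minimal sketch of this data set construction step, assuming the video can be decoded with OpenCV; the paths, file names, and helper names are illustrative, and the annotation of vehicle positions is done separately with a labeling tool:

```python
import os
import random
import cv2  # OpenCV, for video decoding

def extract_frames(video_path, out_dir, n_frames=10000, size=(640, 640)):
    """Read the first n_frames of the infrared video, resize each frame
    to 640x640, and write it out as a numbered image."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    saved = 0
    while saved < n_frames:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, f"{saved:05d}.jpg"),
                    cv2.resize(frame, size))
        saved += 1
    cap.release()
    return saved

def split_9_to_1(names, seed=0):
    """Randomly split the 10000 image names into 9000 training / 1000 test."""
    random.seed(seed)
    names = sorted(names)
    random.shuffle(names)
    k = int(len(names) * 0.9)
    return names[:k], names[k:]
```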
Step 2: a new backbone feature extraction network is constructed.
Constructing the new backbone feature extraction network amounts to improving the backbone feature extraction network of the existing YOLOv7 algorithm. The network model of the YOLOv7 algorithm comprises a backbone feature extraction network and a prediction network; only the backbone feature extraction network is improved in this step, which is concretely realized as follows:
(2.1) Discard the backbone feature extraction network of the YOLOv7 algorithm and construct a new backbone feature extraction network, Conv31, to replace it. Conv31 comprises 31 convolution blocks with the following structures:
1st convolution block: a convolution layer with 1 input channel, 32 output channels, a 3×3 kernel, a stride of 1, and a padding of 1; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with a negative slope of 0.1.
2nd convolution block: a convolution layer with 32 input channels, 16 output channels, a 1×1 kernel, a stride of 1, and a padding of 0; a batch normalization layer with 16 channels; and a LeakyReLU activation layer with a negative slope of 0.1.
3rd convolution block: a convolution layer with 16 input channels, 32 output channels, a 3×3 kernel, a stride of 1, and a padding of 1; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with a negative slope of 0.1.
4th convolution block: a convolution layer with 32 input channels, 32 output channels, a 1×1 kernel, a stride of 2, and a padding of 0; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with a negative slope of 0.1.
5th and 7th convolution blocks: each comprises a convolution layer with 32 input channels, 64 output channels, a 3×3 kernel, a stride of 1, and a padding of 1; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with a negative slope of 0.1.
6th convolution block: a convolution layer with 64 input channels, 32 output channels, a 1×1 kernel, a stride of 1, and a padding of 0; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with a negative slope of 0.1.
8th convolution block: a convolution layer with 64 input channels, 64 output channels, a 1×1 kernel, a stride of 2, and a padding of 0; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with a negative slope of 0.1.
9th and 11th convolution blocks: each comprises a convolution layer with 64 input channels, 128 output channels, a 3×3 kernel, a stride of 1, and a padding of 1; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with a negative slope of 0.1.
10th convolution block: a convolution layer with 128 input channels, 64 output channels, a 1×1 kernel, a stride of 1, and a padding of 0; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with a negative slope of 0.1.
12th convolution block: a convolution layer with 128 input channels, 128 output channels, a 1×1 kernel, a stride of 2, and a padding of 0; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with a negative slope of 0.1.
13th and 15th convolution blocks: each comprises a convolution layer with 128 input channels, 256 output channels, a 3×3 kernel, a stride of 1, and a padding of 1; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with a negative slope of 0.1.
14th convolution block: a convolution layer with 256 input channels, 128 output channels, a 1×1 kernel, a stride of 1, and a padding of 0; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with a negative slope of 0.1.
16th, 19th, 21st, and 23rd convolution blocks: each comprises a convolution layer with 256 input channels, 512 output channels, a 3×3 kernel, a stride of 1, and a padding of 1; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with a negative slope of 0.1.
17th, 20th, and 22nd convolution blocks: each comprises a convolution layer with 512 input channels, 256 output channels, a 1×1 kernel, a stride of 1, and a padding of 0; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with a negative slope of 0.1.
18th convolution block: a convolution layer with 256 input channels, 256 output channels, a 1×1 kernel, a stride of 2, and a padding of 0; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with a negative slope of 0.1.
24th, 27th, 29th, and 31st convolution blocks: each comprises a convolution layer with 512 input channels, 1024 output channels, a 3×3 kernel, a stride of 1, and a padding of 1; a batch normalization layer with 1024 channels; and a LeakyReLU activation layer with a negative slope of 0.1.
25th, 28th, and 30th convolution blocks: each comprises a convolution layer with 1024 input channels, 512 output channels, a 1×1 kernel, a stride of 1, and a padding of 0; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with a negative slope of 0.1.
26th convolution block: a convolution layer with 512 input channels, 512 output channels, a 1×1 kernel, a stride of 2, and a padding of 0; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with a negative slope of 0.1.
(2.2) Sequentially connect the 31 convolution blocks to obtain the new backbone feature extraction network Conv31 with the following structure:
1st convolution block -> 2nd convolution block -> 3rd convolution block -> 4th convolution block -> 5th convolution block -> 6th convolution block -> 7th convolution block -> 8th convolution block -> 9th convolution block -> 10th convolution block -> 11th convolution block -> 12th convolution block -> 13th convolution block -> 14th convolution block -> 15th convolution block -> 16th convolution block -> 17th convolution block -> 18th convolution block -> 19th convolution block -> 20th convolution block -> 21st convolution block -> 22nd convolution block -> 23rd convolution block -> 24th convolution block -> 25th convolution block -> 26th convolution block -> 27th convolution block -> 28th convolution block -> 29th convolution block -> 30th convolution block -> 31st convolution block.
(2.3) Replace the backbone feature extraction network of the YOLOv7 algorithm with Conv31.
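Since every block is the same convolution, batch normalization, LeakyReLU unit, Conv31 can be generated from a table of (input channels, output channels, kernel, stride, padding). The sketch below is our own rendering of the 31-block specification listed in (2.1), not the patent's code; the tuple list mirrors blocks 1 to 31 in order:

```python
import torch.nn as nn

def conv_block(c_in, c_out, k, s, p):
    # convolution + batch normalization + LeakyReLU with negative slope 0.1
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, stride=s, padding=p, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1),
    )

# (in_channels, out_channels, kernel, stride, padding) for blocks 1..31
CONV31_SPEC = [
    (1, 32, 3, 1, 1), (32, 16, 1, 1, 0), (16, 32, 3, 1, 1), (32, 32, 1, 2, 0),
    (32, 64, 3, 1, 1), (64, 32, 1, 1, 0), (32, 64, 3, 1, 1), (64, 64, 1, 2, 0),
    (64, 128, 3, 1, 1), (128, 64, 1, 1, 0), (64, 128, 3, 1, 1), (128, 128, 1, 2, 0),
    (128, 256, 3, 1, 1), (256, 128, 1, 1, 0), (128, 256, 3, 1, 1),
    (256, 512, 3, 1, 1), (512, 256, 1, 1, 0), (256, 256, 1, 2, 0),
    (256, 512, 3, 1, 1), (512, 256, 1, 1, 0), (256, 512, 3, 1, 1),
    (512, 256, 1, 1, 0), (256, 512, 3, 1, 1),
    (512, 1024, 3, 1, 1), (1024, 512, 1, 1, 0), (512, 512, 1, 2, 0),
    (512, 1024, 3, 1, 1), (1024, 512, 1, 1, 0), (512, 1024, 3, 1, 1),
    (1024, 512, 1, 1, 0), (512, 1024, 3, 1, 1),
]

def build_conv31():
    """Chain blocks 1 -> 2 -> ... -> 31 into one sequential backbone."""
    return nn.Sequential(*[conv_block(*spec) for spec in CONV31_SPEC])
```

Note that a 640×640 single-channel input passes five stride-2 blocks (the 4th, 8th, 12th, 18th, and 26th), so the deepest feature map is downsampled by a factor of 32.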
Step 3: the new network model Conv31-YOLOv7 is constructed.
Referring to fig. 2, the new backbone feature extraction network and the YOLOv7 prediction network are connected according to the following structural relationships to form the new network model Conv31-YOLOv7:
the 16th convolution block of the new backbone feature extraction network Conv31 is connected to the 1st prediction branch of the YOLOv7 prediction network;
the 24th convolution block of Conv31 is connected to the 2nd prediction branch of the YOLOv7 prediction network;
the 31st convolution block of Conv31 is connected to the 3rd prediction branch of the YOLOv7 prediction network.
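A hedged sketch of how the backbone can expose these three taps (shallow, middle, and deep features); the prediction branches themselves come from the original YOLOv7 prediction network and are not reproduced here, and the class name is illustrative:

```python
import torch.nn as nn

class Conv31Backbone(nn.Module):
    """Runs the 31 convolution blocks in sequence and returns the outputs
    of the 16th, 24th, and 31st blocks for the three prediction branches."""
    def __init__(self, blocks):          # blocks: list of 31 block modules
        super().__init__()
        self.blocks = nn.ModuleList(blocks)

    def forward(self, x):
        taps = []
        for i, block in enumerate(self.blocks, start=1):
            x = block(x)
            if i in (16, 24, 31):        # branch 1, branch 2, branch 3
                taps.append(x)
        return taps                      # [shallow, middle, deep] features
```

For example, `Conv31Backbone([conv_block(*s) for s in CONV31_SPEC])` wires this up to the block table sketched above.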
Step 4: the new network model Conv31-YOLOv7 is trained.
(4.1) Set the training parameters: the number of training epochs is 200, the number of infrared vehicle images selected per iteration is 16, the learning rate is 0.001, and the confidence threshold and the IOU ignore threshold are both 0.5.
(4.2) Input the 9000 infrared vehicle images of the training set into the Conv31-YOLOv7 model 16 at a time, and obtain the offset values (t_x, t_y, t_w, t_h) and the target confidence p output by the model, where t_x is the offset of the target bounding box relative to the label box in the x direction, t_y is the offset in the y direction, t_w is the offset relative to the label box width, and t_h is the offset relative to the label box height.
(4.3) Convert the offset values (t_x, t_y, t_w, t_h) into the position, width, and height of the prediction box through the following coordinate offset formulas:
b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^(t_w)
b_h = p_h · e^(t_h)
where b_x, b_y are the position of the prediction box, c_x, c_y are the position of the label box, b_w, b_h are the width and height of the prediction box, and p_w, p_h are the width and height of the label box.
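These are the standard YOLO-family decoding formulas; in the usual reading, (c_x, c_y) is the reference position for the box center and (p_w, p_h) the reference width and height. A hedged sketch of the conversion, with the function name and tensor shapes assumed for illustration:

```python
import torch

def decode_boxes(t, ref_xy, ref_wh):
    """Convert raw offsets t = (tx, ty, tw, th) into box center and size.

    ref_xy: reference (cx, cy) positions, same leading shape as t
    ref_wh: reference (pw, ph) widths/heights, same leading shape as t
    """
    tx, ty, tw, th = t.unbind(dim=-1)
    bx = torch.sigmoid(tx) + ref_xy[..., 0]   # b_x = sigma(t_x) + c_x
    by = torch.sigmoid(ty) + ref_xy[..., 1]   # b_y = sigma(t_y) + c_y
    bw = ref_wh[..., 0] * torch.exp(tw)       # b_w = p_w * exp(t_w)
    bh = ref_wh[..., 1] * torch.exp(th)       # b_h = p_h * exp(t_h)
    return torch.stack([bx, by, bw, bh], dim=-1)
```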
(4.4) Substitute the position, width, height, and target confidence of the prediction box (b_x, b_y, b_w, b_h, p), together with the position, width, height, and target confidence of the label box, into the loss function to calculate the loss value, and update the weights from the loss value using the mini-batch stochastic gradient descent algorithm.
(4.5) Repeat (4.2)-(4.4) until the loss value stabilizes and no longer decreases, then stop training to obtain the trained infrared vehicle detection model.
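A minimal sketch of this training procedure under the parameters of (4.1); `train_loader` (batch size 16) and `yolo_loss` are placeholders for whatever data pipeline and YOLOv7 loss implementation are actually used, not components named by the patent:

```python
import torch

def train_model(model, train_loader, yolo_loss, epochs=200, lr=0.001,
                device="cuda"):
    """Mini-batch stochastic gradient descent over the 9000 training images."""
    model.to(device).train()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for epoch in range(epochs):
        total = 0.0
        for images, targets in train_loader:    # images: (16, 1, 640, 640)
            preds = model(images.to(device))    # offsets + confidences
            loss = yolo_loss(preds, targets)    # compare against label boxes
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                    # mini-batch SGD weight update
            total += loss.item()
        print(f"epoch {epoch + 1}: mean loss {total / len(train_loader):.4f}")
```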
Step 5: infrared vehicle detection is performed with the trained model.
Acquire infrared vehicle video of the traffic road in real time with infrared thermal imaging equipment and feed it frame by frame into the trained infrared vehicle detection model to obtain the real-time position information and confidence of each vehicle.
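A hedged sketch of this frame-by-frame inference, assuming the trained model wraps decoding and thresholding and returns boxes with confidences; the capture source and pre-processing details are illustrative:

```python
import cv2
import torch

def detect_stream(model, source, conf_threshold=0.5, device="cuda"):
    """Feed infrared video frame by frame through the trained detector."""
    model.to(device).eval()
    cap = cv2.VideoCapture(source)               # thermal camera or file
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # single channel
        img = cv2.resize(gray, (640, 640))
        x = torch.from_numpy(img).float().div(255.0)[None, None].to(device)
        with torch.no_grad():
            boxes, scores = model(x)             # assumed (N,4) boxes, (N,) conf
        for (x1, y1, x2, y2), s in zip(boxes.tolist(), scores.tolist()):
            if s >= conf_threshold:
                cv2.rectangle(img, (int(x1), int(y1)),
                              (int(x2), int(y2)), 255, 2)   # prediction box
    cap.release()
```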
The effects of the invention are further illustrated by the following simulation experiments and measured data:
1. Simulation and measurement environment
The simulations and measurements of the invention use the Windows 10 operating system, an NVIDIA GeForce GTX 2060 GPU for acceleration, and the PyTorch 1.8.1 deep learning framework.
2. Simulation content
Simulation 1: train the other convolutional-neural-network-based target detection models with the same training set and parameters as the invention to obtain their respective trained infrared vehicle detection models.
Feed the 1000 images of the test set, one image at a time, into the trained model of the invention to obtain its infrared vehicle detection accuracy at an IOU threshold of 0.5 and its detection speed.
Test the other methods on the same test set to obtain their infrared vehicle detection accuracy at an IOU threshold of 0.5 and their detection speed.
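A small sketch of how the detection-speed figure can be obtained by timing the model over the 1000 test images one at a time; warm-up and GPU synchronization details are simplifications on our part:

```python
import time
import torch

def images_per_second(model, test_images, device="cuda"):
    """Time single-image inference over the test set and report images/s."""
    model.to(device).eval()
    with torch.no_grad():
        start = time.perf_counter()
        for img in test_images:                  # each img: (1, 640, 640)
            model(img[None].to(device))
        if device.startswith("cuda"):
            torch.cuda.synchronize()             # flush queued GPU work
        elapsed = time.perf_counter() - start
    return len(test_images) / elapsed
```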
The method of the invention is compared with an infrared vehicle detection method based on YOLOv7 in a simulation experiment; the comparison results are shown in Table 1:
TABLE 1
Method         | Detection accuracy (IOU threshold 0.5) | Detection speed (images/s)
YOLOv7         | 91.89%                                  | 33
The invention  | 94.36%                                  | 31
Comparing the invention with YOLOv7: the method of the invention detects 31 images per second, a slight decrease from the 33 images per second of YOLOv7. However, its infrared vehicle detection accuracy at an IOU threshold of 0.5 is 94.36%, versus 91.89% for YOLOv7, so the detection accuracy is markedly improved over YOLOv7 while a high detection speed is maintained.
3. Measured content
Infrared vehicle video of a traffic road is acquired in real time with infrared thermal imaging equipment and fed frame by frame into the trained infrared vehicle detection model to obtain the real-time position information and confidence of each vehicle, as shown in fig. 3.
The large rectangular boxes in fig. 3 are the prediction boxes surrounding the vehicles in the infrared image, and the small rectangular box above each prediction box shows the confidence of the vehicle target.
The foregoing detailed description is merely a preferred embodiment of the invention and is not intended to limit the scope of the invention; various modifications and improvements can be made by those skilled in the art without departing from the spirit of the invention, and the protection scope of the invention is defined by the claims.

Claims (6)

1. An infrared vehicle detection method based on an improved YOLOv7 algorithm, characterized by comprising the following steps:
step 1: collecting vehicle videos on a traffic road, and performing frame extraction and image preprocessing to obtain an infrared vehicle image data set;
step 2: improving the backbone feature extraction network of the YOLOv7 algorithm, i.e., discarding the backbone feature extraction network of the YOLOv7 algorithm and constructing a new backbone feature extraction network, Conv31, containing 31 convolution blocks to replace it;
step 3: connecting the new backbone feature extraction network to the original YOLOv7 prediction network to form a new network model, Conv31-YOLOv7;
step 4: feeding the training data set obtained in step 1 into the Conv31-YOLOv7 network model of step 3 and training it with a mini-batch stochastic gradient descent algorithm to obtain a trained infrared vehicle detection model;
step 5: feeding infrared vehicle video of the traffic road, acquired in real time by infrared thermal imaging equipment, frame by frame into the trained infrared vehicle detection model to obtain real-time position information, scale information, and confidence for each vehicle.
2. The infrared vehicle detection method based on the improved YOLOv7 algorithm according to claim 1, characterized in that the frame extraction and image preprocessing of step 1 specifically include:
(1.1) acquiring infrared vehicle video at an intersection, reading the first 10000 frames of the video, setting the output image resolution to 640×640, and writing each frame out in sequence as an image to obtain 10000 infrared vehicle images; then annotating the position information of the vehicle targets in the obtained infrared vehicle images to produce an infrared vehicle image data set containing 10000 infrared vehicle images with a resolution of 640×640;
(1.2) dividing the infrared vehicle image data set into a training data set and a test data set at a ratio of 9:1, i.e., randomly selecting 9000 infrared images from the data set to form the training data set, with the remaining 1000 infrared images forming the test data set.
3. The infrared vehicle detection method based on the improved YOLOv7 algorithm according to claim 1, characterized in that step 2 specifically comprises:
(2.1) discarding the backbone feature extraction network of the YOLOv7 algorithm and constructing a new backbone feature extraction network, Conv31, to replace it, where Conv31 comprises 31 convolution blocks with the following structures:
1st convolution block: a convolution layer with 1 input channel, 32 output channels, a 3×3 kernel, a stride of 1, and a padding of 1; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
2nd convolution block: a convolution layer with 32 input channels, 16 output channels, a 1×1 kernel, a stride of 1, and a padding of 0; a batch normalization layer with 16 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
3rd convolution block: a convolution layer with 16 input channels, 32 output channels, a 3×3 kernel, a stride of 1, and a padding of 1; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
4th convolution block: a convolution layer with 32 input channels, 32 output channels, a 1×1 kernel, a stride of 2, and a padding of 0; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
5th and 7th convolution blocks: each comprises a convolution layer with 32 input channels, 64 output channels, a 3×3 kernel, a stride of 1, and a padding of 1; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
6th convolution block: a convolution layer with 64 input channels, 32 output channels, a 1×1 kernel, a stride of 1, and a padding of 0; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
8th convolution block: a convolution layer with 64 input channels, 64 output channels, a 1×1 kernel, a stride of 2, and a padding of 0; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
9th and 11th convolution blocks: each comprises a convolution layer with 64 input channels, 128 output channels, a 3×3 kernel, a stride of 1, and a padding of 1; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
10th convolution block: a convolution layer with 128 input channels, 64 output channels, a 1×1 kernel, a stride of 1, and a padding of 0; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
12th convolution block: a convolution layer with 128 input channels, 128 output channels, a 1×1 kernel, a stride of 2, and a padding of 0; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
13th and 15th convolution blocks: each comprises a convolution layer with 128 input channels, 256 output channels, a 3×3 kernel, a stride of 1, and a padding of 1; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
14th convolution block: a convolution layer with 256 input channels, 128 output channels, a 1×1 kernel, a stride of 1, and a padding of 0; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
16th, 19th, 21st, and 23rd convolution blocks: each comprises a convolution layer with 256 input channels, 512 output channels, a 3×3 kernel, a stride of 1, and a padding of 1; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
17th, 20th, and 22nd convolution blocks: each comprises a convolution layer with 512 input channels, 256 output channels, a 1×1 kernel, a stride of 1, and a padding of 0; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
18th convolution block: a convolution layer with 256 input channels, 256 output channels, a 1×1 kernel, a stride of 2, and a padding of 0; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
24th, 27th, 29th, and 31st convolution blocks: each comprises a convolution layer with 512 input channels, 1024 output channels, a 3×3 kernel, a stride of 1, and a padding of 1; a batch normalization layer with 1024 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
25th, 28th, and 30th convolution blocks: each comprises a convolution layer with 1024 input channels, 512 output channels, a 1×1 kernel, a stride of 1, and a padding of 0; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
26th convolution block: a convolution layer with 512 input channels, 512 output channels, a 1×1 kernel, a stride of 2, and a padding of 0; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with a negative slope of 0.1;
(2.2) sequentially connecting the 31 convolution blocks to obtain the new backbone feature extraction network Conv31 with the following structure:
1st convolution block -> 2nd convolution block -> 3rd convolution block -> 4th convolution block -> 5th convolution block -> 6th convolution block -> 7th convolution block -> 8th convolution block -> 9th convolution block -> 10th convolution block -> 11th convolution block -> 12th convolution block -> 13th convolution block -> 14th convolution block -> 15th convolution block -> 16th convolution block -> 17th convolution block -> 18th convolution block -> 19th convolution block -> 20th convolution block -> 21st convolution block -> 22nd convolution block -> 23rd convolution block -> 24th convolution block -> 25th convolution block -> 26th convolution block -> 27th convolution block -> 28th convolution block -> 29th convolution block -> 30th convolution block -> 31st convolution block;
(2.3) replacing the backbone feature extraction network of the YOLOv7 algorithm with Conv31.
4. The infrared vehicle detection method based on the improved YOLOv7 algorithm according to claim 1, characterized in that step 3 specifically comprises:
connecting the 16th convolution block of the new backbone feature extraction network Conv31 obtained in step 2 to the 1st prediction branch of the YOLOv7 prediction network;
connecting the 24th convolution block of Conv31 to the 2nd prediction branch of the YOLOv7 prediction network;
connecting the 31st convolution block of Conv31 to the 3rd prediction branch of the YOLOv7 prediction network.
5. The infrared vehicle detection method based on the improved YOLOv7 algorithm according to claim 4, characterized in that the modules inside the YOLOv7 prediction network are connected as follows:
the modules of the 1st prediction branch are connected as:
16th convolution block -> branch convolution block 1 -> Multi_Concat_Block1 -> RepConv1 -> detection head 1;
the modules of the 2nd prediction branch are connected as:
24th convolution block -> branch convolution block 2 -> Multi_Concat_Block2 -> Multi_Concat_Block3 -> RepConv2 -> detection head 2;
the modules of the 3rd prediction branch are connected as:
31st convolution block -> Multi_Concat_Block4 -> RepConv3 -> detection head 3;
the prediction branches are connected to each other as:
31st convolution block -> upsampling convolution block 1 -> upsampling layer 1 -> Multi_Concat_Block2 -> upsampling convolution block 2 -> upsampling layer 2 -> Multi_Concat_Block1;
Multi_Concat_Block1 -> TransitionBlock1 -> Multi_Concat_Block2 -> TransitionBlock2 -> Multi_Concat_Block4.
6. The infrared vehicle detection method based on the improved YOLOv7 algorithm according to claim 1, characterized in that step 4 specifically comprises:
(4.1) setting the training parameters: the number of training epochs is 200, the number of infrared vehicle images selected per iteration is 16, the learning rate is 0.001, and the confidence threshold and the IOU ignore threshold are both 0.5;
(4.2) inputting the 9000 infrared vehicle images of the training set into the Conv31-YOLOv7 model 16 at a time, and obtaining the offset values (t_x, t_y, t_w, t_h) and the target confidence p output by the model, where t_x is the offset of the target bounding box relative to the label box in the x direction, t_y is the offset in the y direction, t_w is the offset relative to the label box width, and t_h is the offset relative to the label box height;
(4.3) converting the offset values (t_x, t_y, t_w, t_h) into the position, width, and height of the prediction box through the following coordinate offset formulas:
b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^(t_w)
b_h = p_h · e^(t_h)
where b_x, b_y are the position of the prediction box, c_x, c_y are the position of the label box, b_w, b_h are the width and height of the prediction box, and p_w, p_h are the width and height of the label box;
(4.4) substituting the position, width, height, and target confidence of the prediction box (b_x, b_y, b_w, b_h, p), together with the position, width, height, and target confidence of the label box, into the loss function to calculate the loss value, and updating the weights from the loss value using a mini-batch stochastic gradient descent algorithm;
(4.5) repeating (4.2)-(4.4) until the loss value stabilizes and no longer decreases, then stopping training to obtain the trained infrared vehicle detection model.
Application CN202310175297.4A, filed 2023-02-28 (priority date 2023-02-28): Infrared vehicle detection method based on improved YOLOv7 algorithm. Publication CN116129327A, status: Pending.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202310175297.4A | 2023-02-28 | 2023-02-28 | Infrared vehicle detection method based on improved YOLOv7 algorithm

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202310175297.4A | 2023-02-28 | 2023-02-28 | Infrared vehicle detection method based on improved YOLOv7 algorithm

Publications (1)

Publication Number | Publication Date
CN116129327A | 2023-05-16

Family

ID=86297468

Family Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202310175297.4A | 2023-02-28 | 2023-02-28 | Infrared vehicle detection method based on improved YOLOv7 algorithm

Country Status (1)

Country Link
CN (1) CN116129327A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN116485802A * | 2023-06-26 | 2023-07-25 | 广东电网有限责任公司湛江供电局 (Zhanjiang Power Supply Bureau of Guangdong Power Grid Co., Ltd.) | Insulator flashover defect detection method, device, equipment and storage medium
CN116485802B * | 2023-06-26 | 2024-01-26 | 广东电网有限责任公司湛江供电局 (Zhanjiang Power Supply Bureau of Guangdong Power Grid Co., Ltd.) | Insulator flashover defect detection method, device, equipment and storage medium

Similar Documents

Publication | Title
CN111461083A (en) Rapid vehicle detection method based on deep learning
CN113436169B (en) Industrial equipment surface crack detection method and system based on semi-supervised semantic segmentation
CN108875595A (en) A kind of Driving Scene object detection method merged based on deep learning and multilayer feature
CN111582029B (en) Traffic sign identification method based on dense connection and attention mechanism
CN112084901A (en) GCAM-based high-resolution SAR image airport runway area automatic detection method and system
CN111967313B (en) Unmanned aerial vehicle image annotation method assisted by deep learning target detection algorithm
CN114495029B (en) Traffic target detection method and system based on improved YOLOv4
CN113420643B (en) Lightweight underwater target detection method based on depth separable cavity convolution
CN113888754B (en) Vehicle multi-attribute identification method based on radar vision fusion
CN109508675A (en) A kind of pedestrian detection method for complex scene
CN111428558A (en) Vehicle detection method based on improved YO L Ov3 method
CN113327248B (en) Tunnel traffic flow statistical method based on video
CN112528934A (en) Improved YOLOv3 traffic sign detection method based on multi-scale feature layer
CN113297915A (en) Insulator recognition target detection method based on unmanned aerial vehicle inspection
CN111915583A (en) Vehicle and pedestrian detection method based on vehicle-mounted thermal infrared imager in complex scene
CN113313031B (en) Deep learning-based lane line detection and vehicle transverse positioning method
CN114049572A (en) Detection method for identifying small target
CN109272060A (en) A kind of method and system carrying out target detection based on improved darknet neural network
CN112613428B (en) Resnet-3D convolution cattle video target detection method based on balance loss
CN114332942A (en) Night infrared pedestrian detection method and system based on improved YOLOv3
CN111582339A (en) Vehicle detection and identification method based on deep learning
CN113239753A (en) Improved traffic sign detection and identification method based on YOLOv4
CN115147380A (en) Small transparent plastic product defect detection method based on YOLOv5
CN115331183A (en) Improved YOLOv5s infrared target detection method
CN116129327A (en) Infrared vehicle detection method based on improved YOLOv7 algorithm

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination