CN116129327A - Infrared vehicle detection method based on improved YOLOv7 algorithm
Infrared vehicle detection method based on improved YOLOv7 algorithm
- Publication number: CN116129327A
- Application number: CN202310175297.4A
- Authority: CN (China)
- Prior art keywords: convolution; convolution block; layer; channels; block
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06N3/08: Neural networks; learning methods
- G06V10/7715: Feature extraction, e.g. by transforming the feature space
- G06V10/82: Image or video recognition or understanding using neural networks
- G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- Y02T10/40: Engine management systems
Abstract
The invention discloses an infrared vehicle detection method based on an improved YOLOv7 algorithm, which comprises the following steps. Step 1: collect vehicle videos on a traffic road, and perform frame extraction and image preprocessing to obtain an infrared vehicle image dataset. Step 2: construct a new backbone feature extraction network, Conv31, containing 31 convolution blocks. Step 3: connect the new backbone feature extraction network with the original YOLOv7 prediction network to form a new network model, Conv31-YOLOv7. Step 4: feed the training dataset obtained in step 1 into the Conv31-YOLOv7 model of step 3 and train it with mini-batch stochastic gradient descent to obtain a trained infrared vehicle detection model. Step 5: feed infrared vehicle video of the traffic road, acquired in real time by infrared thermal imaging equipment, frame by frame into the trained model to obtain the real-time position information, scale information and confidence of each vehicle. The invention significantly improves detection accuracy while maintaining a high detection speed.
Description
Technical Field
The invention belongs to the technical field of vehicle detection, and particularly relates to an infrared vehicle detection method based on an improved YOLOv7 algorithm.
Background
Infrared target detection automatically extracts the position information of targets from infrared images. Owing to the advantages of infrared thermal imaging, it can be applied to vehicle detection on traffic roads and remains effective at night, under strong light and in extreme weather, so breakthroughs in this technology have important theoretical significance and practical value in fields such as automatic driving and intelligent transportation.
Conventional infrared vehicle detection methods generally extract target features with techniques such as the histogram of oriented gradients, and then classify those features with a classifier such as a support vector machine trained on positive and negative samples. Such methods suffer from low detection speed that cannot meet real-time requirements, limited application scenarios, poor robustness and weak generalization.
In recent years, with the rapid development of artificial intelligence, infrared vehicle detection methods based on convolutional neural networks have been widely applied. These methods automatically abstract and extract image features through the convolutional neural network, and offer higher detection accuracy and stronger robustness.
Current deep-learning-based target detection algorithms fall into two main classes. Two-stage detection algorithms divide detection into two stages: the first stage generates candidate regions of the image to be detected, and the second stage classifies and regresses the generated candidate regions to obtain the final detection result. Because the first stage is time-consuming, these algorithms achieve high overall accuracy but low detection speed and generally cannot meet real-time requirements; representative algorithms include R-CNN and Fast R-CNN. Single-stage detection algorithms unify the two-stage process into an end-to-end regression, merging region selection and detection judgment into one step; their accuracy is lower but their detection speed is high, and representative algorithms include YOLO and SSD.
Deep-learning-based target detection algorithms perform well on visible-light images. In infrared scenes, however, the image is single-channel and its features are not salient, which makes feature extraction for infrared vehicle targets difficult; as a result, the detection accuracy of current mainstream target detection algorithms is generally low and hard to reconcile with practical requirements.
Disclosure of Invention
To overcome the above defects in the prior art, the invention aims to provide an infrared vehicle detection method based on an improved YOLOv7 algorithm that significantly improves detection accuracy while maintaining a high detection speed.
To achieve the above purpose, the invention adopts the following technical scheme:
An infrared vehicle detection method based on an improved YOLOv7 algorithm comprises the following steps:
step 1: collecting vehicle videos on a traffic road, and performing frame extraction and image preprocessing to obtain an infrared vehicle image dataset;
step 2: improving the backbone feature extraction network of the YOLOv7 algorithm, i.e. discarding the backbone feature extraction network in the YOLOv7 algorithm and constructing a new backbone feature extraction network Conv31, containing 31 convolution blocks, to replace it;
step 3: connecting the new backbone feature extraction network with the original YOLOv7 prediction network to form a new network model, Conv31-YOLOv7;
step 4: feeding the training dataset obtained in step 1 into the network model Conv31-YOLOv7 of step 3 and training with mini-batch stochastic gradient descent to obtain a trained infrared vehicle detection model;
step 5: feeding infrared vehicle video of the traffic road, acquired in real time by infrared thermal imaging equipment, frame by frame into the trained infrared vehicle detection model to obtain the real-time position information, scale information and confidence of each vehicle.
The frame extraction and image preprocessing of step 1 specifically include:
(1.1) acquiring infrared vehicle video at an intersection, reading the first 10000 frames of the video, setting the output image resolution to 640 × 640, and outputting each frame in sequence in an image format to obtain 10000 infrared vehicle images; labeling the position information of the vehicle targets in these images to produce an infrared vehicle image dataset containing 10000 infrared vehicle images with a resolution of 640 × 640;
(1.2) dividing the infrared vehicle image dataset into a training dataset and a test dataset in a 9:1 ratio, i.e. randomly selecting 9000 infrared images from the dataset to form the training dataset and using the remaining 1000 infrared images as the test dataset.
Step 2 specifically comprises the following steps:
(2.1) discarding the backbone feature extraction network in the YOLOv7 algorithm and constructing a new backbone feature extraction network Conv31 to replace it, wherein Conv31 comprises 31 convolution blocks with the following structures:
1st convolution block: a convolution layer with 1 input channel, 32 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with negative slope 0.1;
2nd convolution block: a convolution layer with 32 input channels, 16 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 16 channels; and a LeakyReLU activation layer with negative slope 0.1;
3rd convolution block: a convolution layer with 16 input channels, 32 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with negative slope 0.1;
4th convolution block: a convolution layer with 32 input channels, 32 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with negative slope 0.1;
5th and 7th convolution blocks: a convolution layer with 32 input channels, 64 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with negative slope 0.1;
6th convolution block: a convolution layer with 64 input channels, 32 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with negative slope 0.1;
8th convolution block: a convolution layer with 64 input channels, 64 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with negative slope 0.1;
9th and 11th convolution blocks: a convolution layer with 64 input channels, 128 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with negative slope 0.1;
10th convolution block: a convolution layer with 128 input channels, 64 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with negative slope 0.1;
12th convolution block: a convolution layer with 128 input channels, 128 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with negative slope 0.1;
13th and 15th convolution blocks: a convolution layer with 128 input channels, 256 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with negative slope 0.1;
14th convolution block: a convolution layer with 256 input channels, 128 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with negative slope 0.1;
16th, 19th, 21st and 23rd convolution blocks: a convolution layer with 256 input channels, 512 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with negative slope 0.1;
17th, 20th and 22nd convolution blocks: a convolution layer with 512 input channels, 256 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with negative slope 0.1;
18th convolution block: a convolution layer with 256 input channels, 256 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with negative slope 0.1;
24th, 27th, 29th and 31st convolution blocks: a convolution layer with 512 input channels, 1024 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 1024 channels; and a LeakyReLU activation layer with negative slope 0.1;
25th, 28th and 30th convolution blocks: a convolution layer with 1024 input channels, 512 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with negative slope 0.1;
26th convolution block: a convolution layer with 512 input channels, 512 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with negative slope 0.1;
(2.2) connecting the 31 convolution blocks in sequence to obtain the new backbone feature extraction network Conv31 with the following structure:
1st convolution block -> 2nd -> 3rd -> 4th -> 5th -> 6th -> 7th -> 8th -> 9th -> 10th -> 11th -> 12th -> 13th -> 14th -> 15th -> 16th -> 17th -> 18th -> 19th -> 20th -> 21st -> 22nd -> 23rd -> 24th -> 25th -> 26th -> 27th -> 28th -> 29th -> 30th -> 31st convolution block;
(2.3) replacing the backbone feature extraction network in the YOLOv7 algorithm with Conv31.
Step 3 specifically comprises the following steps:
connecting the 16th convolution block of the new backbone feature extraction network Conv31 obtained in step 2 with the 1st prediction branch of the YOLOv7 prediction network;
connecting the 24th convolution block of Conv31 with the 2nd prediction branch of the YOLOv7 prediction network;
connecting the 31st convolution block of Conv31 with the 3rd prediction branch of the YOLOv7 prediction network.
The modules inside the YOLOv7 prediction network are connected as follows:
the modules of the 1st prediction branch are connected as:
16th convolution block -> branch convolution block 1 -> Multi_Concat_Block1 -> RepConv1 -> detection head 1;
the modules of the 2nd prediction branch are connected as:
24th convolution block -> branch convolution block 2 -> Multi_Concat_Block2 -> Multi_Concat_Block3 -> RepConv2 -> detection head 2;
the modules of the 3rd prediction branch are connected as:
31st convolution block -> Multi_Concat_Block4 -> RepConv3 -> detection head 3;
the prediction branches are interconnected as:
31st convolution block -> upsampling convolution block 1 -> upsampling layer 1 -> Multi_Concat_Block2 -> upsampling convolution block 2 -> upsampling layer 2 -> Multi_Concat_Block1;
Multi_Concat_Block1 -> TransitionBlock1 -> Multi_Concat_Block2 -> TransitionBlock2 -> Multi_Concat_Block4.
Step 4 specifically comprises the following steps:
(4.1) setting training parameters: the number of training epochs is 200, the number of infrared vehicle images selected per training step (the batch size) is 16, the learning rate is 0.001, and the confidence threshold and the IOU ignore threshold are both set to 0.5;
(4.2) inputting the 9000 infrared vehicle images of the training set into the model Conv31-YOLOv7, 16 at a time, to obtain the offset values $(t_x, t_y, t_w, t_h)$ and the target confidence $p$, where $t_x$ is the offset of the target bounding box relative to the label box in the x direction, $t_y$ the offset in the y direction, $t_w$ the offset in width, and $t_h$ the offset in height;
(4.3) converting the offset values $(t_x, t_y, t_w, t_h)$ into the position and the width and height of the prediction box through the following coordinate offset formulas:
$$b_x = \sigma(t_x) + c_x, \qquad b_y = \sigma(t_y) + c_y, \qquad b_w = p_w e^{t_w}, \qquad b_h = p_h e^{t_h}$$
where $\sigma$ denotes the sigmoid function, $b_x, b_y$ are the position of the prediction box, $c_x, c_y$ are the position of the grid cell containing the target, $b_w, b_h$ are the width and height of the prediction box, and $p_w, p_h$ are the width and height of the prior box;
(4.4) substituting the position, width, height and target confidence of the prediction box $(b_x, b_y, b_w, b_h, p)$, together with the position, width, height and confidence of the label box, into the loss function to compute the loss value, and updating the network weights with mini-batch stochastic gradient descent;
(4.5) repeating (4.2)-(4.4) until the loss value stabilizes and no longer decreases, then stopping training to obtain the trained infrared vehicle detection model.
Step 5 specifically comprises the following step:
acquiring infrared vehicle video of the traffic road in real time with infrared thermal imaging equipment and feeding it frame by frame into the trained infrared vehicle detection model to obtain the real-time position information and confidence of each vehicle.
The invention has the following beneficial effects:
The method replaces the backbone feature extraction network of the YOLOv7 algorithm with the new backbone feature extraction network Conv31. Conv31 contains 31 convolution blocks, each containing one convolution layer, for 31 convolution layers in total; stacking convolution blocks greatly increases the number of convolution layers, deepens the network, strengthens its feature extraction capability and thereby effectively improves detection accuracy. The 16th, 24th and 31st convolution blocks of Conv31 extract shallow, middle and deep features of infrared vehicle targets, respectively; connecting these 3 convolution blocks to the 3 prediction branches fuses multi-scale features and improves the network model's ability to detect infrared vehicle targets of different scales, further improving detection accuracy. Test results show that, compared with other vehicle detection methods based on convolutional neural networks, the proposed method significantly improves detection accuracy while maintaining a high detection speed.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention.
FIG. 2 is a diagram of the Conv31-YOLOv7 network constructed in the present invention.
Fig. 3 is a schematic diagram of the detection of the present invention in a practical scenario.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
As shown in fig. 1:
step 1: an infrared vehicle dataset is constructed.
(1.1) collecting infrared vehicle videos on an intersection, reading the first 10000 frames of the videos, setting the resolution of an image to be output to 640 x 640, outputting each frame in an image format in sequence to obtain 10000 infrared vehicle images, marking the position information of a vehicle target of the obtained infrared vehicle images, and manufacturing an infrared vehicle image data set, wherein the data set has 10000 infrared images with the resolution of 640 x 640 in total.
And (1.2) dividing the infrared vehicle image dataset into a training dataset and a test dataset according to the proportion of 9:1, namely randomly selecting 9000 infrared images from the dataset to form the training dataset, and forming the test dataset by the rest 1000 infrared images.
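By way of illustration only (this sketch is not part of the patent disclosure), the frame extraction and 9:1 split could be implemented as follows; the file paths, function names and the use of OpenCV are assumptions:

```python
import os
import random

import cv2  # OpenCV is an assumed dependency for frame extraction


def extract_frames(video_path, out_dir, num_frames=10000, size=(640, 640)):
    """Read the first num_frames frames of the video and save them as 640x640 images."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    for i in range(num_frames):
        ok, frame = cap.read()
        if not ok:
            break  # video shorter than num_frames
        frame = cv2.resize(frame, size)
        cv2.imwrite(os.path.join(out_dir, f"{i:05d}.png"), frame)
    cap.release()


def split_dataset(image_names, train_ratio=0.9, seed=0):
    """Randomly split image names 9:1 into training and test sets."""
    names = list(image_names)
    random.Random(seed).shuffle(names)
    k = int(len(names) * train_ratio)  # 9000 of 10000 images
    return names[:k], names[k:]
```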
Step 2: a new backbone feature extraction network is constructed.
Constructing the new backbone feature extraction network amounts to improving the backbone feature extraction network of the existing YOLOv7 algorithm. The network model of the YOLOv7 algorithm comprises a backbone feature extraction network and a prediction network; only the backbone feature extraction network is improved in this step, implemented as follows:
(2.1) Discard the backbone feature extraction network in the YOLOv7 algorithm and construct a new backbone feature extraction network Conv31 to replace it. Conv31 comprises 31 convolution blocks with the following structures:
1st convolution block: a convolution layer with 1 input channel, 32 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with negative slope 0.1.
2nd convolution block: a convolution layer with 32 input channels, 16 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 16 channels; and a LeakyReLU activation layer with negative slope 0.1.
3rd convolution block: a convolution layer with 16 input channels, 32 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with negative slope 0.1.
4th convolution block: a convolution layer with 32 input channels, 32 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with negative slope 0.1.
5th and 7th convolution blocks: a convolution layer with 32 input channels, 64 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with negative slope 0.1.
6th convolution block: a convolution layer with 64 input channels, 32 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with negative slope 0.1.
8th convolution block: a convolution layer with 64 input channels, 64 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with negative slope 0.1.
9th and 11th convolution blocks: a convolution layer with 64 input channels, 128 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with negative slope 0.1.
10th convolution block: a convolution layer with 128 input channels, 64 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with negative slope 0.1.
12th convolution block: a convolution layer with 128 input channels, 128 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with negative slope 0.1.
13th and 15th convolution blocks: a convolution layer with 128 input channels, 256 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with negative slope 0.1.
14th convolution block: a convolution layer with 256 input channels, 128 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with negative slope 0.1.
16th, 19th, 21st and 23rd convolution blocks: a convolution layer with 256 input channels, 512 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with negative slope 0.1.
17th, 20th and 22nd convolution blocks: a convolution layer with 512 input channels, 256 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with negative slope 0.1.
18th convolution block: a convolution layer with 256 input channels, 256 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with negative slope 0.1.
24th, 27th, 29th and 31st convolution blocks: a convolution layer with 512 input channels, 1024 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 1024 channels; and a LeakyReLU activation layer with negative slope 0.1.
25th, 28th and 30th convolution blocks: a convolution layer with 1024 input channels, 512 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with negative slope 0.1.
26th convolution block: a convolution layer with 512 input channels, 512 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with negative slope 0.1.
(2.2) Connect the 31 convolution blocks in sequence to obtain the new backbone feature extraction network Conv31 with the following structure:
1st convolution block -> 2nd -> 3rd -> 4th -> 5th -> 6th -> 7th -> 8th -> 9th -> 10th -> 11th -> 12th -> 13th -> 14th -> 15th -> 16th -> 17th -> 18th -> 19th -> 20th -> 21st -> 22nd -> 23rd -> 24th -> 25th -> 26th -> 27th -> 28th -> 29th -> 30th -> 31st convolution block.
(2.3) Replace the backbone feature extraction network in the YOLOv7 algorithm with Conv31 (an illustrative code sketch of Conv31 follows).
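For illustration, a minimal PyTorch sketch of Conv31 is given below (not part of the patent text; the helper and class names are assumed). Each tuple follows the enumeration above as (input channels, output channels, kernel size, stride, padding), and features are tapped at blocks 16, 24 and 31 for the three prediction branches described in step 3:

```python
import torch
import torch.nn as nn


def conv_block(c_in, c_out, k, s, p):
    """One block as enumerated above: convolution -> batch norm -> LeakyReLU(0.1)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=k, stride=s, padding=p, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1, inplace=True),
    )


class Conv31(nn.Module):
    """Sketch of the 31-block backbone; one (c_in, c_out, k, s, p) tuple per block."""

    CFG = [
        (1, 32, 3, 1, 1), (32, 16, 1, 1, 0), (16, 32, 3, 1, 1), (32, 32, 1, 2, 0),        # blocks 1-4
        (32, 64, 3, 1, 1), (64, 32, 1, 1, 0), (32, 64, 3, 1, 1), (64, 64, 1, 2, 0),       # blocks 5-8
        (64, 128, 3, 1, 1), (128, 64, 1, 1, 0), (64, 128, 3, 1, 1), (128, 128, 1, 2, 0),  # blocks 9-12
        (128, 256, 3, 1, 1), (256, 128, 1, 1, 0), (128, 256, 3, 1, 1),                    # blocks 13-15
        (256, 512, 3, 1, 1), (512, 256, 1, 1, 0), (256, 256, 1, 2, 0),                    # blocks 16-18
        (256, 512, 3, 1, 1), (512, 256, 1, 1, 0), (256, 512, 3, 1, 1),                    # blocks 19-21
        (512, 256, 1, 1, 0), (256, 512, 3, 1, 1),                                         # blocks 22-23
        (512, 1024, 3, 1, 1), (1024, 512, 1, 1, 0), (512, 512, 1, 2, 0),                  # blocks 24-26
        (512, 1024, 3, 1, 1), (1024, 512, 1, 1, 0), (512, 1024, 3, 1, 1),                 # blocks 27-29
        (1024, 512, 1, 1, 0), (512, 1024, 3, 1, 1),                                       # blocks 30-31
    ]

    def __init__(self):
        super().__init__()
        self.blocks = nn.ModuleList(conv_block(*cfg) for cfg in self.CFG)

    def forward(self, x):
        taps = {}
        for i, block in enumerate(self.blocks, start=1):
            x = block(x)
            if i in (16, 24, 31):  # shallow / middle / deep feature taps
                taps[i] = x
        return taps[16], taps[24], taps[31]
```

With a 640 × 640 single-channel input, the five stride-2 blocks (4, 8, 12, 18, 26) yield feature maps of 80 × 80, 40 × 40 and 20 × 20 at the three taps, matching YOLOv7's three detection scales.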
Step 3: a new network model Conv31-YOLOv7 was constructed.
Referring to fig. 2, the new backbone feature extraction network and the YOLOv7 prediction network are connected according to the following structural relationship to form the new network model Conv31-YOLOv7:
The 16th convolution block of Conv31 is connected to the 1st prediction branch of the YOLOv7 prediction network.
The 24th convolution block of Conv31 is connected to the 2nd prediction branch of the YOLOv7 prediction network.
The 31 st convolution block in the new backbone feature extraction network Conv31 is connected to the 3 rd prediction branch of the YOLOv7 prediction network.
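Continuing the sketch above (again an illustrative assumption, not the patent's code), the three taps could feed the YOLOv7 prediction branches through a thin wrapper; Yolov7Head here stands in for the unmodified YOLOv7 prediction network:

```python
import torch.nn as nn


class Conv31YOLOv7(nn.Module):
    """Assumed wrapper: Conv31 taps feeding YOLOv7's three prediction branches."""

    def __init__(self, head):
        super().__init__()
        self.backbone = Conv31()  # the sketch defined above
        self.head = head          # stand-in for the original YOLOv7 prediction network

    def forward(self, x):
        f16, f24, f31 = self.backbone(x)  # 16th / 24th / 31st convolution blocks
        return self.head(f16, f24, f31)   # 1st / 2nd / 3rd prediction branches
```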
Step 4: the new network model Conv31-YOLOv7 was trained.
(4.1) Set training parameters: the number of training epochs is 200, the number of infrared vehicle images selected per training step (the batch size) is 16, the learning rate is 0.001, and the confidence threshold and the IOU ignore threshold are both set to 0.5.
(4.2) Input the 9000 infrared vehicle images of the training set into the model Conv31-YOLOv7, 16 at a time, to obtain the offset values $(t_x, t_y, t_w, t_h)$ and the target confidence $p$, where $t_x$ is the offset of the target bounding box relative to the label box in the x direction, $t_y$ the offset in the y direction, $t_w$ the offset in width, and $t_h$ the offset in height.
(4.3) Convert the offset values $(t_x, t_y, t_w, t_h)$ into the position and the width and height of the prediction box through the following coordinate offset formulas:
$$b_x = \sigma(t_x) + c_x, \qquad b_y = \sigma(t_y) + c_y, \qquad b_w = p_w e^{t_w}, \qquad b_h = p_h e^{t_h}$$
where $\sigma$ denotes the sigmoid function, $b_x, b_y$ are the position of the prediction box, $c_x, c_y$ are the position of the grid cell containing the target, $b_w, b_h$ are the width and height of the prediction box, and $p_w, p_h$ are the width and height of the prior box.
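A direct code transcription of the coordinate offset formulas above, assuming the standard YOLO-style decoding with sigmoid and exponential (illustrative sketch):

```python
import torch


def decode_boxes(t, cx, cy, pw, ph):
    """Turn offsets (tx, ty, tw, th) into a box (bx, by, bw, bh) per the formulas above."""
    tx, ty, tw, th = t.unbind(-1)
    bx = torch.sigmoid(tx) + cx  # b_x = sigma(t_x) + c_x
    by = torch.sigmoid(ty) + cy  # b_y = sigma(t_y) + c_y
    bw = pw * torch.exp(tw)      # b_w = p_w * exp(t_w)
    bh = ph * torch.exp(th)      # b_h = p_h * exp(t_h)
    return torch.stack((bx, by, bw, bh), dim=-1)
```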
(4.4) Substitute the position, width, height and target confidence of the prediction box $(b_x, b_y, b_w, b_h, p)$, together with the position, width, height and confidence of the label box, into the loss function to compute the loss value, and update the network weights with mini-batch stochastic gradient descent.
(4.5) Repeat (4.2)-(4.4) until the loss value stabilizes and no longer decreases; then stop training to obtain the trained infrared vehicle detection model (an illustrative training-loop sketch follows).
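The training procedure of step 4 amounts to a standard mini-batch SGD loop. The sketch below is an illustrative assumption: train_loader and loss_fn stand in for the labeled dataset of step 1 and the YOLOv7 loss function:

```python
import torch


def train(model, train_loader, loss_fn, epochs=200, lr=0.001, device="cuda"):
    """Mini-batch SGD as in step 4: 200 epochs, batches of 16 images, learning rate 0.001."""
    model.to(device).train()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for epoch in range(epochs):
        running = 0.0
        for images, targets in train_loader:  # loader assumed to yield batches of 16
            preds = model(images.to(device))
            loss = loss_fn(preds, targets)    # compares predicted boxes with label boxes
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            running += loss.item()
        print(f"epoch {epoch + 1}/{epochs}: mean loss {running / len(train_loader):.4f}")
```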
Step 5: and carrying out infrared vehicle detection by using the trained model.
Acquire infrared vehicle video of the traffic road in real time with infrared thermal imaging equipment and feed it frame by frame into the trained infrared vehicle detection model to obtain the real-time position information and confidence of each vehicle (an illustrative sketch follows).
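Frame-by-frame inference on a live infrared stream might be sketched as follows; the video source index, grayscale conversion and normalization are assumptions, with the single-channel input matching the 1-input-channel first block of Conv31:

```python
import cv2
import torch


def run_realtime(model, source=0, device="cuda"):
    """Feed live infrared frames to the trained model, one frame at a time."""
    model.to(device).eval()
    cap = cv2.VideoCapture(source)  # assumed index/URL of the thermal camera
    with torch.no_grad():
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)           # single-channel input
            gray = cv2.resize(gray, (640, 640))
            x = torch.from_numpy(gray).float()[None, None] / 255.0   # shape (1, 1, 640, 640)
            detections = model(x.to(device))  # positions, scales, confidences per frame
            # ...draw prediction boxes and confidences on the frame here...
    cap.release()
```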
The effect of the invention is further illustrated by the following simulation experiments and measured data:
1. Simulation and measurement environment
The simulations and measurements use the Windows 10 operating system with an NVIDIA GeForce GTX 2060 GPU for acceleration; the deep learning framework is PyTorch 1.8.1.
2. Simulation content
Simulation 1: train other convolutional-neural-network-based target detection models with the same training set and parameters as the present invention to obtain their respective trained infrared vehicle detection models.
Feed the 1000 test-set images, 1 image at a time, into the trained model of the present invention to measure the infrared vehicle detection accuracy at an IOU threshold of 0.5 and the detection speed.
Test the other methods on the same test set to measure their accuracy at an IOU threshold of 0.5 and their detection speed.
A simulation experiment compares the proposed method with the infrared vehicle detection method based on YOLOv7; the results are shown in Table 1:
TABLE 1
Method | Detection speed (images/s) | Accuracy (IOU threshold 0.5)
---|---|---
YOLOv7 | 33 | 91.89%
Proposed method | 31 | 94.36%
Compared with YOLOv7, the proposed method detects 31 images per second versus 33 images per second for YOLOv7, a slight decrease in detection speed. Its infrared vehicle detection accuracy at an IOU threshold of 0.5 is 94.36%, versus 91.89% for YOLOv7, a marked improvement in detection accuracy while a high detection speed is maintained.
3. Measured results
Infrared vehicle video of a traffic road is acquired in real time with infrared thermal imaging equipment and fed frame by frame into the trained infrared vehicle detection model to obtain the real-time position information and confidence of each vehicle, as shown in fig. 3.
The large rectangular box in fig. 3 is the prediction box surrounding the vehicle in the infrared image, and the small rectangular box above it shows the confidence of the vehicle target.
The foregoing detailed description is merely illustrative of preferred embodiments of the invention and is not intended to limit its scope; various modifications and improvements can be made by those skilled in the art without departing from the spirit of the invention, whose scope is defined by the claims.
Claims (6)
1. An infrared vehicle detection method based on an improved YOLOv7 algorithm, characterized by comprising the following steps:
step 1: collecting vehicle videos on a traffic road, and performing frame extraction and image preprocessing to obtain an infrared vehicle image dataset;
step 2: improving the backbone feature extraction network of the YOLOv7 algorithm, i.e. discarding the backbone feature extraction network in the YOLOv7 algorithm and constructing a new backbone feature extraction network Conv31, containing 31 convolution blocks, to replace it;
step 3: connecting the new backbone feature extraction network with the original YOLOv7 prediction network to form a new network model Conv31-YOLOv7;
step 4: feeding the training dataset obtained in step 1 into the network model Conv31-YOLOv7 of step 3 and training with mini-batch stochastic gradient descent to obtain a trained infrared vehicle detection model;
step 5: feeding infrared vehicle video of the traffic road, acquired in real time by infrared thermal imaging equipment, frame by frame into the trained infrared vehicle detection model to obtain the real-time position information, scale information and confidence of each vehicle.
2. The infrared vehicle detection method based on the improved YOLOv7 algorithm according to claim 1, characterized in that the frame extraction and image preprocessing of step 1 specifically include:
(1.1) acquiring infrared vehicle video at an intersection, reading the first 10000 frames of the video, setting the output image resolution to 640 × 640, and outputting each frame in sequence in an image format to obtain 10000 infrared vehicle images; labeling the position information of the vehicle targets in these images to produce an infrared vehicle image dataset containing 10000 infrared vehicle images with a resolution of 640 × 640;
(1.2) dividing the infrared vehicle image dataset into a training dataset and a test dataset in a 9:1 ratio, i.e. randomly selecting 9000 infrared images from the dataset to form the training dataset and using the remaining 1000 infrared images as the test dataset.
3. The infrared vehicle detection method based on the improved YOLOv7 algorithm according to claim 1, characterized in that step 2 specifically comprises:
(2.1) discarding the backbone feature extraction network in the YOLOv7 algorithm and constructing a new backbone feature extraction network Conv31 to replace it, wherein Conv31 comprises 31 convolution blocks with the following structures:
1st convolution block: a convolution layer with 1 input channel, 32 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with negative slope 0.1;
2nd convolution block: a convolution layer with 32 input channels, 16 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 16 channels; and a LeakyReLU activation layer with negative slope 0.1;
3rd convolution block: a convolution layer with 16 input channels, 32 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with negative slope 0.1;
4th convolution block: a convolution layer with 32 input channels, 32 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with negative slope 0.1;
5th and 7th convolution blocks: a convolution layer with 32 input channels, 64 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with negative slope 0.1;
6th convolution block: a convolution layer with 64 input channels, 32 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 32 channels; and a LeakyReLU activation layer with negative slope 0.1;
8th convolution block: a convolution layer with 64 input channels, 64 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with negative slope 0.1;
9th and 11th convolution blocks: a convolution layer with 64 input channels, 128 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with negative slope 0.1;
10th convolution block: a convolution layer with 128 input channels, 64 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 64 channels; and a LeakyReLU activation layer with negative slope 0.1;
12th convolution block: a convolution layer with 128 input channels, 128 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with negative slope 0.1;
13th and 15th convolution blocks: a convolution layer with 128 input channels, 256 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with negative slope 0.1;
14th convolution block: a convolution layer with 256 input channels, 128 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 128 channels; and a LeakyReLU activation layer with negative slope 0.1;
16th, 19th, 21st and 23rd convolution blocks: a convolution layer with 256 input channels, 512 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with negative slope 0.1;
17th, 20th and 22nd convolution blocks: a convolution layer with 512 input channels, 256 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with negative slope 0.1;
18th convolution block: a convolution layer with 256 input channels, 256 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 256 channels; and a LeakyReLU activation layer with negative slope 0.1;
24th, 27th, 29th and 31st convolution blocks: a convolution layer with 512 input channels, 1024 output channels, 3 × 3 kernel, stride 1 and padding 1; a batch normalization layer with 1024 channels; and a LeakyReLU activation layer with negative slope 0.1;
25th, 28th and 30th convolution blocks: a convolution layer with 1024 input channels, 512 output channels, 1 × 1 kernel, stride 1 and padding 0; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with negative slope 0.1;
26th convolution block: a convolution layer with 512 input channels, 512 output channels, 1 × 1 kernel, stride 2 and padding 0; a batch normalization layer with 512 channels; and a LeakyReLU activation layer with negative slope 0.1;
(2.2) connecting the 31 convolution blocks in sequence to obtain the new backbone feature extraction network Conv31 with the following structure:
1st convolution block -> 2nd -> 3rd -> 4th -> 5th -> 6th -> 7th -> 8th -> 9th -> 10th -> 11th -> 12th -> 13th -> 14th -> 15th -> 16th -> 17th -> 18th -> 19th -> 20th -> 21st -> 22nd -> 23rd -> 24th -> 25th -> 26th -> 27th -> 28th -> 29th -> 30th -> 31st convolution block;
(2.3) replacing the backbone feature extraction network in the YOLOv7 algorithm with Conv31.
4. The infrared vehicle detection method based on the improved YOLOv7 algorithm according to claim 1, characterized in that step 3 specifically comprises:
connecting the 16th convolution block of the new backbone feature extraction network Conv31 obtained in step 2 with the 1st prediction branch of the YOLOv7 prediction network;
connecting the 24th convolution block of Conv31 with the 2nd prediction branch of the YOLOv7 prediction network;
connecting the 31st convolution block of Conv31 with the 3rd prediction branch of the YOLOv7 prediction network.
5. The infrared vehicle detection method based on the improved YOLOv7 algorithm according to claim 4, characterized in that the modules inside the YOLOv7 prediction network are connected as follows:
the modules of the 1st prediction branch are connected as:
16th convolution block -> branch convolution block 1 -> Multi_Concat_Block1 -> RepConv1 -> detection head 1;
the modules of the 2nd prediction branch are connected as:
24th convolution block -> branch convolution block 2 -> Multi_Concat_Block2 -> Multi_Concat_Block3 -> RepConv2 -> detection head 2;
the modules of the 3rd prediction branch are connected as:
31st convolution block -> Multi_Concat_Block4 -> RepConv3 -> detection head 3;
the prediction branches are interconnected as:
31st convolution block -> upsampling convolution block 1 -> upsampling layer 1 -> Multi_Concat_Block2 -> upsampling convolution block 2 -> upsampling layer 2 -> Multi_Concat_Block1;
Multi_Concat_Block1 -> TransitionBlock1 -> Multi_Concat_Block2 -> TransitionBlock2 -> Multi_Concat_Block4.
6. The infrared vehicle detection method based on the improved YOLOv7 algorithm according to claim 1, characterized in that step 4 specifically comprises:
(4.1) setting training parameters: the number of training epochs is 200, the number of infrared vehicle images selected per training step (the batch size) is 16, the learning rate is 0.001, and the confidence threshold and the IOU ignore threshold are both set to 0.5;
(4.2) inputting the 9000 infrared vehicle images of the training set into the model Conv31-YOLOv7, 16 at a time, to obtain the offset values $(t_x, t_y, t_w, t_h)$ and the target confidence $p$, where $t_x$ is the offset of the target bounding box relative to the label box in the x direction, $t_y$ the offset in the y direction, $t_w$ the offset in width, and $t_h$ the offset in height;
(4.3) converting the offset values $(t_x, t_y, t_w, t_h)$ into the position and the width and height of the prediction box through the following coordinate offset formulas:
$$b_x = \sigma(t_x) + c_x, \qquad b_y = \sigma(t_y) + c_y, \qquad b_w = p_w e^{t_w}, \qquad b_h = p_h e^{t_h}$$
where $\sigma$ denotes the sigmoid function, $b_x, b_y$ are the position of the prediction box, $c_x, c_y$ are the position of the grid cell containing the target, $b_w, b_h$ are the width and height of the prediction box, and $p_w, p_h$ are the width and height of the prior box;
(4.4) substituting the position, width, height and target confidence of the prediction box $(b_x, b_y, b_w, b_h, p)$, together with the position, width, height and confidence of the label box, into the loss function to compute the loss value, and updating the network weights with mini-batch stochastic gradient descent;
(4.5) repeating (4.2)-(4.4) until the loss value stabilizes and no longer decreases, then stopping training to obtain the trained infrared vehicle detection model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202310175297.4A | 2023-02-28 | 2023-02-28 | Infrared vehicle detection method based on improved YOLOv7 algorithm
Publications (1)
Publication Number | Publication Date
---|---
CN116129327A | 2023-05-16
Family
ID: 86297468
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202310175297.4A (pending) | Infrared vehicle detection method based on improved YOLOv7 algorithm | 2023-02-28 | 2023-02-28
Country Status (1)
Country | Link
---|---
CN | CN116129327A (en)
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN116485802A | 2023-06-26 | 2023-07-25 | 广东电网有限责任公司湛江供电局 | Insulator flashover defect detection method, device, equipment and storage medium
CN116485802B | 2023-06-26 | 2024-01-26 | 广东电网有限责任公司湛江供电局 | Insulator flashover defect detection method, device, equipment and storage medium
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination