CN115690704A - LG-CenterNet model-based complex road scene target detection method and device - Google Patents

LG-CenterNet model-based complex road scene target detection method and device

Info

Publication number
CN115690704A
CN115690704A (application number CN202211179337.4A)
Authority
CN
China
Prior art keywords
module
model
feature
target
road scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211179337.4A
Other languages
Chinese (zh)
Other versions
CN115690704B (en)
Inventor
高尚兵
李�杰
胡序洋
李少凡
刘宇
余骥远
陈浩霖
于永涛
张海艳
陈晓兵
李翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaiyin Institute of Technology
Original Assignee
Huaiyin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaiyin Institute of Technology filed Critical Huaiyin Institute of Technology
Priority to CN202211179337.4A priority Critical patent/CN115690704B/en
Publication of CN115690704A publication Critical patent/CN115690704A/en
Application granted granted Critical
Publication of CN115690704B publication Critical patent/CN115690704B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a complex road scene target detection method and device based on an LG-CenterNet model. An original road image data set is collected and prepared as a data set; the LG-CenterNet network model is constructed with ResNet50 as its backbone for feature extraction, and a level-guided attention mechanism guides features of different levels while enlarging the receptive field of the backbone feature maps. The feature map processed by the level-guide mechanism is input into a Scales Encoder module; a deconvolution module restores the feature pixels; a new feature enhancement module is applied to the restored features to compensate for the feature information lost during pixel restoration; finally, the enhanced feature map is input into a Center points prediction module for road target category recognition and localization. The average recognition accuracy on a self-built complex road scene data set is 86.93%, and the road scene target image detection speed reaches 50 frames/s, meeting the requirements of accurate and real-time road scene detection.

Description

Object detection method and device for complex road scenes based on the LG-CenterNet model

Technical Field

The invention belongs to the fields of semantic segmentation, image processing, and intelligent driving, and in particular relates to a complex road scene object detection method and device based on an LG-CenterNet model.

Background

In recent years, the steady rise in the number of cars has led to frequent traffic accidents, seriously threatening people's lives. With the development of autonomous driving technology, researchers have shifted from passive vehicle safety to active vehicle safety. Automating a vehicle requires advanced technical means to complete part of the driving task, and deep-learning-based intelligent detection of road scene targets is key to active vehicle safety. Current target detection networks mainly rely on a backbone network for feature extraction but give little consideration to low-level multi-scale problems, which can lead to insufficient multi-scale target detection capability.

Summary of the Invention

Purpose of the invention: in view of the poor performance of current complex road scene target detection and the inability of conventional detection methods to meet the requirements of real road environments, the invention provides a complex road scene target detection method and device based on the LG-CenterNet model.

Technical solution: the present invention proposes a complex road scene target detection method based on the LG-CenterNet model, which specifically includes the following steps:

(1) Process images of complex road scenes to obtain road target images containing multiple categories, mark the category and position of each road target in the images, and construct and preprocess a complex road scene data set;

(2) Construct the target detection LG-CenterNet model and train it on the above road target data set to obtain a model S; the LG-CenterNet model comprises a Backbone module, a level-guided attention module, a Scales Encoder module, a deconvolution module, a feature enhancement module, and a Center points prediction module;

(3) Use the trained model S to perform target localization, bounding-box sizing, and category prediction on complex road targets in the form of heatmaps through the Center points prediction module, and display the results on the video or image.

Further, the preprocessing of the road scene data set in step (1) normalizes images of complex road scenes with differing resolutions to a size of 512×512 pixels, and then applies batch normalization, the ReLU activation function, and max pooling to obtain uniformly distributed feature target samples.

Further, step (2) is implemented as follows:

(21) The LG-CenterNet model proposes a new MresneIt50 as the Backbone module. MresneIt50 consists of multiple residual blocks: the feature map extracted by the first 4 residual blocks is denoted E1 with 512 channels; the feature map extracted by the next 6 residual blocks is denoted E2 with 1024 channels; the feature map extracted by the last 3 residual blocks is denoted E3 with 2048 channels;

(22) The feature maps E1, E2, and E3 extracted by the Backbone are input into the level-guided attention module, whose main structure comprises two branches: a global pooling branch and a level-guide branch. The 512-channel feature map E1 is input into the global pooling branch, where a global max pooling layer and an upsampling layer produce EC1; the feature maps E1, E2, and E3 with 512, 1024, and 2048 channels are input into the level-guide branch, where a series of average pooling and convolution operations combined with upsampling produces EC2; EC1 and EC2 are fused element-wise (add) to obtain EC3, which reduces the number of computed parameters;

(23) The extracted EC3 is input into the Scales Encoder module, and a series of convolution and residual block operations yields EC4;

(24) The extracted EC4 is input into the deconvolution module, which consists of 3 deconv groups. Each deconv group's convolution operation progressively enlarges the feature map while reducing the number of channels, yielding a 128×128×64 feature map denoted EC5;

(25) The feature map EC5 is input into the feature enhancement module (P-FEM) for convolution, yielding a 128×128×64 feature map EC6. P-FEM consists of a 3×3 Poly-Scale Convolution, batch normalization, a ReLU activation function, and a Sigmoid activation function; its purpose is to improve the correlation of local information in the feature map and strengthen its ability to express features.

Further, step (3) is implemented as follows:

The Center points prediction module uses the trained model S to classify and predict the input image, generating a heatmap whose scale matches EC6 from the original image. It then determines the position and size of each target and generates the final classified and localized heatmap by computing the heatmap loss, denoted L_k, the target size loss, denoted L_s, and the center-point offset loss, denoted L_f. The overall network loss is:

L_d = L_k + λ_s·L_s + λ_f·L_f

where λ_s = 0.1 and λ_f = 1. For an input image of size 512×512, the feature map generated by the network is H×W×C, and L_k, L_s, and L_f are computed as:

L_k = −(1/N) · Σ_{h,w,c} { (1 − A'_HWC)^α · log(A'_HWC), if A_HWC = 1; (1 − A_HWC)^β · (A'_HWC)^α · log(1 − A'_HWC), otherwise }

L_s = (1/N) · Σ_{k=1}^{N} |s'_pk − s_k|

L_f = (1/N) · Σ_p |O'_p̃ − (p/R − p̃)|, where p̃ = ⌊p/R⌋ is the center point mapped to the output resolution and R is the output stride

Here A_HWC is the ground-truth value of the target annotation in the image, A'_HWC is the predicted value, α and β are 2 and 4 respectively, N is the number of keypoints in the image, s'_pk is the predicted size, s_k is the ground-truth size, and p is the position of the target's center point in the image.

Based on the same inventive concept, the invention also provides a complex road scene object detection device based on the LG-CenterNet model, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the computer program is loaded into the processor, it implements the above complex road scene target detection method based on the LG-CenterNet model.

Beneficial effects: compared with the prior art, the invention has the following benefits: 1. by improving the backbone network of the LG-CenterNet model, MresneIt50 is proposed to strengthen feature extraction; 2. a level-guided attention module is proposed to fuse the feature maps extracted by the backbone network; 3. a new Scales Encoder module and a feature enhancement module are proposed to focus on extracting local features, avoiding the feature loss that occurs in the deconvolution module; 4. the improved LG-CenterNet target detection model raises the mean average precision (mAP) by 5 percentage points over the original CenterNet framework; 5. the invention also achieves high detection accuracy in complex road scenes.

Description of the Drawings

Figure 1 is a flowchart of the complex road scene target detection method based on the LG-CenterNet model;

Figure 2 is a schematic diagram of the LG-CenterNet target detection model proposed by the invention;

Figure 3 is a schematic diagram of the proposed residual block structure Mblock;

Figure 4 is a schematic diagram of the level-guided attention module;

Figure 5 is a schematic diagram of the Scales Encoder module;

Figure 6 is a schematic diagram of the feature enhancement module;

Figure 7 shows detection results obtained with the LG-CenterNet target detection model.

Detailed Description

The present invention is described in further detail below in conjunction with the accompanying drawings.

This embodiment involves a number of variables, which are described in Table 1.

Table 1. Variable descriptions

S: 3×3 convolution kernel with 1024 channels
E1: feature map extracted by the 4 residual blocks in the Backbone module
E2: feature map extracted by the 6 residual blocks in the Backbone module
E3: feature map extracted by the 3 residual blocks in the Backbone module
EC1: feature map obtained from E1 through the global pooling branch
EC2: feature map obtained from E1, E2, and E3 through the level-guide branch
EC3: feature map obtained by add-fusing EC1 and EC2
EC4: feature map obtained from EC3 through the Scales Encoder module
EC5: feature map obtained from EC4 through the deconvolution module
EC6: feature map obtained from EC5 through the feature enhancement module

The invention provides a complex road scene target detection method based on the LG-CenterNet model. Images of different road scene targets are collected and labeled to produce a complex road scene data set. The proposed MresneIt50 serves as the backbone network for feature extraction; the feature maps of different scales extracted by the backbone are input into the level-guided attention module, after which multiple receptive-field features are obtained through the Scales Encoder module; the deconvolution module then restores the feature pixels, and a feature enhancement module built on Poly-Scale Convolution (PSConv) improves the information correlation of local features. Finally, the Center points prediction module predicts the target's center position, the scale of the prediction box, and the center-point offset, while identifying the target category. As shown in Figure 1, the method specifically includes the following steps:

Step 1: Process images of complex road scenes, obtain road target images containing multiple categories, preprocess them, and mark the category and position of each road target to construct a complex road scene data set.

Preprocessing the road scene data set mainly consists of normalizing images of complex road scenes with differing resolutions to a size of 512×512 pixels, and then applying batch normalization, the ReLU activation function, and max pooling so that the target samples are distributed relatively uniformly across the image.
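The preprocessing step above can be sketched as follows. The function name, the nearest-neighbor resize, and the per-channel standardization (a stand-in for the batch normalization that the network applies internally) are illustrative assumptions, not the patent's exact pipeline:

```python
import numpy as np

def preprocess(img: np.ndarray, size: int = 512) -> np.ndarray:
    """Resize an H x W x C uint8 image to size x size (nearest neighbor)
    and standardize each channel to zero mean / unit variance."""
    h, w, _ = img.shape
    ys = (np.arange(size) * h) // size   # nearest-neighbor row indices
    xs = (np.arange(size) * w) // size   # nearest-neighbor column indices
    out = img[ys][:, xs].astype(np.float32) / 255.0
    mean = out.mean(axis=(0, 1), keepdims=True)
    std = out.std(axis=(0, 1), keepdims=True) + 1e-6
    return (out - mean) / std
```

Any input resolution is mapped to the fixed 512×512 size the network expects.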

Step 2: Construct the target detection LG-CenterNet model, whose structure is shown in Figure 2, and train it on the above road target data set to obtain model S. The LG-CenterNet network mainly comprises a Backbone module, a level-guided attention module (Levels guide attention, LGA), a Scales Encoder module, a deconvolution module, a feature enhancement module (P-Feature enhancement module, P-FEM), and a Center points prediction module.

(21) The LG-CenterNet model proposes a new MresneIt50 as the Backbone module. MresneIt50 consists of multiple residual blocks Mblock, whose structure is shown in Figure 3. The feature map extracted by the first 4 residual blocks is denoted E1 with 512 channels; the feature map extracted by the next 6 residual blocks is denoted E2 with 1024 channels; the feature map extracted by the last 3 residual blocks is denoted E3 with 2048 channels.
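The spatial shapes of E1, E2, and E3 for a 512×512 input can be tabulated with a small helper. The stride assignment (E1 at stride 8, E2 at 16, E3 at 32) is an assumption based on standard ResNet50 downsampling; the patent only specifies the channel counts:

```python
def stage_shapes(input_size: int = 512) -> dict:
    """Spatial size and channel count of E1/E2/E3 under assumed
    ResNet50-style strides (E1 at stride 8, E2 at 16, E3 at 32)."""
    stages = {"E1": (8, 512), "E2": (16, 1024), "E3": (32, 2048)}
    return {name: (input_size // s, input_size // s, c)
            for name, (s, c) in stages.items()}
```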

(22) The feature maps E1, E2, and E3 extracted by the Backbone are input into the level-guided attention module (LGA), whose structure is shown in Figure 4. It comprises two branches: a global pooling branch and a level-guide branch. The 512-channel feature map E1 is input into the global pooling branch, where a global max pooling layer and an upsampling layer produce EC1. The feature maps E1, E2, and E3 with 512, 1024, and 2048 channels are input into the level-guide branch, where a series of average pooling and convolution operations combined with upsampling produces EC2. EC1 and EC2 are fused element-wise (add) to obtain EC3, which reduces the number of computed parameters.
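A minimal numpy sketch of the two LGA operations that the text fully specifies: the global pooling branch (global max pool broadcast back to the original spatial size) and the element-wise add fusion. The level-guide branch's pooling/convolution stack is omitted here, and the function names are illustrative:

```python
import numpy as np

def global_pool_branch(e1: np.ndarray) -> np.ndarray:
    """Global max pooling over the spatial dims of an H x W x C map,
    then broadcast ('upsample') back to E1's spatial size."""
    pooled = e1.max(axis=(0, 1), keepdims=True)        # 1 x 1 x C
    return np.broadcast_to(pooled, e1.shape).copy()    # H x W x C

def lga_fuse(ec1: np.ndarray, ec2: np.ndarray) -> np.ndarray:
    """Element-wise add fusion of the two branch outputs, giving EC3."""
    return ec1 + ec2
```

Using add rather than channel concatenation keeps the fused map at the same channel count, which is the parameter saving the text refers to.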

(23) The extracted EC3 is input into the Scales Encoder module, whose structure is shown in Figure 5; a series of convolution and residual block operations yields EC4.

(24) The extracted EC4 is input into the deconvolution module, which consists of 3 deconv groups. Each deconv group's convolution operation progressively enlarges the feature map while reducing the number of channels, yielding a 128×128×64 feature map denoted EC5.
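The size arithmetic of the three deconv groups can be checked with the standard transposed-convolution output formula. The kernel/stride/padding values (4, 2, 1) are a common 2× upsampling configuration assumed here, not stated in the patent; with them, a 16×16 map grows to the 128×128 EC5:

```python
def deconv_out(size: int, kernel: int = 4, stride: int = 2, pad: int = 1) -> int:
    """Output spatial size of one transposed convolution:
    out = (in - 1) * stride - 2 * pad + kernel."""
    return (size - 1) * stride - 2 * pad + kernel

def after_three_groups(size: int) -> int:
    """Apply the three deconv groups of the deconvolution module."""
    for _ in range(3):
        size = deconv_out(size)
    return size
```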

(25) The feature map EC5 is input into P-FEM for convolution, yielding a 128×128×64 feature map EC6. P-FEM consists of a 3×3 Poly-Scale Convolution (PSConv), batch normalization, a ReLU activation function, and a Sigmoid activation function; its purpose is to improve the correlation of local information in the feature map and strengthen its ability to express features. The P-FEM structure is shown in Figure 6.
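One plausible arrangement of the BN/ReLU/Sigmoid chain is sketched below in numpy. The patent does not specify how the Sigmoid output is combined with the activation, so the gating (multiplying the Sigmoid response back onto the ReLU activation to re-weight local responses) and the identity stand-in for PSConv are assumptions:

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def feature_enhance(x: np.ndarray) -> np.ndarray:
    """Toy P-FEM chain: standardization as a stand-in for batch norm,
    ReLU, then a Sigmoid gate multiplied onto the activation."""
    z = (x - x.mean()) / (x.std() + 1e-6)  # stand-in for BN
    a = np.maximum(z, 0.0)                 # ReLU
    return a * sigmoid(z)                  # assumed Sigmoid gating
```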

Step 3: Use the trained model S to perform target localization, bounding-box sizing, and category prediction on road scene targets in the form of heatmaps through the Center points prediction module, and display the results on the video or image.

The Center points prediction module uses the trained model S to classify and predict the input image, generating a heatmap whose scale matches EC6 from the original image. It then determines the position and size of each target and generates the final classified and localized heatmap by computing the heatmap loss, denoted L_k, the target size loss, denoted L_s, and the center-point offset loss, denoted L_f. The overall network loss is L_d.

L_d = L_k + λ_s·L_s + λ_f·L_f

where λ_s = 0.1 and λ_f = 1. For an input image of size 512×512, the feature map generated by the network is H×W×C, and L_k, L_s, and L_f are computed as:

L_k = −(1/N) · Σ_{h,w,c} { (1 − A'_HWC)^α · log(A'_HWC), if A_HWC = 1; (1 − A_HWC)^β · (A'_HWC)^α · log(1 − A'_HWC), otherwise }

L_s = (1/N) · Σ_{k=1}^{N} |s'_pk − s_k|

L_f = (1/N) · Σ_p |O'_p̃ − (p/R − p̃)|, where p̃ = ⌊p/R⌋ is the center point mapped to the output resolution and R is the output stride

Here A_HWC is the ground-truth value of the target annotation in the image, A'_HWC is the predicted value, α and β are 2 and 4 respectively, N is the number of keypoints in the image, s'_pk is the predicted size, s_k is the ground-truth size, and p is the position of the target's center point in the image.
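The total loss and the two losses whose terms are fully defined above can be sketched in numpy. The implementation follows the CenterNet-style keypoint focal loss and L1 size loss; function names are illustrative, and the offset loss is omitted because its symbols are not fully specified in the text:

```python
import numpy as np

def focal_loss(A, A_pred, alpha=2.0, beta=4.0, eps=1e-6):
    """Keypoint focal loss L_k over an H x W x C heatmap. A holds the
    ground-truth heatmap, A_pred the predicted one."""
    n = max(int((A == 1).sum()), 1)      # number of keypoints N
    pos = (A == 1)
    p = np.clip(A_pred, eps, 1 - eps)
    pos_loss = ((1 - p) ** alpha * np.log(p))[pos].sum()
    neg_loss = ((1 - A) ** beta * p ** alpha * np.log(1 - p))[~pos].sum()
    return -(pos_loss + neg_loss) / n

def size_loss(s_pred, s_true):
    """L1 size regression loss L_s averaged over the N keypoints."""
    s_pred, s_true = np.asarray(s_pred, float), np.asarray(s_true, float)
    return np.abs(s_pred - s_true).sum() / max(len(s_true), 1)

def total_loss(l_k, l_s, l_f, lam_s=0.1, lam_f=1.0):
    """L_d = L_k + lambda_s * L_s + lambda_f * L_f."""
    return l_k + lam_s * l_s + lam_f * l_f
```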

Based on the same inventive concept, the invention also provides a complex road scene object detection device based on the LG-CenterNet model, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the computer program is loaded into the processor, it implements the above complex road scene target detection method based on the LG-CenterNet model.

The self-built complex scene data set is trained with the LG-CenterNet network to obtain a model that can recognize targets in complex scenes, and model performance is verified on the validation set of the data set, as shown in Figure 7. The invention achieves an average recognition accuracy of 86.93% on the self-built complex road scene data set, and the road scene target image detection speed reaches 50 frames/s, meeting the requirements of accurate and real-time road scene detection.

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

mAP = (1/n) · Σ_{i=1}^{n} AP_i

FPS = 1 / t

Here Precision is the precision, Recall is the recall rate, AP is the average precision of a class, mAP is the mean average precision, FPS is the frame rate, and t is the time to detect a single image. The data set contains multiple sample categories (such as car, person, etc.), and n denotes the number of categories. TP (True Positives) is the number of positive samples identified as positive (i.e., car samples identified as car); TN (True Negatives) is the number of negative samples identified as negative; FP (False Positives) is the number of negative samples identified as positive (the sample is not a car but the model identifies it as a car); FN (False Negatives) is the number of positive samples identified as negative.
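The evaluation quantities above reduce to a few one-line functions (assuming the per-class AP values are already computed):

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0

def mean_ap(ap_per_class) -> float:
    """mAP: mean of the per-class average precisions."""
    return sum(ap_per_class) / len(ap_per_class)

def fps(t_single: float) -> float:
    """Frames per second from the per-image detection time t (seconds)."""
    return 1.0 / t_single
```

Note that the reported 50 frames/s corresponds to a per-image detection time of 0.02 s.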

The embodiments of the present invention have been described in detail above in conjunction with the accompanying drawings, but the invention is not limited to these embodiments; various changes may be made without departing from the gist of the invention within the scope of knowledge possessed by those of ordinary skill in the art.

Claims (5)

1. A complex road scene target detection method based on an LG-CenterNet model is characterized by comprising the following steps:
(1) Processing images of complex road scenes to obtain road target images containing various categories, marking the categories and positions of road targets in the images, constructing a complex road scene data set and preprocessing the complex road scene data set;
(2) Constructing an LG-CenterNet model for target detection, and training the road target data set through the LG-CenterNet model to obtain a model S; the LG-CenterNet model comprises a Backbone module, a level-guided attention module, a Scales Encoder module, a deconvolution module, a feature enhancement module and a Center points prediction module;
(3) Performing target localization, bounding-box sizing and category prediction on the complex road target in the form of heatmaps by using the trained model S through the Center points prediction module, and displaying the obtained result on a video or an image to show the corresponding effect.
2. The LG-CenterNet model-based complex road scene target detection method according to claim 1, wherein the preprocessing of the road scene data set in step (1) normalizes the images of complex road scenes with differing resolutions to a size of 512 × 512 pixels, and obtains uniformly distributed feature target samples through batch normalization, the ReLU activation function and max pooling.
3. The LG-CenterNet model-based complex road scene target detection method according to claim 1, wherein the step (2) is realized by the following steps:
(21) A new MresneIt50 is proposed in the LG-CenterNet model to serve as the Backbone module; MresneIt50 is composed of a plurality of residual blocks, the feature map extracted by the first 4 residual blocks is denoted E1 with 512 channels, the feature map extracted by the next 6 residual blocks is denoted E2 with 1024 channels, and the feature map extracted by the last 3 residual blocks is denoted E3 with 2048 channels;
(22) The feature maps E1, E2 and E3 extracted by the Backbone are input into the level-guided attention module, whose main structure comprises two branches: a global pooling branch and a level-guide branch; the feature map E1 with 512 channels is input into the global pooling branch, and EC1 is obtained through a global max pooling layer and an upsampling layer; the feature maps E1, E2 and E3 with 512, 1024 and 2048 channels are input into the level-guide branch, and EC2 is obtained through a series of average pooling and convolution operations combined with upsampling; EC1 and EC2 are fused element-wise (add) to obtain EC3, thereby reducing calculation parameters;
(23) Inputting the extracted EC3 into a Scales Encoder module, and carrying out a series of convolution and residual module operations to obtain EC4;
(24) Inputting the extracted EC4 into a deconvolution module, wherein the deconvolution module consists of 3 deconv groups; the convolution operation of each deconv group progressively enlarges the feature map while reducing the number of channels, yielding a feature map with dimensions 128 × 128 × 64, denoted EC5;
(25) Inputting the feature map EC5 into a feature enhancement module for convolution to obtain a feature map EC6 with dimensions 128 × 128 × 64, wherein the P-FEM is composed of a 3 × 3 Poly-Scale Convolution, batch normalization, a ReLU activation function and a Sigmoid activation function, and is mainly used for improving the correlation of local information in the feature map and enhancing its ability to express features.
4. The LG-CenterNet model-based complex road scene target detection method according to claim 1, characterized in that the step (3) is realized by the following steps:
the Center points prediction module classifies and predicts the input image through the trained model S, generates a heatmap whose scale is consistent with the size of EC6 from the original image, and then determines the position and size of the target and generates the final classified and localized heatmap by computing the heatmap loss, denoted L_k, the target size loss, denoted L_s, and the center-point offset loss, denoted L_f; the overall network loss is:
L d =L ks L sf L f
wherein λ_s = 0.1 and λ_f = 1; for an input image of size 512 × 512, the feature map generated by the network is H × W × C, and L_k, L_s and L_f are computed respectively as:
L_k = −(1/N) · Σ_{h,w,c} { (1 − A'_HWC)^α · log(A'_HWC), if A_HWC = 1; (1 − A_HWC)^β · (A'_HWC)^α · log(1 − A'_HWC), otherwise }
L_s = (1/N) · Σ_{k=1}^{N} |s'_pk − s_k|
L_f = (1/N) · Σ_p |O'_p̃ − (p/R − p̃)|, where p̃ = ⌊p/R⌋ is the center point mapped to the output resolution and R is the output stride
wherein A_HWC is the ground-truth value of the target annotation in the image, A'_HWC is the predicted value of the image, α and β are 2 and 4 respectively, N is the number of keypoints in the image, s'_pk is the predicted size, s_k is the ground-truth size, and p is the position of the center point of the object in the image.
5. An LG-CenterNet model-based complex road scene target detection device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the computer program, when loaded into the processor, implements the LG-CenterNet model-based complex road scene target detection method according to any one of claims 1 to 4.
CN202211179337.4A 2022-09-27 2022-09-27 Object detection method and device for complex road scenes based on LG-CenterNet model Active CN115690704B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211179337.4A CN115690704B (en) 2022-09-27 2022-09-27 Object detection method and device for complex road scenes based on LG-CenterNet model


Publications (2)

Publication Number Publication Date
CN115690704A true CN115690704A (en) 2023-02-03
CN115690704B CN115690704B (en) 2023-08-22

Family

ID=85063352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211179337.4A Active CN115690704B (en) 2022-09-27 2022-09-27 Object detection method and device for complex road scenes based on LG-CenterNet model

Country Status (1)

Country Link
CN (1) CN115690704B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117690165A (en) * 2024-02-02 2024-03-12 四川泓宝润业工程技术有限公司 Method and device for detecting personnel passing between drill rod and hydraulic pliers

Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108717537A (en) * 2018-05-30 2018-10-30 淮阴工学院 Complex-scene face recognition method and system based on pattern recognition
CN110543895A (en) * 2019-08-08 2019-12-06 淮阴工学院 An Image Classification Method Based on VGGNet and ResNet
CN111382714A (en) * 2020-03-13 2020-07-07 Oppo广东移动通信有限公司 Image detection method, device, terminal and storage medium
CN111709895A (en) * 2020-06-17 2020-09-25 中国科学院微小卫星创新研究院 Blind image deblurring method and system based on attention mechanism
CN111814889A (en) * 2020-07-14 2020-10-23 大连理工大学人工智能大连研究院 A One-Stage Object Detection Method Using Anchor-Free Module and Boosted Classifier
CN111932553A (en) * 2020-07-27 2020-11-13 北京航空航天大学 Remote sensing image semantic segmentation method based on area description self-attention mechanism
CN112329800A (en) * 2020-12-03 2021-02-05 河南大学 Salient object detection method based on global information guiding residual attention
CN112580443A (en) * 2020-12-02 2021-03-30 燕山大学 Pedestrian detection method based on embedded device improved CenterNet
CN112686207A (en) * 2021-01-22 2021-04-20 北京同方软件有限公司 Urban street scene target detection method based on regional information enhancement
CN112700444A (en) * 2021-02-19 2021-04-23 中国铁道科学研究院集团有限公司铁道建筑研究所 Bridge bolt detection method based on self-attention and central point regression model
CN113378815A (en) * 2021-06-16 2021-09-10 南京信息工程大学 Model for scene text positioning recognition and training and recognition method thereof
CN113408498A (en) * 2021-08-05 2021-09-17 广东众聚人工智能科技有限公司 Crowd counting system and method, equipment and storage medium
CN113657326A (en) * 2021-08-24 2021-11-16 陕西科技大学 A Weed Detection Method Based on Multiscale Fusion Module and Feature Enhancement
WO2021244621A1 (en) * 2020-06-04 2021-12-09 华为技术有限公司 Scenario semantic parsing method based on global guidance selective context network
CN114359153A (en) * 2021-12-07 2022-04-15 湖北工业大学 Insulator defect detection method based on improved CenterNet
CN114419589A (en) * 2022-01-17 2022-04-29 东南大学 A road target detection method based on attention feature enhancement module
CN114581866A (en) * 2022-01-24 2022-06-03 江苏大学 Multi-target visual detection algorithm for automatic driving scene based on improved CenterNet
CN114638836A (en) * 2022-02-18 2022-06-17 湖北工业大学 An urban streetscape segmentation method based on highly effective driving and multi-level feature fusion
US20220204035A1 (en) * 2020-12-28 2022-06-30 Hyundai Mobis Co., Ltd. Driver management system and method of operating same
US20220237403A1 (en) * 2021-01-28 2022-07-28 Salesforce.Com, Inc. Neural network based scene text recognition
CN114863368A (en) * 2022-07-05 2022-08-05 城云科技(中国)有限公司 Multi-scale target detection model and method for road damage detection
CN115035361A (en) * 2022-05-11 2022-09-09 中国科学院声学研究所南海研究站 Target detection method and system based on attention mechanism and feature cross fusion


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YU FANGCHENG et al.: "Small object detection for autonomous driving based on improved CenterNet", HTTP://KNS.CNKI.NET/KCMS/DETAIL/11.2175.TN.20220719.1838.026.HTML, pages 1 - 8 *
YU FANGCHENG et al.: "Small object detection for autonomous driving based on improved CenterNet", 《电子测量技术》 (Electronic Measurement Technology), vol. 45, no. 15, pages 115 - 122 *
CHENG YI et al.: "Improved CenterNet traffic sign detection algorithm", 《信号处理》 (Journal of Signal Processing), vol. 38, no. 3, pages 511 - 518 *


Also Published As

Publication number Publication date
CN115690704B (en) 2023-08-22

Similar Documents

Publication Publication Date Title
CN110738207B (en) Character detection method for fusing character area edge information in character image
Wang et al. FE-YOLOv5: Feature enhancement network based on YOLOv5 for small object detection
CN109558832B (en) Human body posture detection method, device, equipment and storage medium
CN108492319B (en) Moving target detection method based on deep full convolution neural network
WO2021155792A1 (en) Processing apparatus, method and storage medium
CN108280397B (en) Human body image hair detection method based on deep convolutional neural network
CN110569738B (en) Natural scene text detection method, equipment and medium based on densely connected network
CN110175613A (en) Street view image semantic segmentation method based on Analysis On Multi-scale Features and codec models
CN111340123A (en) Image score label prediction method based on deep convolutional neural network
CN107944020A (en) Facial image lookup method and device, computer installation and storage medium
CN111210446B (en) Video target segmentation method, device and equipment
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN116012709B (en) High-resolution remote sensing image building extraction method and system
CN113936195B (en) Sensitive image recognition model training method and device and electronic equipment
CN113033454B (en) A detection method for building changes in urban video cameras
CN114266794A (en) Pathological section image cancer region segmentation system based on full convolution neural network
CN116665054A (en) Remote sensing image small target detection method based on improved YOLOv3
CN117726575A (en) Medical image segmentation method based on multi-scale feature fusion
CN114743257A (en) Method for detecting and identifying image target behaviors
CN116503726A (en) Multi-scale light smoke image segmentation method and device
CN115690704B (en) Object detection method and device for complex road scenes based on LG-CenterNet model
CN110287970B (en) A Weakly Supervised Object Localization Method Based on CAM and Masking
Li et al. Incremental learning of infrared vehicle detection method based on SSD
CN103927517B (en) Motion detection method based on human body global feature histogram entropies
CN114898290A (en) Real-time detection method and system for marine ship

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20230203

Assignee: Jiangsu Kesheng Xuanyi Technology Co.,Ltd.

Assignor: HUAIYIN INSTITUTE OF TECHNOLOGY

Contract record no.: X2023980048436

Denomination of invention: Method and device for complex road scene object detection based on LG CenterNet model

Granted publication date: 20230822

License type: Common License

Record date: 20231129