CN114332919A - A pedestrian detection method, device and terminal device based on multi-spatial relationship perception - Google Patents
Pedestrian detection method, device and terminal device based on multi-spatial relationship perception
- Publication number: CN114332919A (application CN202111510823.5A)
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- Y02T10/40 — Engine management systems (Y: general tagging of new technological developments; Y02T: climate change mitigation technologies related to transportation; Y02T10/00: road transport of goods or passengers; Y02T10/10: internal combustion engine [ICE] based vehicles)
Abstract
The invention discloses a pedestrian detection method, device and terminal device based on multi-spatial relationship perception. The method comprises: step 1, collecting a pedestrian image data set and resizing the images to a fixed size for model training; step 2, adopting the YOLOX detection framework, feeding the images into the framework model, and first applying data augmentation; step 3, feeding the augmented images into the Focus module, which slices each image by pixel parity into four sub-images and concatenates them along the channel dimension; step 4, feeding the concatenated result into the backbone network of the YOLOX detection framework, to which three branches are connected; and step 5, each branch comprising two parts, a multi-spatial relationship perception module and a detection head, where the multi-spatial relationship perception module effectively fuses global and local information by combining the relationships between features in different spatial dimensions, yielding a multi-spatial relationship-aware feature map. The method attends to global information while also extracting local information and fuses the two effectively, thereby obtaining more discriminative feature information and improving pedestrian detection performance.
Description
Technical Field
The invention relates to the field of image recognition, in particular to pedestrian detection, and specifically to a pedestrian detection method, device and terminal device based on multi-spatial relationship perception.
Background
With the continuous development of smart city construction, many new artificial intelligence technologies are being applied to intelligent transportation, intelligent government affairs, intelligent factories, and so on. Every such application ultimately serves people, so pedestrian detection is a prerequisite for many of them. However, real scenes are often complex: dense crowds cause bodies to overlap and occlude one another, pedestrians may be blocked by objects, illumination varies strongly, and harsh weather (rain, snow, etc.) blurs the image. These real-world conditions make pedestrian detection considerably harder. A pedestrian detection technique is therefore urgently needed that can mine deeper, more discriminative features from the pedestrian regions of an image, sufficient to characterize pedestrians in a variety of environments.
In the course of realizing the present invention, the inventors found at least the following problems in the prior art. Most popular pedestrian detection techniques are based on convolutional neural networks (CNNs), and most CNN pedestrian detection models use a limited receptive field, which makes it difficult to learn rich structural patterns from global information — for example, using a CNN to detect and segment pedestrians to obtain final location information, or combining a CNN with feature fusion for pedestrian detection. Although some methods consider different receptive fields, they do not combine global and local information well. In addition, some methods enhance a model's learning capacity by stacking network depth, which makes both training and deployment very resource-intensive.
Summary of the Invention
To overcome the deficiencies of the prior art, the present invention provides a pedestrian detection method, device and terminal device based on multi-spatial relationship perception. The method attends to global information while also extracting local information, and fuses the two effectively, thereby obtaining more discriminative feature information and improving pedestrian detection performance. The technical solution is as follows.
The present invention provides a pedestrian detection method based on multi-spatial relationship perception, comprising the following steps.
Step 1: collect a pedestrian image data set and resize the images to a fixed size for model training.
Step 2: adopt the YOLOX detection framework, feed the images into the framework model, and first apply data augmentation to the images.
Step 3: feed the augmented images into the Focus module, which slices each image by pixel parity into four sub-images and concatenates them along the channel dimension.
Step 4: feed the concatenated result into the backbone network of the YOLOX detection framework. Three branches are connected to the backbone; they correspond to different receptive fields, and together the three receptive fields cover targets of different sizes.
Step 5: each branch comprises two parts, a multi-spatial relationship perception module and a detection head. The multi-spatial relationship perception module effectively fuses global and local information by combining the relationships between features in different spatial dimensions, yielding a multi-spatial relationship-aware feature map.
The workflow of the multi-spatial relationship perception module is as follows.
The feature map X input to the multi-spatial relationship perception module has dimensions H×W×C, where H is the height, W the width, and C the number of channels.
(1) Construct the relational feature map of the H×W space.
In the H×W space, the feature map X is decomposed into H×W feature vectors of length C. The relationship information mapping feature vector x_i to feature vector x_j is denoted r_{i,j} and computed as r_{i,j} = f_{H×W}(x_i, x_j), where f_{H×W} is formed from two embedding functions ψ_{H×W} and φ_{H×W}, each consisting of a 1×1 convolutional layer, a BatchNormalization layer and a ReLU activation layer. Correspondingly, the relationship information mapping x_j to x_i is r_{j,i} = f_{H×W}(x_j, x_i), so the pair (r_{i,j}, r_{j,i}) describes the bidirectional relationship between x_i and x_j. For a single direction, computing the relationship information between all pairs of feature vectors and stacking the results yields an affinity matrix with H×W channels; the bidirectional relationship therefore yields two different affinity matrices M_1 and M_2, which deeply mine the local information of the features.
The original global structural information is also retained. Specifically, after applying a 1×1 convolution to the original feature map X, a global average pooling operation is performed along the channel direction to obtain a global structural feature map F. The global structural feature map F is concatenated with the two affinity matrices to obtain a feature matrix Y:
Y = [pool(θ_{H×W}(X)), θ_{H×W}(M_1), θ_{H×W}(M_2)],
where pool denotes global average pooling and θ_{H×W} consists of a 1×1 convolutional layer, a BatchNormalization layer and a ReLU activation layer; compared with ψ_{H×W} and φ_{H×W}, its number of output activation nodes differs. The feature matrix Y is then passed through a 1×1 convolution to fuse all the global and local information it contains, yielding the relational feature map of the H×W space.
(2) Construct the relational feature map of the channel space C.
Similarly, in the channel space, the feature map X is decomposed into C feature vectors of length H×W. The relationship information mapping feature vector x_a to feature vector x_b is r_{a,b} = f_C(x_a, x_b), where the embedding functions ψ_C and φ_C are identical in form to ψ_{H×W} and φ_{H×W}, differing only in their output dimensions. The affinity matrices are obtained with the same calculation as in step 5(1); that is, the bidirectional relationship yields two different affinity matrices M′_1 and M′_2.
After applying a 1×1 convolution to the original feature map X, global average pooling is performed over the H×W dimensions to obtain a structural feature map F′. The structural feature map F′ is concatenated with the two affinity matrices to obtain a feature matrix Y′:
Y′ = [pool(θ_C(X)), θ_C(M′_1), θ_C(M′_2)],
where θ_C is identical in form to θ_{H×W}, differing only in its output dimensions. The feature matrix Y′ is then passed through a 1×1 convolution to fuse all the global and local information it contains, yielding the relational feature map of the channel space C.
The relational feature maps of the H×W space and the channel space C are multiplied element-wise to obtain the multi-spatial relationship-aware feature map.
Step 6: feed the multi-spatial relationship-aware feature map into the detection head. YOLOX decouples classification from coordinate regression: a 1×1 convolution first reduces the channel dimension, followed by two lightweight branches that perform classification and regression respectively.
Preferably, the data augmentation of step 2 includes random horizontal flipping, color jittering, multi-scale augmentation and mosaic augmentation.
Preferably, the receptive fields of the three branches in step 4 correspond to downsampling factors of 8, 16 and 32, respectively.
Preferably, in the training phase, the classification loss is cross-entropy, the regression loss is the GIoU loss, and an L1-norm penalty is applied to the predicted location information.
Compared with the prior art, one of the above technical solutions has the following beneficial effect: through the multi-spatial relationship perception module, the relationships between features in different spatial dimensions are deeply mined. The module attends to global information while also extracting local information, fuses the two effectively, and links the feature information of different spaces with the relationship information between features, so that the features learned by the model are more recognizable and discriminative, thereby improving pedestrian detection accuracy.
Description of Drawings
FIG. 1 is a flowchart of a multi-spatial relationship perception module provided by an embodiment of the present disclosure.
Detailed Description
To clarify the technical solution and working principle of the present invention, the embodiments of the present disclosure are described in further detail below with reference to the accompanying drawings. All of the optional technical solutions above may be combined arbitrarily to form optional embodiments of the present disclosure, which are not enumerated here one by one.
The terms "step 1", "step 2", "step 3" and similar designations in the specification, claims and drawings of the present application are used to distinguish similar objects, and are not necessarily used to describe a particular order or sequence. It should be understood that designations so used are interchangeable where appropriate, so that the embodiments of the present application described herein can be practiced in orders other than those described here.
First aspect: an embodiment of the present disclosure provides a pedestrian detection method based on multi-spatial relationship perception, comprising the following steps.
Step 1: collect a pedestrian image data set and resize the images to a fixed size for model training.
Step 2: adopt the YOLOX detection framework. This framework has a simple structure and requires no manually set anchor boxes, which makes training and deployment convenient. Feed the images into the framework model and first apply data augmentation. Preferably, the augmentation of step 2 includes random horizontal flipping, color jittering, multi-scale augmentation, mosaic augmentation and the like, so as to enlarge the training set and improve the generalization ability of the model.
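Purely as an illustration, a minimal augmentation pipeline in this spirit is sketched below (PyTorch/torchvision). The parameter values and the multi-scale size set are assumptions rather than values from the patent; mosaic augmentation, which stitches four images and their boxes together, is omitted for brevity, and in a real detector the flip would also have to be applied to the bounding boxes.

```python
# Illustrative augmentation sketch: random horizontal flip, color jitter and
# multi-scale resizing. All parameter values are assumptions, not from the patent.
import random
import torchvision.transforms as T

def augment(img, sizes=(448, 512, 576, 640)):
    tf = T.Compose([
        T.RandomHorizontalFlip(p=0.5),      # random horizontal flip (boxes not handled here)
        T.ColorJitter(0.4, 0.4, 0.4, 0.1),  # color jitter: brightness/contrast/saturation/hue
        T.Resize(random.choice(sizes)),     # multi-scale augmentation: pick a size per image
    ])
    return tf(img)
```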
Step 3: feed the augmented images into the Focus module, which slices each image by pixel parity into four sub-images and concatenates them along the channel dimension. The Focus module downsamples without adding computation while retaining more complete image information.
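The parity slicing can be written as a strided indexing operation. The sketch below follows the YOLOv5/YOLOX Focus convention; the ordering of the four slices along the channel dimension is a convention choice.

```python
import torch

def focus_slice(x: torch.Tensor) -> torch.Tensor:
    # (B, C, H, W) with even H, W  ->  (B, 4C, H/2, W/2): four parity sub-images
    # concatenated along the channel dimension, i.e. downsampling without losing pixels.
    return torch.cat([x[..., 0::2, 0::2],   # even rows, even cols
                      x[..., 1::2, 0::2],   # odd rows,  even cols
                      x[..., 0::2, 1::2],   # even rows, odd cols
                      x[..., 1::2, 1::2]],  # odd rows,  odd cols
                     dim=1)
```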
Step 4: feed the concatenated result into the backbone network of the YOLOX detection framework. Three branches are connected to the backbone; they correspond to different receptive fields, and together the three receptive fields cover targets of different sizes. Preferably, the receptive fields of the three branches correspond to downsampling factors of 8, 16 and 32, respectively.
Step 5: each branch comprises two parts, a multi-spatial relationship perception module and a detection head. The multi-spatial relationship perception module effectively fuses global and local information by combining the relationships between features in different spatial dimensions, yielding a multi-spatial relationship-aware feature map.
FIG. 1 is a working flowchart of a multi-spatial relationship perception module. With reference to this figure, the workflow of the module is as follows.
The feature map X input to the multi-spatial relationship perception module has dimensions H×W×C, where H is the height, W the width, and C the number of channels.
(1) Construct the relational feature map of the H×W space.
In the H×W space, the feature map X is decomposed into H×W feature vectors of length C. The relationship information mapping feature vector x_i to feature vector x_j is denoted r_{i,j} and computed as r_{i,j} = f_{H×W}(x_i, x_j), where f_{H×W} is formed from two embedding functions ψ_{H×W} and φ_{H×W}, each consisting of a 1×1 convolutional layer, a BatchNormalization layer and a ReLU activation layer. Correspondingly, the relationship information mapping x_j to x_i is r_{j,i} = f_{H×W}(x_j, x_i), so the pair (r_{i,j}, r_{j,i}) describes the bidirectional relationship between x_i and x_j. For a single direction, computing the relationship information between all pairs of feature vectors and stacking the results yields an affinity matrix with H×W channels; the bidirectional relationship therefore yields two different affinity matrices M_1 and M_2, which deeply mine the local information of the features.
To exploit the global information of the features at the same time, the original global structural information must be retained. Specifically, after applying a 1×1 convolution to the original feature map X, a global average pooling operation is performed along the channel direction to obtain a global structural feature map F. The global structural feature map F is concatenated with the two affinity matrices to obtain a feature matrix Y:
Y = [pool(θ_{H×W}(X)), θ_{H×W}(M_1), θ_{H×W}(M_2)],
where pool denotes global average pooling and θ_{H×W} consists of a 1×1 convolutional layer, a BatchNormalization layer and a ReLU activation layer; compared with ψ_{H×W} and φ_{H×W}, its number of output activation nodes differs. The feature matrix Y is then passed through a 1×1 convolution to fuse all the global and local information it contains, yielding the relational feature map of the H×W space.
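As one concrete reading of this construction, the following PyTorch sketch implements the H×W-space branch. The exact form of f_{H×W} appears only as an image in the original document, so the pairwise relation is assumed here to be a dot product of the two embedded vectors; all layer widths are illustrative, and since the relation matrix has (H·W)² entries, the sketch is only practical for small feature maps.

```python
import torch
import torch.nn as nn

def conv_bn_relu(cin, cout):
    # Embedding block used throughout: 1x1 convolution + BatchNormalization + ReLU.
    return nn.Sequential(nn.Conv2d(cin, cout, 1), nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class SpatialRelationBranch(nn.Module):
    """Relational feature map of the H×W space (a sketch; spatial size fixed at build time)."""
    def __init__(self, c, h, w, c_embed=32):
        super().__init__()
        n = h * w
        self.psi = conv_bn_relu(c, c_embed)      # ψ_{H×W}
        self.phi = conv_bn_relu(c, c_embed)      # φ_{H×W}
        self.theta_x = conv_bn_relu(c, c_embed)  # θ_{H×W} on X, before channel pooling
        self.theta_m = conv_bn_relu(n, n // 4)   # θ_{H×W} on the affinity matrices
        self.fuse = nn.Conv2d(1 + 2 * (n // 4), c, 1)  # final 1x1 fusion of Y

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w
        # r_{i,j} assumed to be <ψ(x)_i, φ(x)_j> over the n = H*W positions.
        p = self.psi(x).flatten(2)               # (b, c_embed, n)
        q = self.phi(x).flatten(2)               # (b, c_embed, n)
        rel = torch.bmm(p.transpose(1, 2), q)    # (b, n, n); rel[:, i, j] = r_{i,j}
        # Affinity matrices with n channels: M1 stacks r_{i,j}, M2 stacks r_{j,i}.
        m1 = rel.reshape(b, n, h, w)
        m2 = rel.transpose(1, 2).reshape(b, n, h, w)
        # Global structural feature map F: channel-direction global average pooling.
        f = self.theta_x(x).mean(dim=1, keepdim=True)            # (b, 1, h, w)
        # Y = [pool(θ(X)), θ(M1), θ(M2)], then a 1x1 convolution fuses everything.
        y = torch.cat([f, self.theta_m(m1), self.theta_m(m2)], dim=1)
        return self.fuse(y)                                      # (b, c, h, w)
```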
(2) Construct the relational feature map of the channel space C.
Similarly, in the channel space, the feature map X is decomposed into C feature vectors of length H×W. The relationship information mapping feature vector x_a to feature vector x_b is r_{a,b} = f_C(x_a, x_b), where the embedding functions ψ_C and φ_C are identical in form to ψ_{H×W} and φ_{H×W}, differing only in their output dimensions. The affinity matrices are obtained with the same calculation as in step 5(1); that is, the bidirectional relationship yields two different affinity matrices M′_1 and M′_2.
Unlike the structural feature map F obtained above, in this step a 1×1 convolution is applied to the original feature map X and global average pooling is performed over the H×W dimensions, yielding a structural feature map F′. The structural feature map F′ is concatenated with the two affinity matrices to obtain a feature matrix Y′:
Y′ = [pool(θ_C(X)), θ_C(M′_1), θ_C(M′_2)],
where θ_C is identical in form to θ_{H×W}, differing only in its output dimensions. The feature matrix Y′ is then passed through a 1×1 convolution to fuse all the global and local information it contains, yielding the relational feature map of the channel space C.
The relational feature maps of the H×W space and the channel space C are multiplied element-wise to obtain the multi-spatial relationship-aware feature map. This feature map contains, and fully fuses, the global and local information of the features in different spatial dimensions, improving the effectiveness and discriminative power of the features.
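Continuing the sketch above (and reusing conv_bn_relu and SpatialRelationBranch), the channel-space branch can be realized by treating each of the C channels as a "position" whose feature vector has length H×W; how the channel-space map is shaped and broadcast for the element-wise product is likewise an assumption, not spelled out in the patent.

```python
class ChannelRelationBranch(nn.Module):
    """Relational feature map of the channel space C (a sketch)."""
    def __init__(self, c, n, n_embed=32):
        super().__init__()
        self.psi = conv_bn_relu(n, n_embed)      # ψ_C, applied on the transposed view
        self.phi = conv_bn_relu(n, n_embed)      # φ_C
        self.theta_x = conv_bn_relu(n, n_embed)  # θ_C on X
        self.theta_m = conv_bn_relu(c, c // 4)   # θ_C on the affinity matrices
        self.fuse = nn.Conv2d(1 + 2 * (c // 4), 1, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        xt = x.flatten(2).transpose(1, 2).unsqueeze(-1)  # (b, n, c, 1): channels as positions
        p = self.psi(xt).flatten(2)                      # (b, n_embed, c)
        q = self.phi(xt).flatten(2)
        rel = torch.bmm(p.transpose(1, 2), q)            # (b, c, c); rel[:, a, b] = r_{a,b}
        m1, m2 = rel.unsqueeze(-1), rel.transpose(1, 2).unsqueeze(-1)
        f = self.theta_x(xt).mean(dim=1, keepdim=True)   # pooling over the H×W content
        y = torch.cat([f, self.theta_m(m1), self.theta_m(m2)], dim=1)
        return self.fuse(y).reshape(b, c, 1, 1)          # per-channel map, broadcastable

class MultiSpatialRelationPerception(nn.Module):
    def __init__(self, c, h, w):
        super().__init__()
        self.spatial = SpatialRelationBranch(c, h, w)
        self.channel = ChannelRelationBranch(c, h * w)

    def forward(self, x):
        # Element-wise product of the two relational feature maps; the channel-space
        # map broadcasts over the H and W dimensions.
        return self.spatial(x) * self.channel(x)
```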
Step 6: feed the multi-spatial relationship-aware feature map into the detection head. Unlike the traditional YOLO-series detection heads, which couple classification and coordinate localization during training, YOLOX decouples them: a 1×1 convolution first reduces the channel dimension, followed by two lightweight branches that perform classification and regression respectively, which effectively speeds up model convergence.
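Continuing the same sketches (reusing conv_bn_relu), a minimal decoupled head in the style described here might look as follows; the channel widths and the box-plus-objectness output split are assumptions in the YOLOX style, not values taken from the patent.

```python
class DecoupledHead(nn.Module):
    def __init__(self, cin, num_classes=1, width=128):
        super().__init__()
        self.stem = conv_bn_relu(cin, width)  # 1x1 conv reduces the channel dimension
        self.cls_branch = nn.Sequential(conv_bn_relu(width, width),
                                        nn.Conv2d(width, num_classes, 1))
        self.reg_branch = nn.Sequential(conv_bn_relu(width, width),
                                        nn.Conv2d(width, 4 + 1, 1))  # box (4) + objectness (1)

    def forward(self, x):
        s = self.stem(x)
        return self.cls_branch(s), self.reg_branch(s)  # classification, regression
```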
Preferably, in the training phase, the classification loss is cross-entropy, the regression loss is the GIoU loss, and an L1-norm penalty is applied to the predicted location information.
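A sketch of the training losses named in this paragraph, assuming (x1, y1, x2, y2) box coordinates and equal weighting of the three terms (the weighting is not specified here):

```python
import torch
import torch.nn.functional as F

def giou_loss(pred, target, eps=1e-7):
    # pred, target: (N, 4) boxes as (x1, y1, x2, y2).
    x1, y1 = torch.max(pred[:, 0], target[:, 0]), torch.max(pred[:, 1], target[:, 1])
    x2, y2 = torch.min(pred[:, 2], target[:, 2]), torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / (union + eps)
    # Smallest enclosing box provides the GIoU correction term.
    ex1, ey1 = torch.min(pred[:, 0], target[:, 0]), torch.min(pred[:, 1], target[:, 1])
    ex2, ey2 = torch.max(pred[:, 2], target[:, 2]), torch.max(pred[:, 3], target[:, 3])
    enclose = (ex2 - ex1) * (ey2 - ey1)
    giou = iou - (enclose - union) / (enclose + eps)
    return (1 - giou).mean()

def detection_loss(cls_logits, cls_targets, boxes, box_targets):
    # Cross-entropy for classification, GIoU for regression, L1 penalty on coordinates.
    return (F.cross_entropy(cls_logits, cls_targets)
            + giou_loss(boxes, box_targets)
            + F.l1_loss(boxes, box_targets))
```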
Second aspect: an embodiment of the present disclosure provides a pedestrian detection device based on multi-spatial relationship perception. Based on the same technical concept, the device can implement or execute the pedestrian detection method based on multi-spatial relationship perception described in any one of the possible implementations.
Preferably, the device comprises a data acquisition unit, a first data processing unit, a second data processing unit and a result acquisition unit.
The data acquisition unit is configured to execute step 1 of the pedestrian detection method based on multi-spatial relationship perception described in any one of the possible implementations.
The first data processing unit is configured to execute steps 2 and 3 of the method described in any one of the possible implementations.
The second data processing unit is configured to execute steps 4 and 5 of the method described in any one of the possible implementations.
The result acquisition unit is configured to execute step 6 of the method described in any one of the possible implementations.
It should be noted that when the pedestrian detection device based on multi-spatial relationship perception provided by the above embodiment executes the pedestrian detection method based on multi-spatial relationship perception, the division into the functional modules above is only an example. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the device embodiment and the method embodiment provided above belong to the same concept; for the specific implementation process, refer to the method embodiment, which is not repeated here.
Third aspect: an embodiment of the present disclosure provides a terminal device comprising the pedestrian detection device based on multi-spatial relationship perception described in any one of the possible implementations.
The present invention has been exemplarily described above with reference to the accompanying drawings. Obviously, the specific implementation of the present invention is not limited to the manners described above. Various insubstantial improvements made using the method concept and technical solution of the present invention, or direct applications of the above concept and technical solution to other occasions without improvement or with equivalent replacement, all fall within the protection scope of the present invention.
Claims (7)
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202111510823.5A | 2021-12-11 | 2021-12-11 | Pedestrian detection method and device based on multi-spatial relationship sensing and terminal equipment
Publications (2)

Publication Number | Publication Date
---|---
CN114332919A | 2022-04-12
CN114332919B | 2024-10-29
Patent Citations (5)

Publication number | Priority date | Publication date | Title
---|---|---|---
CN110796239A | 2019-10-30 | 2020-02-14 | A deep learning target detection method based on channel and space fusion perception
CN111369543A | 2020-03-07 | 2020-07-03 | A fast pollen particle detection algorithm based on dual self-attention modules
CN112733693A | 2021-01-04 | 2021-04-30 | Multi-scale residual road extraction method for globally aware high-resolution remote sensing images
CN113505640A | 2021-05-31 | 2021-10-15 | Small-scale pedestrian detection method based on multi-scale feature fusion
CN113567984A | 2021-07-30 | 2021-10-29 | A method and system for detecting small artificial targets in SAR images
Non-Patent Citations (1)

Title
---
Nie Wei; Cao Yue; Zhu Dongxue; Zhu Yixuan; Huang Linyi: "Action recognition algorithm based on an edge-perception learning network under complex surveillance backgrounds", Computer Applications and Software, no. 08, 12 August 2020 (2020-08-12)
Cited By (5)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN114663861A | 2022-05-17 | 2022-06-24 | 山东交通学院 | Vehicle re-identification method based on dimension decoupling and non-local relations
CN115082855A | 2022-06-20 | 2022-09-20 | 安徽工程大学 | Pedestrian occlusion detection method based on an improved YOLOX algorithm
CN115082855B | 2022-06-20 | 2024-07-12 | 安徽工程大学 | Pedestrian occlusion detection method based on an improved YOLOX algorithm
CN115311690A | 2022-10-08 | 2022-11-08 | 广州英码信息科技有限公司 | End-to-end detection method for pedestrian structural information and its dependency relationships
CN115311690B | 2022-10-08 | 2022-12-23 | 广州英码信息科技有限公司 | End-to-end detection method for pedestrian structural information and its dependency relationships
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112308092B (en) | Light-weight license plate detection and identification method based on multi-scale attention mechanism | |
CN114332919A (en) | A pedestrian detection method, device and terminal device based on multi-spatial relationship perception | |
CN113095152B (en) | Regression-based lane line detection method and system | |
CN116665176B (en) | A multi-task network road target detection method for autonomous vehicle driving | |
CN107358576A (en) | Depth map super resolution ratio reconstruction method based on convolutional neural networks | |
CN110298387A (en) | Incorporate the deep neural network object detection method of Pixel-level attention mechanism | |
CN110348383B (en) | Road center line and double line extraction method based on convolutional neural network regression | |
CN107808376B (en) | A Deep Learning-Based Hand Raised Detection Method | |
CN115205264A (en) | A high-resolution remote sensing ship detection method based on improved YOLOv4 | |
CN114529982B (en) | Lightweight human body posture estimation method and system based on streaming attention | |
CN111599007B (en) | Smart city CIM road mapping method based on unmanned aerial vehicle aerial photography | |
CN112862690A (en) | Transformers-based low-resolution image super-resolution method and system | |
CN105488777A (en) | System and method for generating panoramic picture in real time based on moving foreground | |
CN106372630A (en) | Face direction detection method based on deep learning | |
CN103077538B (en) | Adaptive tracking method of biomimetic-pattern recognized targets | |
CN116935486A (en) | Sign language identification method and system based on skeleton node and image mode fusion | |
CN116612427A (en) | Intensive pedestrian detection system based on improved lightweight YOLOv7 | |
CN116363526A (en) | MROCNet model construction and multi-source remote sensing image change detection method and system | |
CN107944437A (en) | A kind of Face detection method based on neutral net and integral image | |
CN115909488A (en) | An occluded person re-identification method based on pose guidance and dynamic feature extraction | |
CN109272450B (en) | Image super-resolution method based on convolutional neural network | |
CN114399728A (en) | A crowd counting method in foggy scene | |
CN117612029B (en) | A remote sensing image target detection method based on progressive feature smoothing and scale-adaptive dilated convolution | |
Li et al. | Monocular 3-D Object Detection Based on Depth-Guided Local Convolution for Smart Payment in D2D Systems | |
CN118430011A (en) | Robust 2D human pose estimation method |
Legal Events

- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant