CN114332919A - A pedestrian detection method, device and terminal device based on multi-spatial relationship perception - Google Patents

A pedestrian detection method, device and terminal device based on multi-spatial relationship perception

Info

Publication number
CN114332919A
CN114332919A (application CN202111510823.5A; granted as CN114332919B)
Authority
CN
China
Prior art keywords
spatial relationship
feature
feature map
relationship
pedestrian detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111510823.5A
Other languages
Chinese (zh)
Other versions
CN114332919B (en)
Inventor
姜峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Xingzheyi Intelligent Transportation Technology Co ltd
Original Assignee
Nanjing Xingzheyi Intelligent Transportation Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Xingzheyi Intelligent Transportation Technology Co., Ltd.
Priority to CN202111510823.5A
Publication of CN114332919A
Application granted
Publication of CN114332919B
Active legal status (current)
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian detection method, device and terminal device based on multi-spatial relationship perception. The method comprises: step 1, collecting a pedestrian image data set and resizing the images to a fixed size for model training; step 2, adopting the YOLOX detection framework, feeding the images into the framework model, and first applying data augmentation to the images; step 3, feeding the augmented images into the Focus module, which slices each image by pixel parity into four images and concatenates them along the channel dimension; step 4, feeding the concatenated result into the backbone network of the YOLOX framework, to which three branches are connected; step 5, in each branch, which consists of a multi-spatial relationship perception module and a detection head, combining the relationships between features in different spatial dimensions so that the multi-spatial relationship perception module fuses global and local information effectively, yielding a multi-spatial relationship-aware feature map. The method attends to global information while also extracting local information and fuses the two effectively, thereby obtaining more discriminative feature information and improving pedestrian detection performance.

Description

A pedestrian detection method, device and terminal device based on multi-spatial relationship perception

Technical Field

The invention relates to the field of image recognition, in particular to pedestrian detection, and specifically to a pedestrian detection method, device and terminal device based on multi-spatial relationship perception.

Background Art

With the continuous development of smart city construction, many new artificial intelligence technologies are being applied to intelligent transportation, intelligent government affairs, intelligent factories, and so on. Every such application is inseparable from people and ultimately serves people, so pedestrian detection is a prerequisite for many of these applications. However, real scenes are often complex: dense crowds cause bodies to overlap and occlude one another, pedestrians may be blocked by objects, illumination can change drastically, and severe weather (rain, snow, etc.) can blur the picture. These real-world conditions make pedestrian detection considerably harder. Hence there is an urgent need for a pedestrian detection technique that can mine deeper, more discriminative features from the pedestrian regions of an image, sufficient to characterize pedestrians in a variety of environments.

In the course of realizing the present invention, the inventor found at least the following problems in the prior art. Most popular pedestrian detection techniques are based on convolutional neural networks (CNNs), and most CNN pedestrian detection models use a limited receptive field, making it difficult to learn rich structural patterns from global information. Examples include using a CNN to detect and segment pedestrians to obtain final location information, or combining a CNN with feature fusion for pedestrian detection. Although some methods consider different receptive fields, they do not combine global and local information well. In addition, some methods enhance the learning capacity of the model by stacking network depth, which is very resource-intensive in both training and deployment.

Summary of the Invention

To overcome the deficiencies of the prior art, the present invention provides a pedestrian detection method, device and terminal device based on multi-spatial relationship perception. The method attends to global information while also extracting local information and fuses the two effectively, thereby obtaining more discriminative feature information and improving pedestrian detection performance. The technical solution is as follows.

The present invention provides a pedestrian detection method based on multi-spatial relationship perception, comprising the following steps.

Step 1: collect a pedestrian image data set and resize the images to a fixed size for model training.

Step 2: adopt the YOLOX detection framework, feed the images into the framework model, and first apply data augmentation to the images.

Step 3: feed the augmented images into the Focus module, which slices each image by pixel parity into four images and concatenates them along the channel dimension.

Step 4: feed the concatenated result into the backbone network of the YOLOX detection framework. Three branches are connected to the backbone; each corresponds to a different receptive field, and together the three receptive fields cover targets of different sizes.

Step 5: each branch consists of two parts, a multi-spatial relationship perception module and a detection head. The multi-spatial relationship perception module combines the relationships between features in different spatial dimensions to fuse global and local information effectively, yielding a multi-spatial relationship-aware feature map.

The workflow of the multi-spatial relationship perception module is as follows.

The feature map X input to the multi-spatial relationship perception module has dimensions H×W×C, where H is the height, W the width, and C the number of channels.

(1) Constructing the relational feature map of the H×W space.

Within the H×W spatial extent, the feature map X is decomposed into H×W feature vectors of length C. The relation information mapping feature vector x_i to feature vector x_j is denoted r_{i,j} and is computed as

r_{i,j} = f_{H×W}(x_i, x_j),

where f_{H×W} is built from two embedding functions ψ_{H×W} and φ_{H×W}, each consisting of a 1×1 convolutional layer, a BatchNormalization layer and a ReLU activation layer. Correspondingly, the relation information mapping x_j to x_i is r_{j,i} = f_{H×W}(x_j, x_i), so the pair (r_{i,j}, r_{j,i}) describes the bidirectional relationship between x_i and x_j. For a single direction, computing the relation information between all feature vectors and stacking the results yields an affinity matrix M with H×W channels, i.e. M ∈ ℝ^{H×W×(H×W)}; the bidirectional relationship therefore yields two different affinity matrices M1 and M2, which deeply mine the local information of the features.

The original global structure information is retained: after a 1×1 convolution is applied to the original feature map X, a global average pooling operation along the channel direction yields a global structure feature map F ∈ ℝ^{H×W×1}. Concatenating F with the two affinity matrices gives a feature matrix Y:

Y = [pool(θ_{H×W}(X)), θ_{H×W}(M1), θ_{H×W}(M2)],

where pool denotes global average pooling and the θ_{H×W} functions each consist of a 1×1 convolutional layer, a BatchNormalization layer and a ReLU activation layer; compared with ψ_{H×W} and φ_{H×W}, their numbers of output activation nodes differ. The feature matrix Y is then passed through a 1×1 convolution to fuse all the global and local information it contains, yielding the relational feature map of the H×W space.

(2) Constructing the relational feature map of the channel space C.

Similarly, within the channel extent, the feature map X is decomposed into C feature vectors of length H×W. The relation information r_{a,b} mapping feature vector x_a to feature vector x_b is

r_{a,b} = f_C(x_a, x_b),

where the embedding functions ψ_C and φ_C are identical in form to ψ_{H×W} and φ_{H×W}, differing only in output dimension. The affinity matrices are obtained by the same computation as in step 5(1); that is, the bidirectional relationship yields two different affinity matrices M′1 and M′2.

After a 1×1 convolution of the original feature map X, global average pooling over the H×W dimensions yields a structure feature map F′ ∈ ℝ^{1×1×C}. Concatenating F′ with the two affinity matrices gives the feature matrix Y′, computed as

Y′ = [pool(θ_C(X)), θ_C(M′1), θ_C(M′2)].

θC

Figure BDA0003405565670000036
函数功能与θH×W
Figure BDA0003405565670000037
一致,只是输出维度不同;将特征矩阵Y′通过1×1的卷积,来融合特征矩阵中包含的所有全局和局部信息,从而得到属于通道空间C的关系特征图。θ C and
Figure BDA0003405565670000036
function function with θ H×W and
Figure BDA0003405565670000037
Consistent, but the output dimensions are different; the feature matrix Y′ is convolutional by 1×1 to fuse all the global and local information contained in the feature matrix, so as to obtain the relational feature map belonging to the channel space C.

The relational feature maps of the H×W space and the channel space C are combined by element-wise multiplication to obtain the multi-spatial relationship-aware feature map.

Step 6: feed the multi-spatial relationship-aware feature map into the detection head. YOLOX decouples classification from coordinate localization: a 1×1 convolution first reduces the channel dimension, followed by two lightweight branches that perform classification and regression respectively.

Preferably, the data augmentation in step 2 includes random horizontal flipping, color jittering, multi-scale augmentation and mosaic augmentation.

Preferably, the receptive fields of the three branches in step 4 correspond to downsampling by factors of 8, 16 and 32 respectively.

Preferably, in the training phase, the classification loss function is cross-entropy, the regression loss function is the GIoU loss, and an L1-norm penalty is applied to the predicted position information.

Compared with the prior art, one of the above technical solutions has the following beneficial effect: through the multi-spatial relationship perception module, the relationships between features in different spatial dimensions are deeply mined. The method attends to global information while also extracting local information, fuses the two effectively, and links the feature information of different spaces with the relation information between features, so that the features learned by the model are more recognizable and discriminative, thereby improving pedestrian detection accuracy.

Brief Description of the Drawings

FIG. 1 is a flowchart of a multi-spatial relationship perception module provided by an embodiment of the present disclosure.

Detailed Description of the Embodiments

To clarify the technical solution and working principle of the present invention, the embodiments of the present disclosure are described in further detail below with reference to the accompanying drawings. All of the optional technical solutions above can be combined arbitrarily to form optional embodiments of the present disclosure, which are not repeated here one by one.

The terms "step 1", "step 2", "step 3" and similar designations in the specification, the claims and the drawings of the present application are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that data so designated are interchangeable where appropriate, so that the embodiments of the application described herein can be practiced in sequences other than those described here.

First aspect: an embodiment of the present disclosure provides a pedestrian detection method based on multi-spatial relationship perception, comprising the following steps.

Step 1: collect a pedestrian image data set and resize the images to a fixed size for model training.

Step 2: adopt the YOLOX detection framework. This framework has a simple structure and requires no manually set anchor boxes, which makes it easy to train and deploy. The images are fed into the framework model and data augmentation is applied first. Preferably, the augmentation in step 2 includes random horizontal flipping, color jittering, multi-scale augmentation and mosaic augmentation, which enlarge the training set and improve the generalization ability of the model.
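The following PyTorch-style sketch illustrates this preprocessing stage with random horizontal flipping, color jittering and a simplified four-image mosaic. It is a minimal illustration: the function names, flip probability, jitter strength and 640-pixel canvas are assumptions for the example, not values given in the patent, and a full pipeline would also remap the box coordinates during the mosaic step.

```python
import torch
import torch.nn.functional as F

def random_hflip(img: torch.Tensor, p: float = 0.5) -> torch.Tensor:
    # img: (C, H, W); flip along the width axis with probability p
    return torch.flip(img, dims=[-1]) if torch.rand(1).item() < p else img

def color_jitter(img: torch.Tensor, strength: float = 0.1) -> torch.Tensor:
    # Scale each channel by a random factor around 1.0 (simplified jitter)
    factors = 1.0 + strength * (2 * torch.rand(img.shape[0], 1, 1) - 1)
    return (img * factors).clamp(0.0, 1.0)

def mosaic(imgs: list, out_size: int = 640) -> torch.Tensor:
    # Paste four images into the four quadrants of one canvas
    assert len(imgs) == 4
    half = out_size // 2
    canvas = torch.zeros(3, out_size, out_size)
    quads = [(0, 0), (0, half), (half, 0), (half, half)]
    for img, (top, left) in zip(imgs, quads):
        patch = F.interpolate(img.unsqueeze(0), size=(half, half),
                              mode="bilinear", align_corners=False).squeeze(0)
        canvas[:, top:top + half, left:left + half] = patch
    return canvas
```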

Step 3: feed the augmented images into the Focus module, which slices each image by pixel parity into four images and concatenates them along the channel dimension. The Focus module thus downsamples without adding computation while retaining more complete image information.
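A minimal sketch of the parity slicing, assuming the quadrant ordering used in common YOLO-family implementations (the patent specifies only the parity slicing and the channel concatenation, not the ordering):

```python
import torch

def focus_slice(x: torch.Tensor) -> torch.Tensor:
    """Slice an image batch (B, C, H, W) by pixel parity into four images
    and concatenate along the channel dimension -> (B, 4C, H/2, W/2)."""
    return torch.cat([
        x[..., 0::2, 0::2],  # even rows, even columns
        x[..., 1::2, 0::2],  # odd rows, even columns
        x[..., 0::2, 1::2],  # even rows, odd columns
        x[..., 1::2, 1::2],  # odd rows, odd columns
    ], dim=1)

x = torch.randn(1, 3, 640, 640)
print(focus_slice(x).shape)  # torch.Size([1, 12, 320, 320])
```

For a 640×640 RGB input this produces a 12-channel 320×320 tensor, a 2× downsampling that discards no pixels.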

Step 4: feed the concatenated result into the backbone network of the YOLOX detection framework. Three branches are connected to the backbone; each corresponds to a different receptive field, and together the three receptive fields cover targets of different sizes. Preferably, the receptive fields of the three branches correspond to downsampling by factors of 8, 16 and 32 respectively.

Step 5: each branch consists of two parts, a multi-spatial relationship perception module and a detection head. The multi-spatial relationship perception module combines the relationships between features in different spatial dimensions to fuse global and local information effectively, yielding a multi-spatial relationship-aware feature map.

FIG. 1 is a working flowchart of the multi-spatial relationship perception module. With reference to the figure, the workflow of the module is as follows.

The feature map X input to the multi-spatial relationship perception module has dimensions H×W×C, where H is the height, W the width, and C the number of channels.

(1) Constructing the relational feature map of the H×W space.

Within the H×W spatial extent, the feature map X is decomposed into H×W feature vectors of length C. The relation information mapping feature vector x_i to feature vector x_j is denoted r_{i,j} and is computed as

r_{i,j} = f_{H×W}(x_i, x_j),

where f_{H×W} is built from two embedding functions ψ_{H×W} and φ_{H×W}, each consisting of a 1×1 convolutional layer, a BatchNormalization layer and a ReLU activation layer. Correspondingly, the relation information mapping x_j to x_i is r_{j,i} = f_{H×W}(x_j, x_i), so the pair (r_{i,j}, r_{j,i}) describes the bidirectional relationship between x_i and x_j. For a single direction, computing the relation information between all feature vectors and stacking the results yields an affinity matrix M with H×W channels, i.e. M ∈ ℝ^{H×W×(H×W)}; the bidirectional relationship therefore yields two different affinity matrices M1 and M2, which deeply mine the local information of the features.
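The exact expression for f_{H×W} is given in the patent only as a formula image; the sketch below therefore assumes a dot-product relation between the two embeddings, one common instantiation, and the embedding width c_emb is likewise an assumption. It stacks the relation vectors back into maps with H×W channels as the text describes:

```python
import torch
import torch.nn as nn

def embed(c_in: int, c_out: int) -> nn.Sequential:
    # 1x1 convolution + BatchNormalization + ReLU, as the patent specifies
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class SpatialAffinity(nn.Module):
    """Pairwise relations over the H×W positions of a (B, C, H, W) map."""
    def __init__(self, c: int, c_emb: int = 64):
        super().__init__()
        self.psi = embed(c, c_emb)   # first embedding function (symbol assumed)
        self.phi = embed(c, c_emb)   # second embedding function

    def forward(self, x: torch.Tensor):
        b, c, h, w = x.shape
        p = self.psi(x).flatten(2).transpose(1, 2)   # (B, HW, C')
        q = self.phi(x).flatten(2)                   # (B, C', HW)
        r = torch.bmm(p, q)                          # (B, HW, HW): r[b, i, j]
        # Stack the relations back into H×W maps with HW channels:
        # m1[b, :, j] holds all incoming r_{i,j}; m2[b, :, i] all outgoing r_{i,j}.
        m1 = r.reshape(b, h * w, h, w)
        m2 = r.transpose(1, 2).reshape(b, h * w, h, w)
        return m1, m2
```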

To exploit the global information of the features at the same time, the original global structure information needs to be retained. Specifically, after a 1×1 convolution of the original feature map X, a global average pooling operation along the channel direction yields a global structure feature map F ∈ ℝ^{H×W×1}. Concatenating F with the two affinity matrices gives a feature matrix Y:

Y = [pool(θ_{H×W}(X)), θ_{H×W}(M1), θ_{H×W}(M2)],

where pool denotes global average pooling and the θ_{H×W} functions each consist of a 1×1 convolutional layer, a BatchNormalization layer and a ReLU activation layer; compared with ψ_{H×W} and φ_{H×W}, their numbers of output activation nodes differ. The feature matrix Y is then passed through a 1×1 convolution to fuse all the global and local information it contains, yielding the relational feature map of the H×W space.
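A sketch of the pooling, concatenation and 1×1 fusion just described; whether the θ functions applied to M1 and M2 share weights, and their output widths, are not specified in the text, so the separate instances and the width c_out below are assumptions:

```python
import torch
import torch.nn as nn

def embed(c_in: int, c_out: int) -> nn.Sequential:
    # 1x1 convolution + BatchNormalization + ReLU, as the patent specifies
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class SpatialRelationFusion(nn.Module):
    """Y = [pool(theta(X)), theta(M1), theta(M2)], then a 1x1 fusion conv."""
    def __init__(self, c: int, hw: int, c_out: int):
        super().__init__()
        self.theta_x = embed(c, c_out)     # applied to the original feature map X
        self.theta_m1 = embed(hw, c_out)   # applied to the HW-channel affinity maps
        self.theta_m2 = embed(hw, c_out)   # separate instance (sharing unspecified)
        self.fuse = nn.Conv2d(2 * c_out + 1, c_out, kernel_size=1)

    def forward(self, x, m1, m2):
        # Global average pooling along the channel direction -> (B, 1, H, W)
        f = self.theta_x(x).mean(dim=1, keepdim=True)
        y = torch.cat([f, self.theta_m1(m1), self.theta_m2(m2)], dim=1)
        return self.fuse(y)   # relational feature map of the H×W space
```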

(2) Constructing the relational feature map of the channel space C.

Similarly, within the channel extent, the feature map X is decomposed into C feature vectors of length H×W. The relation information r_{a,b} mapping feature vector x_a to feature vector x_b is

r_{a,b} = f_C(x_a, x_b),

where the embedding functions ψ_C and φ_C are identical in form to ψ_{H×W} and φ_{H×W}, differing only in output dimension. The affinity matrices are obtained by the same computation as in step 5(1); that is, the bidirectional relationship yields two different affinity matrices M′1 and M′2.

Unlike the global structure feature map F obtained in sub-step (1), in this step the original feature map X is passed through a 1×1 convolution and then globally average-pooled over the H×W dimensions, yielding a structure feature map F′ ∈ ℝ^{1×1×C}. Concatenating F′ with the two affinity matrices gives the feature matrix Y′, computed as

Y′ = [pool(θ_C(X)), θ_C(M′1), θ_C(M′2)].

θC

Figure BDA0003405565670000066
函数功能与θH×W
Figure BDA0003405565670000067
一致,只是输出维度不同;将特征矩阵Y′通过1×1的卷积,来融合特征矩阵中包含的所有全局和局部信息,从而得到属于通道空间C的关系特征图。θ C and
Figure BDA0003405565670000066
function function with θ H×W and
Figure BDA0003405565670000067
Consistent, but the output dimensions are different; the feature matrix Y′ is convolutional by 1×1 to fuse all the global and local information contained in the feature matrix, so as to obtain the relational feature map belonging to the channel space C.

The relational feature maps of the H×W space and the channel space C are combined by element-wise multiplication to obtain the multi-spatial relationship-aware feature map. This feature map contains the global and local information of features in different spatial dimensions and fully fuses them, improving the effectiveness and discriminative power of the features.
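The channel branch and the final combination can be sketched symmetrically. Reading the final "dot multiplication" as element-wise multiplication, and assuming both branch outputs have already been projected back to a common (B, C, H, W) shape (the patent does not spell out the broadcast shapes), gives:

```python
import torch
import torch.nn as nn

class ChannelAffinity(nn.Module):
    """Channel-space analogue: pairwise relations between the channel
    vectors (each of length H*W), giving C'×C' affinity matrices."""
    def __init__(self, c: int, c_emb: int = 64):
        super().__init__()
        def embed(ci, co):
            return nn.Sequential(nn.Conv2d(ci, co, kernel_size=1),
                                 nn.BatchNorm2d(co), nn.ReLU(inplace=True))
        self.psi = embed(c, c_emb)
        self.phi = embed(c, c_emb)

    def forward(self, x: torch.Tensor):
        p = self.psi(x).flatten(2)             # (B, C', HW)
        q = self.phi(x).flatten(2)             # (B, C', HW)
        r = torch.bmm(p, q.transpose(1, 2))    # (B, C', C'): dot-product relation assumed
        return r, r.transpose(1, 2)            # M'1 and M'2

def combine(spatial_map: torch.Tensor, channel_map: torch.Tensor) -> torch.Tensor:
    # Element-wise product of the two relational feature maps, both assumed
    # already projected back to a common (B, C, H, W) shape.
    return spatial_map * channel_map
```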

Step 6: feed the multi-spatial relationship-aware feature map into the detection head. Unlike traditional YOLO-series detection heads, which couple classification and coordinate localization during training, YOLOX decouples the two: a 1×1 convolution first reduces the channel dimension, followed by two lightweight branches that perform classification and regression respectively, which effectively speeds up model convergence.
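A compact sketch of such a decoupled head. The hidden width, the 3×3 branch convolutions and the box-plus-objectness output layout follow common YOLOX conventions and are assumptions here, with a single class for pedestrian detection:

```python
import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    def __init__(self, c_in: int, num_classes: int = 1, hidden: int = 256):
        super().__init__()
        # 1x1 convolution to reduce the channel dimension first
        self.stem = nn.Conv2d(c_in, hidden, kernel_size=1)
        # Two lightweight branches: classification and regression
        self.cls_branch = nn.Sequential(
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, num_classes, 1),   # per-location class scores
        )
        self.reg_branch = nn.Sequential(
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 4 + 1, 1),         # box (4) + objectness (1)
        )

    def forward(self, x: torch.Tensor):
        x = self.stem(x)
        return self.cls_branch(x), self.reg_branch(x)
```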

Preferably, in the training phase, the classification loss function is cross-entropy, the regression loss function is the GIoU loss, and an L1-norm penalty is applied to the predicted position information.
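The named losses can be sketched as follows; the GIoU term is the standard formulation, and combining the three terms with unit weights is an assumption, since the patent does not give the weighting:

```python
import torch
import torch.nn.functional as F

def giou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Standard GIoU loss for boxes in (x1, y1, x2, y2) format, shape (N, 4)."""
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / union.clamp(min=1e-7)
    # Smallest enclosing box
    cx1 = torch.min(pred[:, 0], target[:, 0])
    cy1 = torch.min(pred[:, 1], target[:, 1])
    cx2 = torch.max(pred[:, 2], target[:, 2])
    cy2 = torch.max(pred[:, 3], target[:, 3])
    c_area = ((cx2 - cx1) * (cy2 - cy1)).clamp(min=1e-7)
    giou = iou - (c_area - union) / c_area
    return (1.0 - giou).mean()

def total_loss(cls_logits, cls_targets, boxes, box_targets):
    # Cross-entropy for classification, GIoU for regression,
    # plus an L1 penalty on the predicted positions (unit weights assumed).
    return (F.cross_entropy(cls_logits, cls_targets)
            + giou_loss(boxes, box_targets)
            + F.l1_loss(boxes, box_targets))
```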

Second aspect: based on the same technical concept, an embodiment of the present disclosure provides a pedestrian detection device based on multi-spatial relationship perception, which can implement or execute the pedestrian detection method based on multi-spatial relationship perception of any of the possible implementations described above.

Preferably, the device includes a data acquisition unit, a first data processing unit, a second data processing unit and a result acquisition unit.

The data acquisition unit is configured to execute step 1 of the pedestrian detection method based on multi-spatial relationship perception of any of the possible implementations described above.

The first data processing unit is configured to execute steps 2 and 3 of the pedestrian detection method based on multi-spatial relationship perception of any of the possible implementations described above.

The second data processing unit is configured to execute steps 4 and 5 of the pedestrian detection method based on multi-spatial relationship perception of any of the possible implementations described above.

The result acquisition unit is configured to execute step 6 of the pedestrian detection method based on multi-spatial relationship perception of any of the possible implementations described above.

It should be noted that, when the pedestrian detection device based on multi-spatial relationship perception provided by the above embodiment executes the pedestrian detection method, the division into the above functional modules is only an example. In practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the device embodiment and the method embodiment provided above belong to the same concept; for the specific implementation process, see the method embodiment, which is not repeated here.

Third aspect: an embodiment of the present disclosure provides a terminal device that includes the pedestrian detection device based on multi-spatial relationship perception of any of the possible implementations described above.

The present invention has been described above by way of example with reference to the accompanying drawings. Obviously, the specific implementation of the present invention is not limited to the above manners: various insubstantial improvements made by adopting the method concept and technical solution of the present invention, or direct applications of the above concept and technical solution to other occasions without improvement or with equivalent substitution, all fall within the protection scope of the present invention.

Claims (7)

1. A pedestrian detection method based on multi-spatial relationship perception, characterized in that the method comprises the following steps:

Step 1: collecting a pedestrian image data set and resizing the images to a fixed size for model training;

Step 2: adopting the YOLOX detection framework, feeding the images into the framework model, and first applying data augmentation to the images;

Step 3: feeding the augmented images into the Focus module, slicing each image by pixel parity into four images, and concatenating them along the channel dimension;

Step 4: feeding the concatenated result into the backbone network of the YOLOX detection framework, three branches being connected to the backbone, the three branches corresponding to different receptive fields, the three receptive fields covering targets of different sizes;

Step 5: each branch comprising two parts, a multi-spatial relationship perception module and a detection head, the multi-spatial relationship perception module combining the relationships between features in different spatial dimensions to fuse global and local information effectively, yielding a multi-spatial relationship-aware feature map;

the workflow of the multi-spatial relationship perception module being as follows:

the feature map X input to the multi-spatial relationship perception module has dimensions H×W×C, where H is the height, W the width, and C the number of channels;

(1) constructing the relational feature map of the H×W space:

within the H×W spatial extent, the feature map X is decomposed into H×W feature vectors of length C; the relation information mapping feature vector x_i to feature vector x_j is denoted r_{i,j} and computed as

r_{i,j} = f_{H×W}(x_i, x_j),

where f_{H×W} is built from two embedding functions ψ_{H×W} and φ_{H×W}, each consisting of a 1×1 convolutional layer, a BatchNormalization layer and a ReLU activation layer; correspondingly, the relation information mapping x_j to x_i is r_{j,i} = f_{H×W}(x_j, x_i), so the pair (r_{i,j}, r_{j,i}) describes the bidirectional relationship between x_i and x_j; for a single direction, computing the relation information between all feature vectors and stacking the results yields an affinity matrix M with H×W channels; the bidirectional relationship therefore yields two different affinity matrices M1 and M2, which deeply mine the local information of the features;

the original global structure information is retained: after a 1×1 convolution of the original feature map X, a global average pooling operation along the channel direction yields a global structure feature map F; concatenating F with the two affinity matrices gives a feature matrix Y:

Y = [pool(θ_{H×W}(X)), θ_{H×W}(M1), θ_{H×W}(M2)],

where pool denotes global average pooling and the θ_{H×W} functions each consist of a 1×1 convolutional layer, a BatchNormalization layer and a ReLU activation layer, their numbers of output activation nodes differing from those of ψ_{H×W} and φ_{H×W}; the feature matrix Y is passed through a 1×1 convolution to fuse all the global and local information it contains, yielding the relational feature map of the H×W space;

(2) constructing the relational feature map of the channel space C:

similarly, within the channel extent, the feature map X is decomposed into C feature vectors of length H×W; the relation information r_{a,b} mapping feature vector x_a to feature vector x_b is

r_{a,b} = f_C(x_a, x_b),

where the embedding functions ψ_C and φ_C are identical in form to ψ_{H×W} and φ_{H×W}, differing only in output dimension; the affinity matrices are obtained by the same computation as in step 5(1), i.e. the bidirectional relationship yields two different affinity matrices M′1 and M′2;

after a 1×1 convolution of the original feature map X, global average pooling over the H×W dimensions yields a structure feature map F′; concatenating F′ with the two affinity matrices gives the feature matrix Y′:

Y′ = [pool(θ_C(X)), θ_C(M′1), θ_C(M′2)],

where the θ_C functions are identical in form to the θ_{H×W} functions, differing only in output dimension; the feature matrix Y′ is passed through a 1×1 convolution to fuse all the global and local information it contains, yielding the relational feature map of the channel space C;

the relational feature maps of the H×W space and the channel space C are combined by element-wise multiplication to obtain the multi-spatial relationship-aware feature map;

Step 6: feeding the multi-spatial relationship-aware feature map into the detection head, YOLOX decoupling classification from coordinate localization: a 1×1 convolution first reduces the channel dimension, followed by two lightweight branches that perform classification and regression respectively.

2. The pedestrian detection method based on multi-spatial relationship perception according to claim 1, characterized in that the data augmentation in step 2 includes random horizontal flipping, color jittering, multi-scale augmentation and mosaic augmentation.

3. The pedestrian detection method based on multi-spatial relationship perception according to claim 1, characterized in that the receptive fields of the three branches in step 4 correspond to downsampling by factors of 8, 16 and 32 respectively.

4. The pedestrian detection method based on multi-spatial relationship perception according to any one of claims 1-3, characterized in that, in the training phase, the classification loss function is cross-entropy, the regression loss function is the GIoU loss, and an L1-norm penalty is applied to the predicted position information.

5. A pedestrian detection device based on multi-spatial relationship perception, characterized in that the device can implement the pedestrian detection method based on multi-spatial relationship perception according to any one of claims 1-4.

6. The pedestrian detection device based on multi-spatial relationship perception according to claim 5, characterized in that the device includes a data acquisition unit, a first data processing unit, a second data processing unit and a result acquisition unit;

the data acquisition unit is configured to execute step 1 of the pedestrian detection method based on multi-spatial relationship perception according to any one of claims 1-4;

the first data processing unit is configured to execute steps 2 and 3 of the pedestrian detection method based on multi-spatial relationship perception according to any one of claims 1-4;

the second data processing unit is configured to execute steps 4 and 5 of the pedestrian detection method based on multi-spatial relationship perception according to any one of claims 1-4;

the result acquisition unit is configured to execute step 6 of the pedestrian detection method based on multi-spatial relationship perception according to any one of claims 1-4.

7. A terminal device, characterized in that the terminal device comprises the pedestrian detection device based on multi-spatial relationship perception according to claim 5 or 6.
CN202111510823.5A 2021-12-11 2021-12-11 Pedestrian detection method and device based on multi-spatial relationship sensing and terminal equipment Active CN114332919B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111510823.5A CN114332919B (en) 2021-12-11 2021-12-11 Pedestrian detection method and device based on multi-spatial relationship sensing and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111510823.5A CN114332919B (en) 2021-12-11 2021-12-11 Pedestrian detection method and device based on multi-spatial relationship sensing and terminal equipment

Publications (2)

Publication Number Publication Date
CN114332919A 2022-04-12
CN114332919B CN114332919B (en) 2024-10-29

Family

ID=81050935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111510823.5A Active CN114332919B (en) 2021-12-11 2021-12-11 Pedestrian detection method and device based on multi-spatial relationship sensing and terminal equipment

Country Status (1)

Country Link
CN (1) CN114332919B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110796239A (en) * 2019-10-30 2020-02-14 福州大学 A deep learning target detection method based on channel and space fusion perception
CN111369543A (en) * 2020-03-07 2020-07-03 北京工业大学 A fast pollen particle detection algorithm based on dual self-attention module
CN112733693A (en) * 2021-01-04 2021-04-30 武汉大学 Multi-scale residual error road extraction method for global perception high-resolution remote sensing image
CN113505640A (en) * 2021-05-31 2021-10-15 东南大学 Small-scale pedestrian detection method based on multi-scale feature fusion
CN113567984A (en) * 2021-07-30 2021-10-29 长沙理工大学 A method and system for detecting small artificial targets in SAR images

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
NIE Wei; CAO Yue; ZHU Dongxue; ZHU Yixuan; HUANG Linyi: "Behavior recognition algorithm based on edge-aware learning network under complex surveillance background", Computer Applications and Software, no. 08, 12 August 2020 (2020-08-12) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114663861A (en) * 2022-05-17 2022-06-24 山东交通学院 Vehicle re-identification method based on dimension decoupling and non-local relation
CN115082855A (en) * 2022-06-20 2022-09-20 安徽工程大学 Pedestrian occlusion detection method based on improved YOLOX algorithm
CN115082855B (en) * 2022-06-20 2024-07-12 安徽工程大学 Pedestrian shielding detection method based on improved YOLOX algorithm
CN115311690A (en) * 2022-10-08 2022-11-08 广州英码信息科技有限公司 End-to-end pedestrian structural information and dependency relationship detection method thereof
CN115311690B (en) * 2022-10-08 2022-12-23 广州英码信息科技有限公司 End-to-end pedestrian structural information and dependency relationship detection method thereof

Also Published As

Publication number Publication date
CN114332919B (en) 2024-10-29

Similar Documents

Publication Publication Date Title
CN112308092B (en) Light-weight license plate detection and identification method based on multi-scale attention mechanism
CN114332919A (en) A pedestrian detection method, device and terminal device based on multi-spatial relationship perception
CN113095152B (en) Regression-based lane line detection method and system
CN116665176B (en) A multi-task network road target detection method for autonomous vehicle driving
CN107358576A (en) Depth map super resolution ratio reconstruction method based on convolutional neural networks
CN110298387A (en) Incorporate the deep neural network object detection method of Pixel-level attention mechanism
CN110348383B (en) Road center line and double line extraction method based on convolutional neural network regression
CN107808376B (en) A Deep Learning-Based Hand Raised Detection Method
CN115205264A (en) A high-resolution remote sensing ship detection method based on improved YOLOv4
CN114529982B (en) Lightweight human body posture estimation method and system based on streaming attention
CN111599007B (en) Smart city CIM road mapping method based on unmanned aerial vehicle aerial photography
CN112862690A (en) Transformers-based low-resolution image super-resolution method and system
CN105488777A (en) System and method for generating panoramic picture in real time based on moving foreground
CN106372630A (en) Face direction detection method based on deep learning
CN103077538B (en) Adaptive tracking method of biomimetic-pattern recognized targets
CN116935486A (en) Sign language identification method and system based on skeleton node and image mode fusion
CN116612427A (en) Intensive pedestrian detection system based on improved lightweight YOLOv7
CN116363526A (en) MROCNet model construction and multi-source remote sensing image change detection method and system
CN107944437A (en) A kind of Face detection method based on neutral net and integral image
CN115909488A (en) An occluded person re-identification method based on pose guidance and dynamic feature extraction
CN109272450B (en) Image super-resolution method based on convolutional neural network
CN114399728A (en) A crowd counting method in foggy scene
CN117612029B (en) A remote sensing image target detection method based on progressive feature smoothing and scale-adaptive dilated convolution
Li et al. Monocular 3-D Object Detection Based on Depth-Guided Local Convolution for Smart Payment in D2D Systems
CN118430011A (en) Robust 2D human pose estimation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant