CN111079643B

CN111079643B - Face detection method and device based on neural network and electronic equipment

Info

Publication number: CN111079643B
Application number: CN201911285280.4A
Authority: CN
Inventors: 边旭
Original assignee: Sany Heavy Industry Co Ltd
Current assignee: Sany Heavy Industry Co Ltd
Priority date: 2019-12-13
Filing date: 2019-12-13
Publication date: 2023-04-07
Anticipated expiration: 2039-12-13
Also published as: CN111079643A

Abstract

Embodiments of the invention provide a neural network-based face detection method, a face detection device, and electronic equipment, relating to the field of face recognition. The method comprises: determining a face image to be detected, and determining at least one first input feature map based on the face image, wherein m sub-regions are preset in each first input feature map; for each sub-region of each input feature map, performing feature extraction with a preset small convolution kernel to obtain a first output feature map; for each input feature map, accumulating the first output feature maps corresponding to that input feature map to obtain a second output feature map; and performing face recognition based on the second output feature map of the at least one input feature map. The method achieves better GPU acceleration without additional loss of precision, improves detection efficiency, and addresses the prior-art problem that a growing amount of data increases the amount of computation and reduces detection efficiency.

Description

Face detection method, device and electronic equipment based on neural network

Technical Field

The present invention relates to the field of face detection, and in particular to a neural network-based face detection method, device and electronic equipment.

Background

The face detection algorithm is a key part of a face recognition, comparison and verification system. With the development of deep learning, face detection algorithms are now widely implemented with convolutional neural networks.

As scenes become more diverse and complex and training data sets grow, convolutional neural networks become ever larger and require more computation and memory access. This inevitably reduces detection speed, and even with graphics processing unit (GPU) acceleration it is difficult to meet the requirements of real-time detection.

Summary of the Invention

In view of this, the object of the present invention is to provide a neural network-based face detection method, device and electronic equipment.

To achieve the above object, the embodiments of the present invention adopt the following technical solutions:

In a first aspect, an embodiment of the present invention provides a neural network-based face detection method, including:

determining a face image to be detected, and determining at least one first input feature map based on the face image, each first input feature map being preset with m sub-regions;

for each sub-region of each input feature map, performing feature extraction with a preset small convolution kernel to obtain a first output feature map;

for each input feature map, accumulating the first output feature maps corresponding to that input feature map to obtain a second output feature map;

performing face recognition based on the second output feature map of at least one input feature map.

In an optional implementation, the preset small convolution kernel is a 3*3 small convolution kernel and m is 9; the step of accumulating the first output feature maps corresponding to the input feature map to obtain the second output feature map includes:

accumulating the nine first output feature maps corresponding to the input feature map pixel by pixel to obtain a second output feature map identical to the output of a 7*7 convolution kernel.

In an optional implementation, there are nine small convolution kernels, and each small convolution kernel corresponds to the start coordinates and end coordinates of one of the nine sub-regions.

In an optional implementation, the step of accumulating the first output feature maps corresponding to the input feature map to obtain the second output feature map includes:

in an Eltwise layer, accumulating the nine first output feature maps corresponding to the input feature map pixel by pixel to obtain a second output feature map identical to the output of a 7*7 convolution kernel.

In an optional implementation, the step of performing face recognition based on the second output feature map of at least one input feature map includes:

performing fixed-size nearest-neighbor expansion upsampling on the second output feature map of at least one input feature map on a GPU acceleration engine to obtain at least one third input feature map;

performing face recognition based on the at least one third input feature map.

In a second aspect, an embodiment of the present invention provides a neural network-based face detection device, including:

a determining module, configured to determine a face image to be detected and to determine at least one first input feature map based on the face image, each first input feature map being preset with m sub-regions;

an extraction module, configured to perform, for each sub-region of each input feature map, feature extraction with a preset small convolution kernel to obtain a first output feature map;

an accumulation module, configured to accumulate, for each input feature map, the first output feature maps corresponding to that input feature map to obtain a second output feature map;

a detection module, configured to perform face recognition based on the second output feature map of at least one input feature map.

In an optional implementation, the preset small convolution kernel is a 3*3 small convolution kernel and m is 9; the accumulation module is configured to accumulate the nine first output feature maps corresponding to the input feature map pixel by pixel to obtain a second output feature map identical to the output of a 7*7 convolution kernel.

In a third aspect, an embodiment of the present invention provides an electronic device including a processor and a memory, the memory storing machine-executable instructions executable by the processor, and the processor being able to execute the machine-executable instructions to implement the method of any one of the foregoing implementations.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the method of any one of the foregoing implementations.

The embodiments of the present invention provide a neural network-based face detection method, device, electronic device and computer-readable storage medium. The method includes: determining a face image to be detected, and determining at least one first input feature map based on the face image, each first input feature map being preset with m sub-regions; for each sub-region of each input feature map, performing feature extraction with a preset small convolution kernel to obtain a first output feature map; for each input feature map, accumulating the first output feature maps corresponding to that input feature map to obtain a second output feature map; and performing face recognition based on the second output feature map of at least one input feature map. The technical solution thus improves the original detection network: preset small convolution kernels extract features from the sub-regions of the input feature map, the results are accumulated, and face recognition is performed on the accumulated output. This yields better GPU acceleration without additional loss of precision, improves detection efficiency, and alleviates the prior-art problem that a growing amount of data increases the amount of computation and reduces detection efficiency.

To make the above objects, features and advantages of the present invention easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

Brief Description of the Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present invention and therefore should not be regarded as limiting its scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.

Fig. 1 shows a flowchart of a neural network-based face detection method provided by an embodiment of the present invention;

Fig. 2 shows an execution scenario of step S106 in Fig. 1;

Fig. 3 shows a detailed flowchart of step S108 in Fig. 1;

Fig. 4 shows a schematic diagram of a neural network-based face detection device provided by an embodiment of the present invention;

Fig. 5 shows a schematic diagram of an electronic device provided by an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments of the invention generally described and illustrated in the figures herein may be arranged and designed in a variety of different configurations.

Accordingly, the following detailed description of the embodiments of the invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

It should be noted that relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Furthermore, the terms "comprise", "include" and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device comprising that element.

Embodiment 1

Referring to Fig. 1, an embodiment of the present invention provides a neural network-based face detection method, the method including:

Step S102: determining a face image to be detected, and determining at least one first input feature map based on the face image, each first input feature map being preset with m sub-regions;

Step S104: for each sub-region of each input feature map, performing feature extraction with a preset small convolution kernel to obtain a first output feature map;

Step S106: for each input feature map, accumulating the first output feature maps corresponding to that input feature map to obtain a second output feature map;

Step S108: performing face recognition based on the second output feature map of at least one input feature map.

In the neural network-based face detection method provided by this embodiment, a face image to be detected is determined, and at least one first input feature map is determined based on the face image, each first input feature map being preset with m sub-regions. Then, for each sub-region of each input feature map, feature extraction is performed with a preset small convolution kernel to obtain a first output feature map. Next, for each input feature map, the corresponding first output feature maps are accumulated to obtain a second output feature map. Finally, face recognition is performed based on the second output feature map of at least one input feature map. The method achieves better GPU acceleration without additional loss of precision, improves detection efficiency, and alleviates the prior-art problem that a growing amount of data increases the amount of computation and reduces detection efficiency.

In a possible implementation, the preset small convolution kernel is a 3*3 small convolution kernel and m is 9.

For step S106, accumulating the first output feature maps corresponding to the input feature map to obtain the second output feature map includes:

1. Accumulating the nine first output feature maps corresponding to the input feature map pixel by pixel to obtain a second output feature map identical to the output of a 7*7 convolution kernel.
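The equivalence stated above can be checked with a short derivation. The following is a sketch under one assumption that the text does not spell out: the 7*7 kernel K is zero-padded to a 9*9 kernel (written K̃ below) so that it tiles exactly into nine 3*3 blocks, and the input X is zero-padded by two rows and two columns (written X̃). The indices i, j select the small kernel, and Y_ij denotes the corresponding first output feature map.

```latex
\begin{aligned}
Y(r,c) &= \sum_{u=0}^{6}\sum_{v=0}^{6} K(u,v)\,X(r+u,\,c+v) \\
       &= \sum_{u=0}^{8}\sum_{v=0}^{8} \tilde{K}(u,v)\,\tilde{X}(r+u,\,c+v)
          \qquad (\tilde{K}(u,v)=0 \text{ for } u>6 \text{ or } v>6) \\
       &= \sum_{i=0}^{2}\sum_{j=0}^{2}\;
          \underbrace{\sum_{a=0}^{2}\sum_{b=0}^{2}
          \tilde{K}(3i+a,\,3j+b)\,\tilde{X}(r+3i+a,\,c+3j+b)}_{Y_{ij}(r,c)}
\end{aligned}
```

Each inner double sum is a 3*3 convolution applied to the input shifted by (3i, 3j), so the pixel-by-pixel sum of the nine Y_ij reproduces Y exactly.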

In a possible implementation, there are nine small convolution kernels, and each small convolution kernel corresponds to the start coordinates and end coordinates of one of the nine sub-regions.
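As an illustration of how the start and end coordinates of the nine sub-regions might be derived, the following is a minimal sketch and not the disclosed implementation. It assumes stride 1, no border padding beyond two zero rows and columns appended at the bottom and right of the input, and a 7*7 kernel zero-padded to 9*9; the function name `subregion_bounds` and its parameters are hypothetical.

```python
def subregion_bounds(height, width, tile=3, tiles_per_side=3, big_kernel=7):
    """Return one (row_start, col_start, row_end, col_end) tuple per small kernel.

    Coordinates index the input after it has been zero-padded at the bottom
    and right by (tile * tiles_per_side - big_kernel) rows and columns.
    End coordinates are exclusive. All of this layout is an assumption.
    """
    out_h = height - big_kernel + 1   # output height of the large kernel
    out_w = width - big_kernel + 1    # output width of the large kernel
    bounds = []
    for i in range(tiles_per_side):
        for j in range(tiles_per_side):
            r0, c0 = i * tile, j * tile
            # A tile x tile kernel needs (out + tile - 1) input rows/columns
            # to produce an output of the same size as the large kernel.
            bounds.append((r0, c0, r0 + out_h + tile - 1, c0 + out_w + tile - 1))
    return bounds


for region in subregion_bounds(12, 10):
    print(region)
```

Under these assumptions every sub-region has the same size and differs only in its offset, which is what allows the nine outputs to be accumulated pixel by pixel.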

In a possible implementation, step S106, accumulating the first output feature maps corresponding to the input feature map to obtain the second output feature map, can be performed as follows:

(1) In an Eltwise layer, the nine first output feature maps corresponding to the input feature map are accumulated pixel by pixel to obtain a second output feature map identical to the output of a 7*7 convolution kernel.

For ease of understanding, the execution scenario of step S106 is briefly described below with reference to Fig. 2:

In Fig. 2, NewConv (the new convolution kernel) is a 3x3 small convolution kernel and constitutes a new region-defined convolution operator: the small convolution kernel acts on a sub-region of the input feature map, which is equivalent to adding start coordinates and end coordinates to Conv (convolution). For example, a 7x7 convolution kernel is first expanded (split) into nine 3x3 small convolution kernels, each acting on a different sub-region. The output feature maps of the nine small convolution kernels (the first output feature maps) are then accumulated pixel by pixel through the Eltwise layer, yielding the same output as the 7x7 convolution kernel, that is, the same second output feature map.
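The splitting and accumulation described for Fig. 2 can be sketched in plain Python as follows. This is an illustrative sketch only: it assumes stride 1 with no border padding, zero-pads the 7x7 kernel to 9x9 at the bottom and right so that it tiles into nine 3x3 blocks, and uses the hypothetical helper names `conv2d_valid` and `split_conv7x7`. The final loop stands in for the Eltwise summation.

```python
def conv2d_valid(x, k):
    """Stride-1 convolution (cross-correlation form) without padding.

    x and k are 2-D lists of numbers indexed [row][col].
    """
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(k[a][b] * x[r + a][c + b]
                 for a in range(kh) for b in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def split_conv7x7(x, k7):
    """Apply a 7x7 kernel as nine 3x3 kernels plus pixel-wise accumulation."""
    # Assumption: pad the 7x7 kernel with zeros to 9x9 (bottom and right).
    k9 = [[k7[a][b] if a < 7 and b < 7 else 0 for b in range(9)]
          for a in range(9)]
    # Pad the input with two zero rows/columns so all sub-regions fit.
    xp = [list(row) + [0, 0] for row in x]
    xp += [[0] * (len(x[0]) + 2) for _ in range(2)]
    out_h, out_w = len(x) - 6, len(x[0]) - 6
    total = [[0] * out_w for _ in range(out_h)]
    for i in range(3):
        for j in range(3):
            # Small kernel (i, j) and the sub-region it acts on.
            small = [row[3 * j:3 * j + 3] for row in k9[3 * i:3 * i + 3]]
            region = [row[3 * j:3 * j + out_w + 2]
                      for row in xp[3 * i:3 * i + out_h + 2]]
            partial = conv2d_valid(region, small)  # a first output feature map
            # Eltwise-style accumulation, pixel by pixel.
            for r in range(out_h):
                for c in range(out_w):
                    total[r][c] += partial[r][c]
    return total  # the second output feature map


# Small integer example so the comparison is exact.
x = [[(3 * r + 5 * c) % 11 - 5 for c in range(10)] for r in range(12)]
k7 = [[(2 * a - 3 * b) % 7 - 3 for b in range(7)] for a in range(7)]
print(split_conv7x7(x, k7) == conv2d_valid(x, k7))
```

The sketch covers a single channel only; a real network layer would repeat the same accumulation across input and output channels.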

In a possible implementation, referring to Fig. 3, step S108, performing face recognition based on the second output feature map of at least one input feature map, includes the following steps:

Step S302: performing fixed-size nearest-neighbor expansion upsampling on the second output feature map of at least one input feature map on a GPU acceleration engine to obtain at least one third input feature map;

Step S304: performing face recognition based on the at least one third input feature map.

Conventional bilinear-interpolation upsampling between feature maps of different sizes has to be re-implemented in the GPU acceleration engine and is computationally expensive.

To solve this problem, the method in this embodiment uses fixed-size 2x nearest-neighbor expansion upsampling. Because the deep feature map to be enlarged is approximately half the size of the target shallow feature map, the deep feature map only needs to be doubled in scale. Each pixel of the new feature map can be mapped back to its nearest-neighbor pixel in the original feature map by the following formula:

NewPixel(x, y) = SrcPixel(int(x/2), int(y/2));

where x and y denote coordinates. The nearest-neighbor expansion upsampling used by the method involves only GPU memory reads and writes and a very simple coordinate calculation. Unlike conventional bilinear-interpolation upsampling, it requires no multiply-add operations on the original pixel values to produce the pixels of the new feature map, so it achieves a good speed-up on the GPU acceleration engine while maintaining high detection accuracy.
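The mapping NewPixel(x, y) = SrcPixel(int(x/2), int(y/2)) can be illustrated with a minimal CPU sketch. The function name `upsample_nearest_2x` is hypothetical, the feature map is assumed to be a single-channel 2-D list indexed [y][x], and the code is not the GPU implementation referred to in the text.

```python
def upsample_nearest_2x(src):
    """Double the height and width of a 2-D map by nearest-neighbor copying.

    Every output pixel (x, y) reads source pixel (x // 2, y // 2); no
    multiply-add on pixel values is needed, only an index computation.
    """
    h, w = len(src), len(src[0])
    return [[src[y // 2][x // 2] for x in range(2 * w)]
            for y in range(2 * h)]


small = [[1, 2],
         [3, 4]]
for row in upsample_nearest_2x(small):
    print(row)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]
```

Because each output pixel depends on exactly one input pixel, the operation maps naturally onto one GPU thread per output pixel, which is consistent with the speed-up claimed above.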

The neural network-based face detection provided by the embodiments of the present invention takes the operating characteristics of the GPU acceleration engine into account and identifies operations in the neural network structure that run slowly and benefit little from acceleration. By introducing a region-selection convolution operator, a large convolution kernel is split into several small convolution kernels whose outputs are summed in an Eltwise layer, achieving the same accuracy as the large convolution kernel with better GPU acceleration. In addition, based on the scale relationship between face detection feature maps, the method uses the simpler fixed-size nearest-neighbor expansion upsampling, which is better suited to GPU acceleration. Besides the improved speed-up, the method detects faces well in real scenes, and its higher GPU utilization allows more video streams to be deployed.

Embodiment 2

Based on the same inventive concept, an embodiment of the present application further provides a neural network-based face detection device corresponding to the neural network-based face detection method. Because the device solves the problem on the same principle as the method described above, the implementation of the device can refer to the implementation of the method, and repeated details are omitted.

Fig. 4 is a schematic diagram of a neural network-based face detection device provided by an embodiment of the present application.

Referring to Fig. 4, the device includes a determining module 401, an extraction module 402, an accumulation module 403 and a detection module 404.

The determining module 401 is configured to determine a face image to be detected and to determine at least one first input feature map based on the face image, each first input feature map being preset with m sub-regions;

the extraction module 402 is configured to perform, for each sub-region of each input feature map, feature extraction with a preset small convolution kernel to obtain a first output feature map;

the accumulation module 403 is configured to accumulate, for each input feature map, the first output feature maps corresponding to that input feature map to obtain a second output feature map;

the detection module 404 is configured to perform face recognition based on the second output feature map of at least one input feature map.

In an optional implementation, the preset small convolution kernel is a 3*3 small convolution kernel and m is 9; the accumulation module 403 is configured to accumulate the nine first output feature maps corresponding to the input feature map pixel by pixel to obtain a second output feature map identical to the output of a 7*7 convolution kernel.

In an optional implementation, there are nine small convolution kernels, and each small convolution kernel corresponds to the start coordinates and end coordinates of one of the nine sub-regions.

In an optional implementation, the accumulation module 403 is configured to accumulate, in an Eltwise layer, the nine first output feature maps corresponding to the input feature map pixel by pixel to obtain a second output feature map identical to the output of a 7*7 convolution kernel.

In an optional implementation, the detection module 404 is configured to perform fixed-size nearest-neighbor expansion upsampling on the second output feature map of at least one input feature map on a GPU acceleration engine to obtain at least one third input feature map, and to perform face recognition based on the at least one third input feature map.

The neural network-based face detection device provided by this embodiment has the same technical features as the neural network-based face detection method provided by the above embodiment, so it solves the same technical problem and achieves the same technical effect.

Referring to Fig. 5, an embodiment of the present invention further provides an electronic device 100, including:

a processor 41, a memory 42 and a bus 43. The memory 42 stores execution instructions and includes an internal memory 421 and an external memory 422. The internal memory 421 temporarily stores operation data of the processor 41 and data exchanged with the external memory 422, such as a hard disk; the processor 41 exchanges data with the external memory 422 through the internal memory 421. When the electronic device 100 is running, the processor 41 and the memory 42 communicate through the bus 43, so that the processor 41 executes the following instructions in user mode:

determining a face image to be detected, and determining at least one first input feature map based on the face image, each first input feature map being preset with m sub-regions; for each sub-region of each input feature map, performing feature extraction with a preset small convolution kernel to obtain a first output feature map; for each input feature map, accumulating the first output feature maps corresponding to that input feature map to obtain a second output feature map; and performing face recognition based on the second output feature map of at least one input feature map.

Optionally, the preset small convolution kernel is a 3*3 small convolution kernel and m is 9; the step of accumulating the first output feature maps corresponding to the input feature map to obtain the second output feature map includes: accumulating the nine first output feature maps corresponding to the input feature map pixel by pixel to obtain a second output feature map identical to the output of a 7*7 convolution kernel.

Optionally, in the instructions executed by the processor 41, there are nine small convolution kernels, and each small convolution kernel corresponds to the start coordinates and end coordinates of one of the nine sub-regions.

Optionally, in the instructions executed by the processor 41, the step of accumulating the first output feature maps corresponding to the input feature map to obtain the second output feature map includes: in an Eltwise layer, accumulating the nine first output feature maps corresponding to the input feature map pixel by pixel to obtain a second output feature map identical to the output of a 7*7 convolution kernel.

Optionally, in the instructions executed by the processor 41, the step of performing face recognition based on the second output feature map of at least one input feature map includes: performing fixed-size nearest-neighbor expansion upsampling on the second output feature map of at least one input feature map on a GPU acceleration engine to obtain at least one third input feature map; and performing face recognition based on the at least one third input feature map.

An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when run by a processor, the computer program executes the steps of the method provided by the above embodiments.

在本申请所提供的几个实施例中，应该理解到，所揭露的装置和方法，也可以通过其它的方式实现。以上所描述的装置实施例仅仅是示意性的，例如，附图中的流程图和结构图显示了根据本发明的多个实施例的装置、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上，流程图或框图中的每个方框可以代表一个模块、程序段或代码的一部分，所述模块、程序段或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意，在作为替换的实现方式中，方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如，两个连续的方框实际上可以基本并行地执行，它们有时也可以按相反的顺序执行，这依所涉及的功能而定。也要注意的是，结构图和/或流程图中的每个方框、以及结构图和/或流程图中的方框的组合，可以用执行规定的功能或动作的专用的基于硬件的系统来实现，或者可以用专用硬件与计算机指令的组合来实现。In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may also be implemented in other ways. The device embodiments described above are only illustrative. For example, the flowcharts and structural diagrams in the accompanying drawings show the possible implementation architecture and functions of devices, methods and computer program products according to multiple embodiments of the present invention. and operation. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or part of code that includes one or more Executable instructions. It should also be noted that, in alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved. It is also to be noted that each block of the block diagrams and/or flow diagrams, and combinations of blocks in the block diagrams and/or flow diagrams, can be implemented by a dedicated hardware-based system that performs the specified function or action may be implemented, or may be implemented by a combination of special purpose hardware and computer instructions.

In addition, the functional modules or units in the embodiments of the present invention may be integrated together to form a single independent part, each module may exist separately, or two or more modules may be integrated to form an independent part.

If the functions are implemented in the form of software functional modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a smartphone, a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall be covered by the scope of protection of the present invention.

Claims

1. A face detection method based on a neural network, characterized by comprising the following steps:

determining a face image to be detected, and determining at least one input feature map based on the face image, wherein each input feature map is preset with m sub-regions;

for each sub-region of each input feature map, performing feature extraction according to a preset small convolution kernel to obtain a first output feature map; wherein each of the small convolution kernels acts on a different one of the sub-regions of the input feature map, each of the small convolution kernels corresponding to a start coordinate and an end coordinate of the corresponding sub-region;

for each input feature map, performing pixel-by-pixel accumulation on the first output feature maps corresponding to the m sub-regions of the input feature map to obtain a second output feature map;

and performing face recognition based on a second output feature map of at least one input feature map.

2. The method according to claim 1, wherein the preset small convolution kernel is a 3x3 small convolution kernel, and m is 9; and the step of accumulating the first output feature maps corresponding to the input feature map to obtain a second output feature map comprises:

accumulating the 9 first output feature maps corresponding to the input feature map pixel by pixel to obtain a second output feature map that is the same as the output result of a 7x7 convolution kernel.

3. The method of claim 2, wherein the step of accumulating the first output feature map corresponding to the input feature map to obtain a second output feature map comprises:

accumulating the 9 first output feature maps corresponding to the input feature map pixel by pixel in an Eltwise layer to obtain a second output feature map that is the same as the output result of a 7x7 convolution kernel.
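As an illustration of the accumulation recited in claims 1 to 3, the following is a minimal pure-Python sketch, not the patented implementation. It assumes one plausible construction that the claims themselves do not spell out: the 7x7 kernel is zero-padded to 9x9 so that it tiles into nine 3x3 kernels, and each small kernel is applied to a shifted sub-region of the (zero-padded) input given by a start and end coordinate. The function names `correlate_valid` and `split_conv` are hypothetical.

```python
def correlate_valid(x, k):
    """Plain 'valid' cross-correlation of a 2-D list x with a 2-D kernel k."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def split_conv(x, k7):
    """Emulate a 7x7 kernel with nine 3x3 kernels on nine sub-regions."""
    H, W = len(x), len(x[0])
    oh, ow = H - 6, W - 6  # output size of a valid 7x7 correlation
    # Assumption: zero-pad the 7x7 kernel to 9x9 so it splits into 3x3 blocks.
    k9 = [[k7[a][b] if a < 7 and b < 7 else 0.0 for b in range(9)]
          for a in range(9)]
    # Zero-pad the input by 2 rows/columns (bottom/right) to match the padded kernel.
    xp = [[x[i][j] if i < H and j < W else 0.0 for j in range(W + 2)]
          for i in range(H + 2)]
    out = [[0.0] * ow for _ in range(oh)]  # second output feature map
    for p in range(3):
        for q in range(3):
            # The (p, q)-th small 3x3 kernel.
            small = [row[3 * q:3 * q + 3] for row in k9[3 * p:3 * p + 3]]
            # Its sub-region: start (r0, c0), end (r1, c1), end exclusive.
            r0, c0 = 3 * p, 3 * q
            r1, c1 = r0 + oh + 2, c0 + ow + 2
            sub = [row[c0:c1] for row in xp[r0:r1]]
            first = correlate_valid(sub, small)  # first output feature map
            # Pixel-by-pixel accumulation (the role an Eltwise sum layer plays).
            for i in range(oh):
                for j in range(ow):
                    out[i][j] += first[i][j]
    return out
```

Under these assumptions, `split_conv(x, k7)` and `correlate_valid(x, k7)` agree element by element, because the sum over the 9x9 padded kernel simply regroups the 7x7 sum into nine 3x3 partial sums.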

4. The method of claim 1, wherein the step of performing face recognition based on a second output feature map of the at least one input feature map comprises:

performing fixed-size nearest-neighbor expansion upsampling on the second output feature map of the at least one input feature map on a GPU acceleration engine to obtain at least one third input feature map;

and performing face recognition based on at least one third input feature map.
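The upsampling step of claim 4 can be sketched as follows. This is a CPU-only illustration in pure Python, not the GPU implementation the claim refers to, and it assumes that "fixed-size" means a fixed integer scale factor; the function name `upsample_nearest` and the default `scale=2` are hypothetical.

```python
def upsample_nearest(fmap, scale=2):
    """Nearest-neighbor upsampling: repeat each pixel `scale` times per axis."""
    out = []
    for row in fmap:
        # Repeat every value horizontally.
        wide = [v for v in row for _ in range(scale)]
        # Repeat the widened row vertically (copies, so rows are independent).
        out.extend([list(wide) for _ in range(scale)])
    return out
```

For example, a 2x2 feature map `[[1, 2], [3, 4]]` with `scale=2` becomes the 4x4 map `[[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]`. Because the scale is fixed, every output pixel maps to a known input pixel without interpolation, which is what makes the operation straightforward to parallelize.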

5. A face detection device based on a neural network, characterized by comprising:

a determining module, configured to determine a face image to be detected and to determine at least one input feature map based on the face image, wherein m sub-regions are preset in each input feature map;

an extraction module, configured to perform feature extraction on each sub-region of each input feature map according to a preset small convolution kernel to obtain a first output feature map; wherein each of the small convolution kernels acts on a different one of the sub-regions of the input feature map, each of the small convolution kernels corresponding to a start coordinate and an end coordinate of the corresponding sub-region;

an accumulation module, configured to accumulate the first output feature maps corresponding to the m sub-regions of the input feature map pixel by pixel to obtain a second output feature map;

and a detection module, configured to perform face recognition based on the second output feature map of the at least one input feature map.

6. The device of claim 5, wherein the preset small convolution kernel is a 3x3 small convolution kernel, and m is 9; and the accumulation module is configured to accumulate the 9 first output feature maps corresponding to the input feature map pixel by pixel to obtain a second output feature map that is the same as the output result of a 7x7 convolution kernel.

7. An electronic device comprising a processor and a memory, the memory storing machine executable instructions executable by the processor to implement the method of any one of claims 1-4.

8. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the method according to any one of claims 1-4.